Hadoop for Developers and Administrators Training

Course Code

HadoopDevAd

Duration

21 hours (usually 3 days including breaks)

Overview

Hadoop is the most popular big data processing framework.

Course Outline

Module 1. Introduction to Hadoop

  • The Hadoop Distributed File System (HDFS)
  • The Read Path and The Write Path
  • Managing Filesystem Metadata
  • The Namenode and the Datanode
  • The Namenode High Availability
  • Namenode Federation
  • The Command-Line Tools
  • Understanding REST Support

Module 2. Introduction to MapReduce

  • Analyzing the Data with Hadoop
  • Map and Reduce Pattern
  • Java MapReduce
  • Scaling Out
  • Data Flow
  • Developing Combiner Functions
  • Running a Distributed MapReduce Job
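
The map, combine, and reduce steps listed in this module can be illustrated with a small in-memory sketch. It is plain Python rather than the Hadoop Java API, and the names `map_fn`, `combine_fn`, `reduce_fn`, and `run_job`, as well as the sample input splits, are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combine_fn(pairs):
    """Combine: pre-aggregate one mapper's output before the shuffle."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def reduce_fn(key, values):
    """Reduce: sum all partial counts grouped under one key."""
    return key, sum(values)


def run_job(splits):
    """Simulate map -> combine -> shuffle -> reduce over in-memory splits."""
    shuffled = defaultdict(list)
    for split in splits:
        # Each split stands in for the input of one map task (assumption).
        mapped = [pair for line in split for pair in map_fn(line)]
        for key, value in combine_fn(mapped):
            shuffled[key].append(value)
    # Keys are sorted before reducing, mirroring the sorted reducer input.
    return dict(reduce_fn(key, values) for key, values in sorted(shuffled.items()))


splits = [["big data big cluster"], ["data node", "big node"]]
result = run_job(splits)
print(result)
```

Because the combiner sums counts per split, fewer pairs cross the simulated shuffle. This only works here because addition is associative and commutative, which is the usual condition for reusing reduce logic as a combiner.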

Module 3. Planning a Hadoop Cluster

  • Picking a Distribution and Version of Hadoop
  • Versions and Features
  • Hardware Selection
  • Master and Worker Hardware Selection
  • Cluster Sizing
  • Operating System Selection and Preparation
  • Deployment Layout
  • Setting up Users, Groups, and Privileges
  • Disk Configuration
  • Network Design

Module 4. Installation and Configuration

  • Installing Hadoop
  • Configuration: An Overview
  • The Hadoop XML Configuration Files
  • Environment Variables and Shell Scripts
  • Logging Configuration
  • Managing HDFS
  • Optimization and Tuning
  • Formatting the Namenode
  • Creating a /tmp Directory
  • Planning for Namenode High Availability
  • The Fencing Options
  • Automatic Failover Configuration
  • Format and Bootstrap the Namenodes
  • Namenode Federation

Module 5. Understanding Hadoop I/O

  • Data Integrity in HDFS  
  • Understanding Codecs
  • Compression and Input Splits
  • Using Compression in MapReduce
  • The Serialization Mechanism
  • File-Based Data Structures
  • The SequenceFile Format
  • Other File Formats and Column-Oriented Formats

Module 6. Developing a MapReduce Application

  • The Configuration API 
  • Setting Up the Development Environment
  • Managing Configuration
  • GenericOptionsParser, Tool, and ToolRunner
  • Writing a Unit Test with MRUnit
  • The Mapper and Reducer
  • Running Locally on Test Data 
  • Testing the Driver
  • Running on a Cluster
  • Packaging and Launching a Job
  • The MapReduce Web UI
  • Tuning a Job

Module 7. Identity, Authentication, and Authorization

  • Managing Identity
  • Kerberos and Hadoop
  • Understanding Authorization

Module 8. Resource Management

  • What Is Resource Management?
  • HDFS Quotas
  • MapReduce Schedulers
  • Anatomy of a YARN Application Run
  • Resource Requests
  • Application Lifespan
  • YARN Compared to MapReduce 1
  • Scheduling in YARN
  • Scheduler Options
  • Capacity Scheduler Configuration
  • Fair Scheduler Configuration
  • Delay Scheduling
  • Dominant Resource Fairness
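
The idea behind Dominant Resource Fairness can be sketched in a few lines. The example below is a simplified model and not YARN's scheduler code: it grants one task at a time to the user whose dominant share (the largest fraction of any single resource they hold) is lowest. The function names, the input shapes, and the sample cluster of 9 CPUs and 18 GB are assumptions for illustration. Per-task demands are assumed to be positive.

```python
from fractions import Fraction


def dominant_share(allocated, capacity):
    """Largest fraction of any single resource held by one user."""
    return max(Fraction(a, c) for a, c in zip(allocated, capacity))


def drf_allocate(capacity, demands):
    """Grant tasks one at a time, favouring the lowest dominant share.

    capacity: tuple of cluster totals, e.g. (cpus, memory_gb)
    demands:  dict of user -> per-task demand tuple (hypothetical shape)
    Returns a dict of user -> number of tasks granted.
    """
    used = [0] * len(capacity)
    allocated = {user: [0] * len(capacity) for user in demands}
    tasks = {user: 0 for user in demands}
    while True:
        # Try users in order of increasing dominant share.
        order = sorted(demands, key=lambda name: dominant_share(allocated[name], capacity))
        for user in order:
            demand = demands[user]
            fits = all(u + d <= c for u, d, c in zip(used, demand, capacity))
            if fits:
                for i, d in enumerate(demand):
                    used[i] += d
                    allocated[user][i] += d
                tasks[user] += 1
                break
        else:
            # No user's next task fits: the cluster is saturated.
            return tasks


# User A runs memory-heavy tasks, user B runs CPU-heavy tasks.
grants = drf_allocate((9, 18), {"A": (1, 4), "B": (3, 1)})
print(grants)
```

With these sample numbers both users end up with a dominant share of 2/3: A holds 12 of 18 GB and B holds 6 of 9 CPUs. Exact fractions are used instead of floats so that ties between shares are compared reliably.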

Module 9. MapReduce Types and Formats

  • MapReduce Types
  • The Default MapReduce Job
  • Defining the Input Formats
  • Managing Input Splits and Records
  • Text Input and Binary Input
  • Managing Multiple Inputs
  • Database Input (and Output)
  • Output Formats
  • Text Output and Binary Output
  • Managing Multiple Outputs
  • The Database Output

Module 10. Using MapReduce Features

  • Using Counters
  • Reading Built-in Counters
  • User-Defined Java Counters
  • Understanding Sorting
  • Using the Distributed Cache

Module 11. Cluster Maintenance and Troubleshooting

  • Managing Hadoop Processes
  • Starting and Stopping Processes with Init Scripts
  • Starting and Stopping Processes Manually
  • HDFS Maintenance Tasks
  • Adding a Datanode
  • Decommissioning a Datanode
  • Checking Filesystem Integrity with fsck
  • Balancing HDFS Block Data
  • Dealing with a Failed Disk
  • MapReduce Maintenance Tasks 
  • Killing a MapReduce Job
  • Killing a MapReduce Task
  • Managing Resource Exhaustion

Module 12. Monitoring

  • The Available Hadoop Metrics
  • The Role of SNMP
  • Health Monitoring
  • Host-Level Checks
  • HDFS Checks
  • MapReduce Checks

Module 13. Backup and Recovery

  • Data Backup
  • Distributed Copy (distcp)
  • Parallel Data Ingestion
  • Namenode Metadata
