Big Data – Apache Hadoop Administrator Training
Objective
This training aims to provide the participants with a comprehensive understanding
of all the steps necessary to operate and maintain a Hadoop cluster, from
installation and configuration through load balancing and tuning.
The participants will learn the complete installation of a Hadoop cluster, understand
the basic and advanced concepts of MapReduce, and learn the best practices for Apache
Hadoop development as experienced by the developers and architects of core
Apache Hadoop. Through hands-on exercises, participants will cover the
following topics during the course:
1. The internals of MapReduce and HDFS and how to build a Hadoop
architecture.
2. Proper cluster configuration and deployment to integrate with systems
and hardware in the data centre.
3. How to load data into the cluster from dynamically generated files using
Flume and from an RDBMS using Sqoop.
4. Configuring the FairScheduler to provide service-level agreements for
multiple users of a cluster.
5. Discussing Kerberos-based security for your cluster.
6. Best practices for preparing and maintaining Apache Hadoop in
production.
7. Troubleshooting, diagnosing, tuning and solving Hadoop issues.
Note: The course consists of 20% theoretical discussion and 80% hands-on
practice.
Audience & Pre-Requisites
This course is designed for Systems Administrators and IT Managers who have
basic Linux experience. No prior knowledge of Apache Hadoop is required.
Duration: 30 hours
Course Outline
• Introduction
• The Case for Apache Hadoop
o A Brief History of Hadoop
o Core Hadoop Components
o Fundamental Concepts
• The Hadoop Distributed File System
o HDFS Features
o HDFS Design Assumptions
o Overview of HDFS Architecture
• MapReduce and YARN
o What Is MapReduce?
o Features of MapReduce
o Basic MapReduce Concepts
o Architectural Overview
o Hands-On Exercise
• An Overview of the Hadoop Ecosystem
o What is the Hadoop Ecosystem?
o Analysis Tools
o Data Storage and Retrieval Tools
• Overview of Cloudera Distributions of Hadoop
o What is CDH?
• Overview of Hortonworks Distributions of Hadoop
• Planning your Hadoop Cluster
o General Planning Considerations
o Choosing the Right Hardware
o Network Considerations
• Gen1 – Pseudo and 4-Node Cluster – Vanilla Hadoop
o Installation
o Configuration
o Performance Aspects
• Installing a 4-Node Cluster with NN, SNN, and JT in EC2
• Hadoop Installation
o Deployment Types
o Installing Hadoop
o Basic Configuration Parameters
o Hands-On Exercise
• Advanced Configuration
o Advanced Parameters
o Configuring Rack Awareness
• Hadoop Security
o Why Hadoop Security Is Important
o Hadoop’s Security System Concepts
o What Kerberos Is and How It Works
• Gen2 Pseudo Cluster – Vanilla Cluster
o Installation of Hadoop
o Hadoop 2 Configuration
o Hadoop Federation Capability
• Configuring HA in Gen2
• Configuring Federation in Gen2
• Managing and Scheduling Jobs
o Managing Running Jobs
o Hands-On Exercise
o The Capacity Scheduler
• Cluster Maintenance
o Checking HDFS Status
o Hands-On Exercise
o Copying Data Between Clusters
o Adding and Removing Cluster Nodes [Node Maintenance]
o Rebalancing the Cluster
o Hands-On Exercise
o NameNode Metadata Backup
o Cluster Upgrading
o User Management
o Quota Management
• Cluster Monitoring and Troubleshooting
o General System Monitoring
o Managing Hadoop’s Log Files
o Using the NameNode and JobTracker Web UIs
o Hands-On Exercise
o Cluster Monitoring with Ganglia
o Common Troubleshooting Issues
o Benchmarking Your Cluster
• Installing and Managing Other Hadoop Projects
o Hive
o Pig
o Sqoop
• Working with Apache Ambari
o Installation of a 4-Node Cluster
o WebHDFS
o Security in Ambari
o Adding a New Host via Ambari
o Configuring Capacity Scheduler
o Mounting HDFS
o HDFS Snapshots
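
Illustration for the "Configuring Rack Awareness" topic in the outline: the following is a minimal sketch of a rack topology script in Python, not part of the official course material. Hadoop can call such a script (set through the net.topology.script.file.name property in Hadoop 2) with one or more host names or IP addresses as arguments, and it reads one rack path per argument from standard output. The addresses, rack names, and the helper name resolve_racks are hypothetical and chosen only for illustration.

```python
#!/usr/bin/env python3
"""Illustrative rack topology script for Hadoop rack awareness."""
import sys

# Hypothetical host-to-rack table; replace with your own data-centre layout.
RACK_BY_HOST = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "10.0.2.22": "/dc1/rack2",
}

# Rack path returned for any host that is not in the table.
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the hosts as command-line arguments and expects
    # one rack path per argument on standard output.
    print("\n".join(resolve_racks(sys.argv[1:])))
```

In a real cluster the table would normally be generated from an inventory file rather than hard-coded, and the script must be executable by the user that runs the NameNode.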
