COURSE: HADOOP DEVELOPER DURATION: 40 HOURS www.beinghadoop.com 
HADOOP BIG DATA TRAINING 
Big Data refers to very large volumes of structured or unstructured data. Processing this data with traditional methods is slow and costly, and the data often has to be gathered from many different locations. Big Data technologies can do this work four or more times faster than traditional data warehousing approaches.
Apache Hadoop is an open-source framework designed to store and process huge amounts of data. It helps organizations analyse data volumes in the petabyte range, whether the data is structured or unstructured.
"Hadoop scales virtually without limits and handles any type of data, no matter how it is encoded or formatted, structured or unstructured. Hadoop is enterprise-proven, already delivering on the promise of Big Data across numerous industries." - Cloudera
"Apache Hadoop is an open-source framework that supports data-intensive distributed applications processing huge amounts of data. It is licensed under the Apache License 2.0 and runs applications on large clusters of commodity hardware." - Wikipedia
We have designed this course to provide the knowledge and technical skills needed to become a successful Hadoop developer.
About certification: 
Cloudera offers a developer-focused certification, Cloudera Certified Developer for Apache Hadoop (CCDH), exam CCD-410. The program closely examines Hadoop's structure and provides hands-on exercises that teach you how to import data from existing sources; process data with a variety of techniques such as Java MapReduce programs and Hadoop Streaming jobs; and work with Apache Hive and Pig.
Training content: 
MODULE 1: HADOOP AND BIG DATA 
CHAPTER 1: INTRODUCTION TO BIG DATA 
In this chapter you will understand what Big Data and Hadoop are, what the Hadoop sub-projects are, and how Big Data analytics is used to analyse data.
Big Data and Big Data analytical services
What is Hadoop?
History of Hadoop
Hadoop ecosystem
Hadoop framework
Hadoop distributions
Hadoop use cases in retail, healthcare, telecom, etc.
CHAPTER 2: HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
In this chapter you will understand the difference between the local file system and HDFS, learn to identify the nodes, check the status of a job, locate files, and follow how a user request or job is processed.
HDFS design
Hadoop file systems
HDFS daemons
Master and slave node architecture
HDFS web UI
How to load data into HDFS
FileSystem API (Java)
Serialization in Hadoop; reading and writing files from/to a Hadoop URL
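A useful first exercise for the HDFS design topic is working out how a file is stored: HDFS splits each file into fixed-size blocks and keeps several replicas of every block. The sketch below assumes the common Hadoop 2.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); both are parameters, and the function name hdfs_footprint is hypothetical.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size=128 * MB, replication=3):
    """Return (number_of_blocks, raw_bytes_stored) for a single file.

    Assumed defaults: 128 MB blocks and 3 replicas (typical Hadoop 2.x values).
    The last block only occupies as much disk as the data it actually holds,
    so raw storage is file size times replication, not blocks times block size.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    if block_size <= 0 or replication < 1:
        raise ValueError("block size and replication must be positive")
    # Integer ceiling division: a partial final block still counts as a block.
    blocks = -(-file_size_bytes // block_size)
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    blocks, raw = hdfs_footprint(300 * MB)
    print(blocks, raw // MB)  # 3 blocks (128 + 128 + 44 MB), 900 MB of raw disk
```

With these assumptions a 300 MB file becomes three blocks, and the cluster stores about 900 MB of raw data across its DataNodes.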
CHAPTER 3: ADMINISTERING A HADOOP CLUSTER
In this chapter you will learn how to install Hadoop in both production and practice environments, and how to modify the Hadoop configuration files.
Cluster specification
Hadoop cluster setup and installation
--standalone mode
--pseudo-distributed mode
--fully distributed mode
Step-by-step multi-node installation
Hadoop configuration
NameNode and DataNode directory structure
User commands:
fs, fsck, distcp, archive, ...
Administration commands:
dfsadmin, balancer, jobtracker, tasktracker, namenode, ...
MODULE 2: DATA IMPORT AND EXPORT USING SQOOP
CHAPTER 4: WORKING WITH SQOOP
In this chapter you will understand how to import data from an RDBMS into HDFS, how to move data from MySQL tables into HDFS, and how to perform imports and exports between data stores.
Sqoop installation
Sqoop 1 vs. Sqoop 2
Sqoop commands
Sqoop connectors for Oracle, MySQL, Teradata and other RDBMSs
Importing data from MySQL using all available import options
Exporting data from HDFS to an RDBMS
Creating Hive tables while importing data
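Sqoop parallelizes an import by running a MIN/MAX query on the split column (the primary key by default, or the column given with --split-by) and handing each mapper (four by default, set with -m) a slice of that range as a WHERE clause. The sketch below is a simplified model of that idea for an integer column, not Sqoop's exact splitting code; the function name split_where_clauses is hypothetical.

```python
def split_where_clauses(column, min_val, max_val, num_mappers=4):
    """Divide the integer range [min_val, max_val] into one WHERE clause per mapper.

    Simplified model of Sqoop-style range splitting: every slice is half-open
    [lo, hi) except the last, which is closed so that max_val is imported too.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_val < min_val:
        raise ValueError("max_val must be >= min_val")
    span = max_val - min_val
    # Use fewer slices when the key range is too small to give each one a value.
    n = max(1, min(num_mappers, span))
    bounds = [min_val + (span * i) // n for i in range(n)] + [max_val]
    clauses = []
    for i in range(n):
        lo, hi = bounds[i], bounds[i + 1]
        if i < n - 1:
            clauses.append(f"{column} >= {lo} AND {column} < {hi}")
        else:
            # Closed upper bound on the last slice so no row is skipped.
            clauses.append(f"{column} >= {lo} AND {column} <= {hi}")
    return clauses


if __name__ == "__main__":
    for clause in split_where_clauses("id", 1, 1000):
        print(clause)
```

For ids 1 to 1000 and four mappers this yields four contiguous, non-overlapping slices, which is why a skewed or sparse split column can leave some mappers with far more rows than others.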
MODULE 3: DATA PROCESSING USING HIVE, PIG AND MAPREDUCE 
CHAPTER 5: HIVE 
In this chapter you will understand what Hive is, how to load data from HDFS or the local file system into Hive tables, and how to query large volumes of data.
Hive concepts 
Hive installation 
Hive configuration, Hive services 
Hive query processing 
Hive Metastore configurations 
Hive data types - primitive and complex types 
Hive operators 
Hive Built-in functions 
Hive Tables 
--creating tables 
--partitions and buckets 
--browsing tables and partitions 
--storage formats 
--loading data 
--joins 
--aggregations and sorting 
--inserting data into local files
--altering and dropping tables
--importing data 
Word count in Hive 
Hive Compression 
Hive Security
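For the partitions and buckets topic: when a table is declared with CLUSTERED BY (col) INTO n BUCKETS, Hive writes each row to a bucket chosen from a hash of the bucketing column modulo n, so equal values always land in the same bucket file. The sketch below is a simplified model assuming an INT bucketing column whose hash is the value itself, masked to a non-negative 32-bit number; other column types, and newer Hive releases, may use a different hash function. The names bucket_for_int and assign_buckets are hypothetical.

```python
INT_MAX = 0x7FFFFFFF  # mask that clears the sign bit of a 32-bit int


def bucket_for_int(value, num_buckets):
    """Bucket index for an INT bucketing column (simplified model)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    if not -(2 ** 31) <= value < 2 ** 31:
        raise ValueError("value must fit in a 32-bit INT")
    # Assumed hash for INT: the value itself, made non-negative, then modulo n.
    return (value & INT_MAX) % num_buckets


def assign_buckets(values, num_buckets):
    """Group values by the bucket each one would be written to."""
    buckets = {b: [] for b in range(num_buckets)}
    for v in values:
        buckets[bucket_for_int(v, num_buckets)].append(v)
    return buckets


if __name__ == "__main__":
    print(assign_buckets([1, 2, 5, 8], 4))
```

Because the bucket depends only on the column value, two tables bucketed the same way on a join key can be joined bucket by bucket, which is one reason bucketing is taught next to joins and sampling.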
CHAPTER 6: PIG 
In this chapter you will understand what a data flow is, how Pig is tightly coupled with MapReduce, and how to retrieve data from a data store efficiently.
Why Pig?
Pig installation
Pig Latin commands
Pig Latin relational operators
Pig Latin diagnostic operators
Data types and expressions
Built-in functions
Data processing in Pig
--loading and storing data
--filtering data
--grouping data
--joining data
--sorting data
Word count in Pig
CHAPTER 7: MAPREDUCE
In this chapter you will get an idea of how the MapReduce framework works and how to create and run a MapReduce program.
MapReduce overview and architecture
Developing MapReduce jobs
Word count program in MapReduce
Input and output formats
Job configuration
Packaging, launching and debugging jobs
Job submission
MapReduce workflows
MapReduce types
Running MapReduce jobs in Eclipse
Practicing MapReduce programs
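The word count program is easiest to follow as three phases: map, shuffle/sort, reduce. Hadoop Streaming, mentioned in the certification overview, runs any executable that reads lines on stdin and writes tab-separated key/value lines on stdout. The sketch below models a Streaming-style word count as plain Python functions, with sorted() standing in for the framework's shuffle and sort; run_local is a hypothetical helper for local testing, and in a real Streaming job the mapper and reducer would be separate scripts reading sys.stdin.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce phase: sum the counts for each word.

    Relies on the input being sorted by key, as Hadoop's shuffle guarantees,
    so all records for one word arrive next to each other.
    """
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    mapped = mapper(lines)
    shuffled = sorted(mapped)  # stands in for Hadoop's sort-and-shuffle phase
    return list(reducer(shuffled))


if __name__ == "__main__":
    for record in run_local(["the cat", "the dog"]):
        print(record)
```

Running it on the two lines "the cat" and "the dog" prints cat 1, dog 1 and the 2. The reducer never holds more than one word's group at a time, which is the property that lets the real job scale past a single machine's memory.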
MODULE 4: YARN ARCHITECTURE 
CHAPTER 8: NEW MAPREDUCE (MRv2)
YARN: Yet Another Resource Negotiator
Hadoop YARN is a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications.
YARN architecture
YARN configuration
YARN and MapReduce 2.0 daemons
YARN and the new MapReduce API
YARN command-line tools
MODULE 5: NOSQL PROGRAMMING 
CHAPTER 9: MONGODB 
Introduction to NoSQL programming
MongoDB overview
Installing MongoDB on Windows, Linux and Mac OS
Communication between Hadoop and MongoDB
MongoDB management services
MongoDB programming
-creating databases
-creating documents, adding data, retrieving data from documents
-updating and deleting data in MongoDB
-data types, arrays, regular expressions
CHAPTER 10: OTHER HADOOP SUBPROJECTS 
HBase introduction
Oozie basics
Working with Impala
MODULE 6: HADOOP BI/REPORTING TOOLS 
CHAPTER 11: REPORTING TOOLS 
Generating reports using:
Pentaho BI tools: reporting on HDFS file data, Hive data and HBase data
Jaspersoft: generating reports using iReport Designer
CHAPTER 12: HADOOP CLUSTER MONITORING TOOLS
Working with Cloudera Manager 
Hadoop cluster monitoring using Ganglia 
Course Highlights:
- Instructors with 8+ years of experience
- Less theory, more practice
- Hadoop installation support on personal computers
- Datasets and programs for practice
- Notes on each topic
- Post-training guidance and support
- Interview questions and resume preparation
- Course duration: 40 hours
- Session duration: 60 to 90 minutes
- Five sessions a week
- Sessions delivered via GoToWebinar or WebEx
- Practice environments: VMware Player with Hadoop 0.20, Oracle VirtualBox with Hadoop 2.0 YARN (MapReduce 2), Cloudera QuickStart VM (CDH 4.0)
AJAY KUMAR 
www.beinghadoop.com 
hadoopframework@gmail.com 
https://www.facebook.com/hadoopinfo
http://www.slideshare.net/hadoopframework