Big Data and Hadoop
Big Data Hadoop Training
What is Hadoop?
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a
distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation.
Hadoop makes it possible to run applications on clusters with thousands of nodes and thousands of terabytes
of storage capacity. Its distributed file system provides high data transfer rates among nodes and allows the
system to keep operating uninterrupted when a node fails. This approach lowers the risk of catastrophic
system failure, even if a significant number of nodes become inoperative.
Why Hadoop?
Large Volumes of Data:
The ability to quickly store and process huge amounts of data of every variety (structured, unstructured and semi-structured).
With data volumes and varieties constantly increasing, especially from social media and the Internet of Things
(IoT), that’s a key consideration.
Fault Tolerance:
Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically
redirected to other nodes so that the distributed computation does not fail. Multiple copies of all data are stored
automatically.
Flexibility:
Unlike a traditional relational database, you don't have to process data before storing it. You can store as much
data as you want and decide how to use it later. That includes unstructured data such as text, images and videos.
Low Cost:
The open-source framework is free and uses commodity hardware to store large quantities of data.
Scalability:
You can easily grow your system to handle more data simply by adding nodes. Little administration is required.
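The Flexibility point above (store the data first, decide how to use it later) is the idea the outline later calls schema on read. The sketch below is a minimal illustration in plain Python, not Hive or HDFS code. The sample records and the functions `read_events` and `read_users` are hypothetical. Raw lines are kept exactly as they arrived, and two different schemas are applied only at read time.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical raw data, stored exactly as it arrived: nothing is parsed or validated at write time.
RAW_EVENTS = """2018-03-01,alice,click,3
2018-03-01,bob,view,
2018-03-02,alice,view,7
malformed line
"""


@dataclass
class Event:
    day: str
    user: str
    action: str
    count: int


def read_events(raw: str) -> list[Event]:
    """Schema 1, applied at read time: four columns with a numeric count."""
    events = []
    for row in csv.reader(io.StringIO(raw)):
        if len(row) != 4:
            continue  # this sketch skips rows that do not fit the schema
        day, user, action, count = row
        # Missing or non-numeric counts are read as 0.
        events.append(Event(day, user, action, int(count) if count.isdigit() else 0))
    return events


def read_users(raw: str) -> set[str]:
    """Schema 2 over the same raw data: only the user column matters."""
    return {row[1] for row in csv.reader(io.StringIO(raw)) if len(row) >= 2}


if __name__ == "__main__":
    print(read_events(RAW_EVENTS))
    print(read_users(RAW_EVENTS))
```

On the sample data, `read_events` returns three events, with bob's missing count read as 0. `read_users` returns `{"alice", "bob"}`. The malformed line stays in storage untouched, and each reader decides for itself how to interpret it. A schema-on-write system would have rejected or transformed it at load time.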
Copyright © 2018 Learntek. All Rights Reserved.
The following topics will be covered in our Big Data and Hadoop Online Training:
Big Data Hadoop Training Topics:
Hadoop Introduction:
Introduction to Data and Systems
Types of Data
Traditional ways of dealing with large data and their problems
Types of Systems & Scaling
What is Big Data?
Challenges in Big Data
Challenges in Traditional Applications
New Requirements
What is Hadoop? Why Hadoop?
Brief history of Hadoop
Features of Hadoop
Hadoop and RDBMS
Overview of the Hadoop ecosystem
Hadoop Installation:
Installation in detail
Creating an Ubuntu image in VMware
Downloading Hadoop
Installing SSH
Configuring Hadoop, HDFS & MapReduce
Download, installation & configuration of Hive
Download, installation & configuration of Pig
Download, installation & configuration of Sqoop
Configuring Hadoop in Different Modes
Hadoop Distributed File System (HDFS):
File System – Concepts
Blocks
Replication Factor
Version File
Safe mode
Namespace IDs
Purpose of Name Node
Purpose of Data Node
Purpose of Secondary Name Node
Purpose of Job Tracker
Purpose of Task Tracker
HDFS Shell Commands – copy, delete, create directories etc.
Reading and Writing in HDFS
Differences between Unix commands and HDFS commands
Read / Write in HDFS – Internal process between Client, Name Node & Data Nodes
Accessing HDFS using Java API
Various Ways of Accessing HDFS
Understanding HDFS Java classes and methods
Admin: Commissioning / Decommissioning Data Nodes
Balancer
Replication Policy
Network Distance / Topology Script
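The HDFS topics above (blocks, replication factor, replication policy) and the earlier fault-tolerance claims can be illustrated with a small simulation. This is a minimal sketch in plain Python, not HDFS code, and it rests on three assumptions:

* Blocks are 128 MB (`dfs.blocksize`) and each block has three replicas (`dfs.replication`), the defaults in Hadoop 2.x and later.
* Replicas are placed round-robin on distinct nodes. Real HDFS instead uses a rack-aware placement policy.
* The node names `dn1` to `dn5` and all function names are hypothetical.

```python
# Assumed HDFS defaults for Hadoop 2.x and later; both are configurable in a real cluster.
BLOCK_SIZE_MB = 128  # dfs.blocksize
REPLICATION = 3      # dfs.replication


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size of each block; the last block holds only the remainder."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Simplified round-robin placement; real HDFS uses a rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_after_failure(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every block still has at least one replica on a healthy node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
    blocks = split_into_blocks(1000)  # a 1000 MB file
    placement = place_replicas(len(blocks), nodes)
    print(len(blocks), "blocks,", sum(blocks) * REPLICATION, "MB of raw storage")
    print(readable_after_failure(placement, {"dn1", "dn4"}))         # two nodes down
    print(readable_after_failure(placement, {"dn1", "dn2", "dn3"}))  # three nodes down
```

A 1000 MB file becomes 8 blocks: seven of 128 MB and one of 104 MB. Storing them takes 3000 MB of raw capacity. Because every block sits on three distinct nodes, losing any two of the five nodes leaves every block readable. Losing `dn1`, `dn2` and `dn3` together makes block 0 unavailable. In a real cluster, the Name Node also re-replicates under-replicated blocks once a Data Node is declared dead. This sketch does not model that step.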
MapReduce Programming:
About MapReduce
Understanding block and input splits
MapReduce Data types
Understanding Writable
Data Flow in MapReduce Application
Understanding MapReduce problems on datasets
MapReduce and Functional Programming
Writing MapReduce Application
Understanding Mapper function
Understanding Reducer Function
Understanding Driver
Usage of Combiner
Understanding the Partitioner
Usage of Distributed Cache
Passing parameters to the mapper and reducer
Analyzing the Results
Log files
Input Formats and Output Formats
Counters, Skipping Bad and unwanted Records
Writing Joins in MapReduce with two input files; Join Types
Executing a MapReduce Job – Insights
Exercises on MapReduce
Job Scheduling: Types of Schedulers
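The MapReduce data flow listed above (mapper, combiner, partitioner, shuffle, reducer and driver) can be traced with a small in-memory word count. This is a hedged sketch in plain Python, not Hadoop's Java API, which uses `Mapper`, `Reducer` and `Writable` types. The function names are hypothetical. `zlib.crc32` stands in for the Java `hashCode()` that Hadoop's default `HashPartitioner` uses. It was chosen because Python's built-in string hash is randomized per process.

```python
import zlib
from collections import Counter, defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, int]]:
    """Combine: pre-sum one map task's output locally to shrink the shuffle.

    Valid here because addition is associative and commutative.
    """
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partition(key: str, num_reducers: int) -> int:
    """Partition: choose a reducer as hash(key) mod R (CRC32 keeps it deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: sum all partial counts for one word."""
    return key, sum(counts)


def run_job(splits: list[list[str]], num_reducers: int = 2) -> dict[str, int]:
    """Driver: map -> combine -> partition/shuffle -> sort -> reduce, in one process."""
    # One dict per reducer; each groups the shuffled values by key.
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # one map task per input split
        map_output = (pair for line in split for pair in mapper(line))
        for word, count in combiner(map_output):
            shuffled[partition(word, num_reducers)][word].append(count)
    result = {}
    for groups in shuffled:
        for word in sorted(groups):  # each reducer sees its keys in sorted order
            key, total = reducer(word, groups[word])
            result[key] = total
    return result


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["The dog barks"]]
    print(run_job(splits))
```

The example input has two splits with ten words in total. The job counts "the" three times and "dog" twice, and the totals are the same for any number of reducers. The partitioner only decides which reducer receives each key. The combiner only reduces how many (word, count) pairs cross the shuffle.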
Hive:
Hive concepts
Schema on Read vs. Schema on Write
Hive architecture
Install and configure Hive on a cluster
Metastore – Purpose & Types of Configuration
Different types of tables in Hive
Buckets
Partitions
Joins in Hive
Hive Query Language
Hive Data Types
Data Loading into Hive Tables
Hive Query Execution
Hive library functions
Hive UDF
Hive Limitations
Pig:
Pig basics
Install and configure Pig on a cluster
Pig library functions
Pig vs. Hive
Write sample Pig Latin scripts
Modes of running Pig
Running in the Grunt shell
Running as a Java program
Pig UDFs
HBase:
HBase concepts
HBase architecture
Region server architecture
File storage architecture
HBase basics
Column access
Scans
HBase use cases
Install and configure HBase on a multi-node cluster
Create a database; develop and run sample applications
Access data stored in HBase using the Java API
Sqoop:
Install and configure Sqoop on a cluster
Connecting to an RDBMS
Installing MySQL
Import data from MySQL to Hive
Export data to MySQL
Internal mechanism of import/export
Oozie:
Introduction to Oozie
Oozie architecture
XML file specifications
Specifying workflows
Control nodes
Oozie job coordinator
Flume:
Introduction to Flume
Configuration and Setup
Flume Sink with example
Channel
Flume Source with example
Complex Flume architecture
ZooKeeper:
Introduction to ZooKeeper
Challenges in distributed applications
Coordination
ZooKeeper: Design Goals
Data Model and Hierarchical namespace
Client APIs
YARN:
Hadoop 1.0 Limitations
MapReduce Limitations
History of Hadoop 2.0
HDFS 2: Architecture
HDFS 2: Quorum-based storage
HDFS 2: High availability
HDFS 2: Federation
YARN Architecture
Classic vs YARN
YARN Apps
YARN multitenancy
YARN Capacity Scheduler
Prerequisites:
Knowledge of any programming language, databases and the Linux operating system. Core Java or Python
knowledge is helpful.