Hadoop
Framework for Distributed Applications
Hadoop
• Introduction
• History
• Key Technologies
– MapReduce
– HDFS
• Other Projects On Hadoop
• Conclusion
Introduction:
What is Hadoop?
Hadoop is a framework for running applications on large clusters
built of commodity hardware.
(Source: Hadoop Wiki)
Hadoop is a free, Java-based programming framework that
supports the processing of large data sets in a distributed
computing environment.
Introduction (contd.)
#1 Implements Google’s MapReduce computation model
#2 Hadoop Distributed File System (HDFS), inspired by the Google File
System (GFS)
#3 Used for cluster and distributed computing
#4 Support from:
#1 Open source
#2 Part of the Apache Software Foundation
#3 Written in Java
#4 Backed by large web companies
History:
Inventor: Doug Cutting, creator of Apache Lucene
The Origin of the Name “Hadoop”:
The name my kid gave a stuffed yellow elephant. Short, relatively easy to
spell and pronounce, meaningless, and not used elsewhere: those are my
naming criteria. (Doug Cutting)
Started as an effort to build a web search engine
•Nutch, started in 2002
•The aim was to index billions of pages
•The original architecture could not scale to billions of pages
Google’s GFS paper (2003) described a solution to the storage problem
•Nutch Distributed Filesystem (NDFS) in 2004
Google’s MapReduce paper in 2004
•MapReduce implemented in Nutch in 2005
In February 2006, NDFS and MapReduce moved out of Nutch to form an
independent subproject of Lucene called Hadoop.
History (contd.)
At around the same time, Doug Cutting joined Yahoo!.
In January 2008, Hadoop became a top-level project at Apache,
confirming its success and its diverse, active community.
In February 2008, Yahoo! announced that its production search index
was being generated by a 10,000-core Hadoop cluster.
By this time, Hadoop was being used by many companies besides
Yahoo!, such as:
• Last.fm
• Facebook
• The New York Times
• Twitter
• Microsoft
• IBM
Key Technologies:
•MapReduce
-Parallel programming model for computation
-Developed by Google
•Hadoop Distributed File System (HDFS)
-Distributed file system for large data sets
-Inspired by the Google File System
Key Technologies: MapReduce
• Programming model developed at Google
• Sort/merge-based distributed computing
• Initially intended for Google’s internal search/indexing
application, but now used extensively by many organizations
(e.g., Yahoo!, Amazon.com, IBM)
• Functional-style programming (as in Lisp) that is naturally
parallelizable across a large cluster of workstations or PCs
• The underlying system takes care of partitioning the
input data, scheduling the program’s execution across several
machines, handling machine failures, and managing the required
inter-machine communication. (This is the key to Hadoop’s
success.)
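The division of labour described above can be illustrated with a minimal, single-process sketch of a word-count job. This is not the Hadoop API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) are hypothetical, and the "cluster" is simulated by processing input splits one after another. It only shows the shape of the model: the programmer supplies the map and reduce functions, and the framework handles grouping and sorting in between.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle/sort: group all values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def run_job(splits):
    # On a real cluster each split could be mapped on a different machine;
    # in this sketch they are simply processed in turn.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(mapped))


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_job(splits))
```

Because each call to `map_phase` depends only on its own split, and each call to `reduce_phase` only on one key's values, both phases can be spread across machines without changing the result.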
Key Technologies: HDFS
• At Google, MapReduce operations run on a special file system
called the Google File System (GFS), which is highly optimized for this
purpose.
• GFS is not open source.
• Doug Cutting and others built an open-source implementation based on
the published GFS design, which became the Hadoop Distributed File System (HDFS).
Key Technologies: HDFS
• Very Large Distributed File System
– 10K nodes, 100 million files, 10 PB
• Assumes Commodity Hardware
– Files are replicated to handle hardware failure
– Detects failures and recovers from them
• Optimized for Batch Processing
– Data locations exposed so that computations can move to
where data resides
– Provides very high aggregate bandwidth
• Runs in user space on heterogeneous operating systems
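The replication, failure recovery, and data-locality points above can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the function names are hypothetical, replicas are placed round-robin, and a replication factor of 3 is hard-coded. Real HDFS placement is decided by the NameNode and takes rack topology into account, which this sketch does not attempt to model.

```python
REPLICATION = 3  # assumed replication factor for this sketch


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for block in range(num_blocks):
        placement[block] = {
            nodes[(block + i) % len(nodes)] for i in range(replication)
        }
    return placement


def handle_failure(placement, failed, live_nodes, replication=REPLICATION):
    """Drop replicas held by the failed node and re-create them elsewhere."""
    for holders in placement.values():
        holders.discard(failed)
        for node in live_nodes:
            if len(holders) >= replication:
                break
            holders.add(node)  # adding an existing holder is a no-op
    return placement


def pick_task_node(placement, block, idle_nodes):
    """Data locality: prefer an idle node that already stores the block."""
    local = sorted(placement[block] & set(idle_nodes))
    return local[0] if local else sorted(idle_nodes)[0]


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5"]
    placement = place_blocks(4, nodes)
    handle_failure(placement, "n3", [n for n in nodes if n != "n3"])
    print(placement)
    print(pick_task_node(placement, 2, ["n2", "n5"]))
```

The last call shows the "move computation to the data" idea: given two idle nodes, the task goes to the one that already holds a replica of the block, so the block does not have to cross the network.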
Other Projects on Hadoop:
ZooKeeper: A coordination service for distributed applications.
Pig: A high-level data-flow language and execution
framework for parallel computation.
Hive: A data warehouse infrastructure that provides
data summarization and ad hoc querying.
Chukwa: A data collection system for managing
large distributed systems.
Other Projects on Hadoop:
Avro: Apache Avro is a data serialization system.
Avro provides:
•Rich data structures.
•A compact, fast, binary data format.
•A container file, to store persistent data.
•Simple integration with dynamic languages.
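To give a feel for why the binary format is compact, here is a hand-rolled sketch of how a two-field record could be serialized, following the encoding rules in the Avro specification as I understand them: longs are zigzag-encoded and written as variable-length bytes, strings are a length followed by UTF-8 bytes, and record fields are written in schema order without field names. The `User` record with `name` and `age` fields is hypothetical, and this is not the Avro library API.

```python
import json


def zigzag(n):
    """Map signed 64-bit integers to unsigned so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)


def encode_long(n):
    """Variable-length encoding: 7 bits per byte, high bit marks continuation."""
    value = zigzag(n)
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string(text):
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = text.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(name, age):
    """Fields are written back to back in schema order, with no field names."""
    return encode_string(name) + encode_long(age)


if __name__ == "__main__":
    binary = encode_user("Ada", 36)
    as_json = json.dumps({"name": "Ada", "age": 36}).encode("utf-8")
    print(len(binary), "bytes in binary form vs", len(as_json), "bytes as JSON")
```

The saving comes from leaving field names and punctuation out of the payload; the reader needs the schema to interpret the bytes, which is why Avro stores the schema alongside the data in its container files.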
Just as Google's Bigtable leverages the
distributed data storage provided by the
Google File System, HBase provides
Bigtable-like capabilities on top of
Hadoop Core.
Hadoop Architecture on Dell C Series Server:
Conclusion:
Hadoop has been a very effective solution for companies dealing
with petabytes of data.
It has solved many industry problems related to large-scale data
management and distributed systems.
Because it is open source, it has been widely adopted by companies.
Website : http://www.traininginbangalore.com/best-hadoop-training-institutes-in-bangalore/
Thank you.
For further queries:
+91 9513332301/02
