- OWAIZ SHAIKH
Introduction to Hadoop:-
1. Hadoop is an open source, Java-based programming framework
that supports the processing and storage of extremely large data
sets in a distributed computing environment.
2. Hadoop makes it possible to run applications on systems with
thousands of commodity hardware nodes, and to handle thousands
of terabytes of data.
3. Its distributed file system (HDFS) provides high data transfer rates
among nodes and allows the system to keep operating when a node
fails.
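To make point 3 concrete, here is a toy Python model of how a distributed file system such as HDFS splits a file into fixed-size blocks and stores each block on several nodes, so that losing one node loses no data. It is a sketch, not HDFS code: the function names are hypothetical, the 128 MB block size and replication factor of 3 are assumed to match the usual HDFS defaults, and the round-robin placement is a simplification of HDFS's real rack-aware placement policy.

```python
# Toy model of HDFS-style block storage. Simplifications (assumptions, not
# HDFS internals): replicas are placed round-robin, whereas real HDFS uses a
# rack-aware policy decided by the NameNode.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default block size since Hadoop 2
REPLICATION = 3                 # the default HDFS replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file of file_size bytes occupies."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Map each block id to `replication` distinct nodes, assigned round-robin."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


def survives(placement, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    failed = set(failed_nodes)
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(300 * 1024 * 1024)           # a 300 MB file
placement = place_replicas(len(blocks), nodes)
print(len(blocks))                                       # 3 (128 + 128 + 44 MB)
print(survives(placement, ["node2"]))                    # True
print(survives(placement, ["node1", "node2", "node3"]))  # False: block 0 is lost
```

With three replicas per block, any single node can fail without data loss; a block becomes unreadable only when every node holding one of its replicas is down.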
Hadoop was created by computer scientists Doug Cutting
and Mike Cafarella in 2006 to support distributed processing for the
Nutch search engine project.
It was inspired by Google's MapReduce, a software
framework in which an application is broken down into
numerous small parts that can run in parallel on different nodes.
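The MapReduce idea of breaking a job into many small parts can be sketched in plain Python as three steps: map each input record to key/value pairs, shuffle the pairs so that equal keys end up together, and reduce each group to a result. This single-process simulation only illustrates the data flow; it is not Hadoop's API (Hadoop jobs typically implement Java `Mapper` and `Reducer` classes, and the framework runs many copies of them in parallel across nodes), and the function names below are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


print(word_count(["big data needs big clusters", "Hadoop processes big data"]))
# {'big': 3, 'clusters': 1, 'data': 2, 'hadoop': 1, 'needs': 1, 'processes': 1}
```

Because each mapper call touches only one line and each reducer call only one key, the framework is free to spread those calls over as many machines as it has.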
After years of development within the open source community,
Hadoop 1.0 became publicly available in December 2011 as part of
the Apache project sponsored by the Apache Software Foundation.
It is often estimated that 90% of the world’s data was generated in just the last few years.
Apache Nutch:-
Apache Nutch is a highly extensible and scalable open
source web crawler software project.
Nutch is coded entirely in the Java programming language, but
data is written in language-independent formats. It has a
highly modular architecture, allowing developers to create
plug-ins for media-type parsing, data retrieval, querying and
clustering.
Data analytics
Big data is a collection of both structured and
unstructured data that is too large, too fast-moving, and
too diverse to be managed by traditional database
management tools or traditional data processing
applications.
Some Examples:-
1. Data managed by eBay for search requests,
consumer/customer recommendations,
current trends and merchandising.
2. Data managed by Facebook:-
Platform For Managing Big Data:-
1. Hadoop uses a simple programming model.
2. Hadoop can scale from single servers to
thousands of machines, each offering local
computation and storage.
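Scaling from one server to thousands works because each node computes on its own local data and passes on only small partial results. The hypothetical sketch below imitates that in Python, with a thread pool standing in for machines (the threads illustrate the data flow only, not real parallel speedup): the same code gives the same answer whether the data sits on one "node" or eight.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Deal records into num_nodes slices, one per (hypothetical) node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def local_summary(chunk):
    """Work each node does on its own local data: a partial count and sum."""
    return len(chunk), sum(chunk)


def cluster_average(records, num_nodes):
    """Average of all records, computed from per-node partial results."""
    chunks = partition(records, num_nodes)
    # Threads stand in for machines here; they illustrate the data flow only.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_summary, chunks))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    if count == 0:
        raise ValueError("no records to average")
    return total / count


readings = list(range(1, 101))
print(cluster_average(readings, 1))  # 50.5 on a "single server"
print(cluster_average(readings, 8))  # 50.5 on eight "nodes"
```

Each node returns a (count, sum) pair rather than a local average: partitions can differ in size, and an average of averages would then be wrong. Partial results that combine exactly are what let a job like this be split across any number of machines.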
HDFS Architecture:-
Three Main Applications of Hadoop:-
• Advertisement (mining user behavior to generate
recommendations).
• Search (grouping related documents).
• Security (searching for uncommon patterns).
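The security use case (searching for uncommon patterns) can be illustrated with a simple frequency count: tally how often each event type occurs and flag the rare ones. This is an assumed, deliberately simple approach in Python, written in map/reduce style but run on one machine; the log format, function names, and threshold are all hypothetical, and real anomaly detection is usually far more involved.

```python
from collections import Counter


def parse_event(line):
    """Return the event field of a log line, or None if the line is malformed.

    Assumed (hypothetical) log format: "<user> <event> ..." separated by spaces.
    """
    fields = line.split()
    return fields[1] if len(fields) > 1 else None


def rare_events(log_lines, max_count=1):
    """Return event types seen at most max_count times, sorted alphabetically."""
    # "Map": extract one event type per well-formed line.
    events = (event for event in map(parse_event, log_lines) if event is not None)
    # "Reduce": count occurrences of each event type.
    counts = Counter(events)
    return sorted(event for event, n in counts.items() if n <= max_count)


logs = [
    "alice LOGIN", "bob LOGIN", "alice LOGOUT",
    "bob LOGOUT", "carol DELETE_ALL", "garbage",
]
print(rare_events(logs))  # ['DELETE_ALL']
```

On a cluster, the counting step is exactly the kind of per-key aggregation a reducer performs, so the same idea can be applied to logs far too large for one machine.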
Hadoop in the Wild:-
• Hadoop is used by many organizations that handle big data, including:
o Yahoo!
o Facebook
o Amazon
o Netflix
o and others
• Some examples of scale:
o Yahoo!’s Search Webmap runs on a 10,000-core Linux cluster and powers Yahoo! Web
search.
o Facebook’s Hadoop cluster hosted 100+ PB of data (July 2012) and was growing by about ½ PB/day (Nov 2012).
Thank You
