Hadoop is an Apache top-level project: an open-source
implementation of frameworks for reliable, scalable, distributed
computing and data storage.
It is a flexible, highly available architecture for large-scale
computation and data processing on a network of commodity hardware.
Hadoop’s Developers
Doug Cutting
2005: Doug Cutting and Michael J. Cafarella developed Hadoop to
support distribution for the Nutch search engine project.
The project was funded by Yahoo.
2006: Yahoo gave the project to the Apache Software Foundation.
Google Origins
2003: The Google File System paper
2004: The MapReduce paper
2006: The Bigtable paper
• Hadoop:
  • An open-source software framework that supports data-intensive distributed applications, licensed under the Apache License 2.0.
• Goals / Requirements:
  • Abstract and facilitate the storage and processing of large and/or rapidly growing data sets
  • Structured and unstructured data
  • Simple programming models
  • High scalability and availability
  • Use commodity (cheap!) hardware with little redundancy
  • Fault tolerance
  • Move computation rather than data
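The "simple programming models" goal is usually illustrated with the map/reduce pattern. The sketch below is a single-process illustration of that pattern in plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the sample input is made up. In a real cluster the map step would run on the nodes that already hold the data, which is the idea behind "move computation rather than data".

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; each string stands in for one record of a large file.
    print(word_count(["big data big cluster", "data moves less than code"]))
```

The author of a job only writes the map and reduce functions; grouping by key, distribution across machines, and recovery from failed nodes are left to the framework.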
Hadoop Framework Tools
Hadoop’s Architecture
Three main applications of Hadoop:
• Advertising (mining user behavior to generate recommendations)
• Search (grouping related documents)
• Security (searching for uncommon patterns)
introduction to hadoop

