HADOOP
                           Framework and
                           Applications




Prepared by: TEAM HADOOP                   slide1/22
CONTENTS
   WHY HADOOP?
   INTRODUCTION TO MapReduce




WHAT?
  “... to create building blocks for programmers
  who just happen to have lots of data to
  store, lots of data to analyze, or lots of machines
  to coordinate, and who don't have the
  time, the skill, or the inclination to become
  distributed systems experts to build the
  infrastructure to handle it.”
                                           -Tom White

  Source: Hadoop: The Definitive Guide



WHAT?
     Hadoop contains many subprojects:
     Hadoop Common
     Chukwa
     HBase
     ZooKeeper
     Pig
     Zombie
     Hive
     MapReduce

  We will focus on MapReduce



WHO & WHEN?
   Pre-2004: Doug Cutting and Mike Cafarella develop
    open source projects for web-scale
    indexing, crawling and search.
   2004: Jeffrey Dean and Sanjay Ghemawat introduce the
    MapReduce model used internally at Google.
   2006: Hadoop becomes an official Apache project;
    Cutting joins Yahoo!, and Yahoo! adopts Hadoop.




TRENDS




WHO USES IT?




Roughly how long to read 1TB
  from a commodity hard disk?






   Around 4 hours.

WITH HADOOP...
   62 seconds to sort 1TB on a large cluster (Yahoo!, 2009).
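
The four-hour figure follows from simple arithmetic. The sketch below assumes a sustained sequential read rate of about 70 MB/s for one commodity disk (an illustrative number, not taken from the slides) and an idealized cluster in which every disk scans its own share of the data at the same time with no coordination overhead. The function names are hypothetical.

```python
import math

TERABYTE = 10**12  # bytes (decimal terabyte)

# Assumption for illustration only: 70 MB/s sustained read rate per disk.
ASSUMED_RATE = 70 * 10**6


def read_time_seconds(size_bytes, bytes_per_second, disks=1):
    """Idealized time to scan size_bytes split evenly across `disks` disks."""
    return size_bytes / (bytes_per_second * disks)


def disks_needed(size_bytes, bytes_per_second, target_seconds):
    """Smallest disk count that meets target_seconds in this idealized model."""
    return math.ceil(size_bytes / (bytes_per_second * target_seconds))


single_disk_hours = read_time_seconds(TERABYTE, ASSUMED_RATE) / 3600
parallel_disks = disks_needed(TERABYTE, ASSUMED_RATE, 62)

print(f"one disk: {single_disk_hours:.1f} hours")
print(f"disks needed to finish in 62 s: {parallel_disks}")
```

Under these assumptions one disk needs roughly four hours, while a few hundred disks reading in parallel bring the scan down to about a minute. Real clusters add network, scheduling, and sorting costs that this model ignores.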



INTRODUCTION TO MapReduce




   "Break large problem into smaller parts, solve in
   parallel, combine results."
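
The quoted idea can be sketched in a few lines. This is a minimal illustration, not Hadoop code: worker threads stand in for the machines of a cluster, and the helper names `split` and `solve_in_parallel` are hypothetical. In CPython, threads do not speed up CPU-bound work; the point here is only the split, solve, combine structure.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Break a list into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(items) // parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def solve_in_parallel(items, solve_part, combine, parts=4):
    """Run solve_part on each chunk concurrently, then combine the partial results."""
    chunks = split(items, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(solve_part, chunks))
    return combine(partials)


numbers = list(range(1, 101))
total = solve_in_parallel(numbers, solve_part=sum, combine=sum)
print(total)  # 5050
```

Summation works here because partial sums can be combined in any grouping; problems without that property need a different decomposition.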



Typical scenario
   How many times does the word 'IT' appear?
    You would probably count by hand, but in a
    30,000-page document, could you?




Map Reduce Typical Illustration




MapReduce paradigm
   Input -> Map -> Shuffle/Sort -> Reduce -> Output

   Map: transforms each input record into
    intermediate (key, value) pairs.

   Reduce: transforms all records for a given
    key into the final output.
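
The Map, Shuffle/Sort, and Reduce stages can be sketched for the word-count scenario as follows. This is a single-process illustration in Python, not the Hadoop API (which is Java and distributes each stage across machines); the function names and the sample pages are hypothetical. Matching is case-sensitive, so 'IT' and 'it' are counted as different words.

```python
from collections import defaultdict


def map_record(line):
    """Map: turn one input record (a line of text) into (word, 1) pairs."""
    pairs = []
    for token in line.split():
        word = token.strip(".,!?;:\"'()")  # drop surrounding punctuation
        if word:
            pairs.append((word, 1))
    return pairs


def shuffle_sort(pairs):
    """Shuffle/Sort: group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_key(key, values):
    """Reduce: collapse all values for one key into a final (key, count) pair."""
    return key, sum(values)


def map_reduce(records):
    """Run the three stages in sequence over an iterable of records."""
    intermediate = [pair for record in records for pair in map_record(record)]
    return dict(reduce_key(key, values) for key, values in shuffle_sort(intermediate))


pages = [
    "IT departments run Hadoop.",
    "Is it fast? IT says yes.",
    "It depends on the job.",
]
counts = map_reduce(pages)
print(counts["IT"])  # 2
```

Because each call to `map_record` looks at one record only and each call to `reduce_key` looks at one key only, both stages can be spread over many machines without the functions themselves changing.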




MapReduce principles
   Move code to data (local computation)
   Abstract away fault tolerance, synchronization, etc.
   Allow programs to scale transparently w.r.t. size of input




Implementation: Hardware




Prepared by: TEAM HADOOP sroy choudhury7@gmail.com   slide 19/22
MapReduce: strengths
   Batch, offline jobs
   Write-once, read-many across the full data set
   Usually, though not always, simple computations
   I/O bound by disk/network bandwidth


What it's not!
   High-performance parallel computing, e.g. MPI
   Low-latency random-access relational database
   Always the right solution


THANK YOU!
                           QUESTIONS?




