Introducing the Hadoop Ecosystem
Context: Performance Gap Trend




                            Introduction to the Hadoop Ecosystem
Context: Exponential for Decades
 Abundance of
 - computing & storage
 - generated data (estimated 8 ZB in 2015)
 - things
 More data provides greater value
 Traditional data management doesn’t scale well
 It’s time for a new approach!




New Hardware Approach
Traditional                 Big Data
 Exotic HW                   Commodity HW
  - big central servers       - racks of pizza boxes
  - SAN                       - Ethernet
  - RAID                      - JBOD
 Hardware reliability        Unreliable HW
 Limited scalability         Scales further
 Expensive                   Cost effective



New Software Approach
Traditional                 Big Data
 Monolithic                  Distributed
  - Centralized               - storage & compute nodes
  - RDBMS
 Schema first                Raw data
 Proprietary                 Open source




Hadoop
 De facto big data industry standard (batch)
 Vendor adoption
 - IBM, Microsoft, Oracle, EMC, ...
 A collection of projects at Apache
 - HDFS, MapReduce, Hive, Pig, HBase, Flume, Oozie, ...
 Main components
 - HDFS
 - MapReduce
 Cluster
 - set of machines running HDFS and MapReduce

HDFS




MapReduce
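As a rough illustration of the MapReduce programming model, here is a minimal single-process word count in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, chosen only for this sketch. On a real cluster the framework would run many mappers and reducers in parallel across machines and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values collected for one key into one count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file stored in HDFS.
    sample = ["big data needs big clusters", "data beats opinions"]
    print(word_count(sample))
```

The mapper and reducer only ever see one line or one key at a time, which is what lets the work be spread over many machines.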




MapReduce




MapReduce




Typical Adoption Pattern
 An idea that’s impractical without Hadoop
 Build Hadoop-based POC
 Move initial application to production
 Add more datasets and users
 - removing data silos in organizations
 - permitting easy experiments on real data
 Snowballs into the institution’s central repository for
 - analysis
 - data processing
 - data service layer

Use Case 1: Truvo




Use Case 2: UZ Brussel




How can you use Hadoop?
 What data are you ignoring?
 - How can you use it?

 How can you combine internal and external data?
 - Business partners
 - Feedback from your customers through social media
 - End your data silos
 - ...




DataCrunchers - Big Data Enablers



