Introducing the Hadoop Ecosystem

Context: Performance Gap Trend

Context: Exponential Growth for Decades
Abundance of
- computing & storage
- generated data (an estimated 8 ZB in 2015)
- connected things (IoT)
More data provides greater value
Traditional data systems don’t scale well
It’s time for a new approach!

New Hardware Approach
Traditional: exotic hardware
- big central servers
- SAN
- RAID
- relies on hardware reliability
- limited scalability
- expensive

Big Data: commodity hardware
- racks of "pizza box" servers
- Ethernet
- JBOD
- assumes unreliable hardware
- scales further
- cost-effective

New Software Approach
Traditional: monolithic
- centralized
- RDBMS
- schema first
- proprietary

Big Data: distributed
- nodes that both store and compute
- raw data
- open source
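To make the "schema first" versus "raw data" contrast concrete, here is a minimal Python sketch of schema-on-read, the approach tools such as Hive take on Hadoop: raw lines are stored untouched and each reader applies its own schema when it reads them. The record layout, `RAW_LINES`, `SALES_SCHEMA`, `AUDIT_SCHEMA` and `read_with_schema` are hypothetical illustrations, not Hadoop or Hive APIs.

```python
import csv
from datetime import date

# Hypothetical raw sales records, stored exactly as they arrived.
# Nothing validates them on write, so a malformed line is kept too.
RAW_LINES = [
    "2015-03-01,store-12,19.99",
    "2015-03-01,store-07,5.00",
    "2015-03-02,store-12,not-a-number",
]

# A schema is just a list of (field name, parser) pairs chosen by the reader.
SALES_SCHEMA = [("day", date.fromisoformat), ("store", str), ("amount", float)]

# A second reader can interpret the same raw lines differently,
# for example as plain text when auditing bad records.
AUDIT_SCHEMA = [("day", str), ("store", str), ("amount", str)]


def read_with_schema(lines, schema):
    """Apply `schema` at read time, skipping rows that do not fit it."""
    records = []
    for row in csv.reader(lines):
        if len(row) != len(schema):
            continue  # wrong number of fields for this schema
        try:
            records.append(
                {name: parse(value) for (name, parse), value in zip(schema, row)}
            )
        except ValueError:
            continue  # a field failed to parse; this reader chooses to skip it
    return records


sales = read_with_schema(RAW_LINES, SALES_SCHEMA)
audit = read_with_schema(RAW_LINES, AUDIT_SCHEMA)
print(len(sales), len(audit))  # 2 3
```

The stored lines never change; only their interpretation does, so a new question can be asked of data that was collected before anyone knew it would be needed.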

Hadoop
De facto industry standard for big data (batch processing)
Vendor adoption
- IBM, Microsoft, Oracle, EMC, ...
A collection of Apache projects
- HDFS, MapReduce, Hive, Pig, HBase, Flume, Oozie, ...
Main components
- HDFS (Hadoop Distributed File System): storage
- MapReduce: processing
Cluster
- a set of machines running HDFS and MapReduce

HDFS
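HDFS splits each file into large blocks and stores several replicas of every block on different DataNodes, with the NameNode tracking where each block lives; this is how Hadoop tolerates the unreliable commodity hardware it runs on. The Python sketch below only plans block layout and checks whether a file survives node failures. `plan_blocks`, `readable_after_failure` and the node names are hypothetical, and the round-robin placement is a simplification: real HDFS uses a rack-aware placement policy. The constants match the HDFS defaults (128 MB blocks since Hadoop 2, replication factor 3).

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size (Hadoop 2.x and later)
REPLICATION = 3                 # HDFS default replication factor


@dataclass
class Block:
    index: int
    offset: int   # byte offset of the block within the file
    length: int   # the last block may be shorter than BLOCK_SIZE
    datanodes: list = field(default_factory=list)  # nodes holding a replica


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct DataNodes.

    Placement is plain round-robin for illustration only; real HDFS
    uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    blocks = []
    for index, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)
        # Consecutive nodes, starting at a different position for each block.
        replicas = [datanodes[(index + i) % len(datanodes)] for i in range(replication)]
        blocks.append(Block(index, offset, length, replicas))
    return blocks


def readable_after_failure(blocks, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    return all(any(node not in failed_nodes for node in b.datanodes) for b in blocks)


MB = 1024 * 1024
nodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = plan_blocks(300 * MB, nodes)
for b in blocks:
    print(b.index, b.length // MB, b.datanodes)
# 0 128 ['dn1', 'dn2', 'dn3']
# 1 128 ['dn2', 'dn3', 'dn4']
# 2 44 ['dn3', 'dn4', 'dn1']
print(readable_after_failure(blocks, {"dn1", "dn2"}))         # True
print(readable_after_failure(blocks, {"dn2", "dn3", "dn4"}))  # False
```

With three replicas, any two machines can fail without losing data; losing three at once can make a block unreadable.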

MapReduce
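MapReduce splits a job into a map phase that turns input records into key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that combines each group. In a real Hadoop job the mapper and reducer are user-supplied (for example Java `Mapper` and `Reducer` subclasses) and the framework runs them in parallel across the cluster, close to the HDFS blocks they read. The Python sketch below runs the classic word count in a single process to show only the data flow; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle and sort: group all values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # In Hadoop each input split would be mapped on a node that stores it;
    # here every phase simply runs in this one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["the quick fox", "The lazy dog", "the end"]))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because mappers see one record at a time and reducers see one key at a time, both can be spread over many machines without coordinating with each other.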

Typical Adoption Pattern
Start with an idea that’s impractical without Hadoop
Build a Hadoop-based proof of concept (POC)
Move the initial application to production
Add more datasets and users
- removing data silos within the organization
- allowing easy experiments on real data
Snowballs into the institution’s central repository for
- analysis
- data processing
- data service layer

Use Case 1: Truvo

Use Case 2: UZ Brussel

How can you use Hadoop?
What data are you ignoring?
- How can you use it?

How can you combine internal and external data?
- Business partners
- Feedback from your customers through social media
- End your data silos
- ...

DataCrunchers - Big Data Enablers
