Simon Elliston Ball
Head of Big Data - Red Gate Ventures
@sireb
Riding the Elephant:
Hadoop 2.0
http://bit.ly/RidingElephants
Append only distributed file-system
In the beginning…
Map Reduce
Java.
JVM Based (scala, groovy, jython, clojure)
More languages
Streaming (python, whatever)
HDP for Windows and .NET SDK
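As a rough illustration of the Streaming idea above, here is a minimal word-count sketch in Python. The function names (`mapper`, `reducer`, `run_locally`) and the sample input are assumptions made for this example, not something from the talk. It only mimics the tab-separated, sorted-by-key contract that Hadoop Streaming uses between the map and reduce stages, and runs entirely in-process.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word; assumes records arrive sorted by key, as they do after the shuffle."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Hypothetical local stand-in for map -> sort -> reduce, for trying the logic without a cluster."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_locally(["the elephant", "riding the elephant"]):
        print(record)
```

On a real cluster the two functions would typically live in separate scripts that read `sys.stdin` and print to stdout, handed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options; the jar location depends on the distribution.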
Abstraction
Photo: https://www.flickr.com/photos/puroticorico/
Hive, Pig
Cascading
Scalding
SQL on Hadoop
Learning to share the toys
HBase
Solr on Hadoop
Sharing HDFS…
Map Reduce v1
[Diagram: a Head Node running the JobTracker, which holds the Job; three Data Nodes, each running a TaskTracker with a Task (Map / Reduce) and fixed slots: m slot 1 … m slot n and r slot 1 … r slot n]
Map Reduce v1
[Diagram: the same Head Node / JobTracker and three TaskTracker Data Nodes with m and r slots, with an added "MR Status" label]
Typical Hadoop 1.x setup
HBase
Production
Adhoc
YARN architecture
[Diagram: a YARN Client and a Resource Manager; three Data Nodes, each running a Node Manager. Node 1: Container, Application Master, Container. Node 2: three Containers. Node 3: Application Master, Container, Free Slot]
Removing the choke point
Advantages
60%-150% better usage
Long running applications
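One way to see why generic containers can improve usage is a toy capacity model. Everything here is an assumption made for illustration: the function names, the node and slot counts, and the workload. It ignores memory and CPU sizing and scheduler policy, and it is not meant to reproduce the 60%-150% figure.

```python
def slots_used_mr1(nodes, map_slots, reduce_slots, map_tasks, reduce_tasks):
    """MRv1-style: fixed map and reduce slots per node; a slot only runs its own task type."""
    running_maps = min(map_tasks, nodes * map_slots)
    running_reduces = min(reduce_tasks, nodes * reduce_slots)
    return running_maps + running_reduces


def containers_used_yarn(nodes, containers_per_node, map_tasks, reduce_tasks):
    """YARN-style: containers are generic, so any pending task can take any free one."""
    return min(map_tasks + reduce_tasks, nodes * containers_per_node)


if __name__ == "__main__":
    # Hypothetical cluster: 3 nodes, 4 map + 4 reduce slots each (24 units of capacity).
    nodes, map_slots, reduce_slots = 3, 4, 4
    capacity = nodes * (map_slots + reduce_slots)

    # Hypothetical map-heavy phase: 30 map tasks pending, no reduce tasks yet.
    mr1 = slots_used_mr1(nodes, map_slots, reduce_slots, map_tasks=30, reduce_tasks=0)
    yarn = containers_used_yarn(nodes, map_slots + reduce_slots, map_tasks=30, reduce_tasks=0)

    print(f"MRv1 utilisation: {mr1 / capacity:.0%}")
    print(f"YARN utilisation: {yarn / capacity:.0%}")
```

In this made-up map-heavy phase the fixed-slot model leaves every reduce slot idle, while the generic-container model can fill the whole cluster.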
Not quite…
Operating system for Big Data?
Security
…but a framework for Big Data Apps
Data Access abstraction
Storm on YARN
A whole batch of new applications
HOYA
Tez (Stinger)
MapReduce 2
Giraph
<Insert your application here>
Batch applications
Spinning YARNs with Spring
Services
Direct to YARN APIs
Spring Data Hadoop abstraction
Streaming
Why?
Machine Learning
Graphs
Services
Distributed Shell - Anything.
Spark
A higher abstraction
Hadoop based?
… but can run on YARN
In Memory
Distributed
Fault tolerant
Real-time
✓
✓
✓
✓
❌
RDDs
✓
Mesos
Wider sharing
Hadoop
Spark
Aurora
Mesos Framework
Hardware
YARN
MapReduce HBase etc
HDFS
Hadoop is more than MapReduce
The new world
YARN opens up new paradigms
Infrastructure maturing: better sharing
Hadoop and beyond!
Thank you
Questions?
Simon Elliston Ball
Head of Big Data - Red Gate Ventures
@sireb
simon@simonellistonball.com
http://bit.ly/RidingElephants
