Hadoop for the disillusioned
Steve Watt, Red Hat

CC flickr rubenswieringa

@wattsteve
Wired Magazine - July 2008

Hadoop in 2013
Platform Layers and Technologies

Computational Runtimes: YARN, GiRAPH, MapReduce, HBase, Phoenix, Spark/BDAS, Drill, Impala, Stinger & more
FileSystems: Azure, CassandraFS, CephFS, CleverSafe, GlusterFS, GridGain, HDFS, Lustre, MapR FS, S3, SWIFT, Quantcast FS, Symantec VCFS & more
Infrastructures: System on a Chip, x86, Virtualization and Cloud
Distributions: Cloudera, Hortonworks, IBM, Intel, MapR, WanDisco

CC flickr lowfatbrains

Source: Gartner Hype Cycle

Your data is growing beyond your ability to manage & query it

CC flickr kakadu

Save money when asking the same questions of your data

CC flickr martijnsnels

Hadoop Customer, “Great, but now what?”
Geoffrey Moore’s Technology Adoption Lifecycle: Innovators, Early Adopters, [CHASM], Early Majority, Late Majority, Laggards

new
and build data products

CC flickr cbcastro

• Ask your domain experts and LOB folks what unanswered questions they have
• Where can you get the data you need to answer that question? (domain experts should know where to get it)
• Some of this data may be outside your organization (Social Media, Sensor Data, Data brokerages/Marketplaces, Web Pages) and some of it may be inside.
• If the data for the query doesn’t exist, figure out how to instrument or gather it.
• Pair your domain experts with your data engineers so they can work out how to obtain and massage the data given the types of queries desired

CC flickr birdwatcher63

• Building data products is a similar exercise, except that it involves typical product planning, such as identifying a market.
• This is also a great way for an organization to explore what assets it has within its data.

CC flickr syume

Mapping the night sky

CC flickr bobfamiliar

Analyzing farm soil content
to predict human conflict

CC flickr oxfam

Crisis Management for the
Chilean Earthquake

CC flickr flodigrip

Thanks for listening

Steve Watt

swatt@redhat.com

@wattsteve

Editor's Notes

  • #3: Hadoop is not new - NY Times. Source: http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
  • #4: Wired. Source: http://www.wired.com/wired/issue/16-07
  • #6: Source: Gartner Hype Cycle - http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp “Big Data is a fad”, “It’s just BI 2.0”, “This is all just hype”, “We can’t figure out how to use it”, “There’s nothing new here”, “It’s not ready”, “Too few support options”, “It’s too hard”
  • #7: - You’re sharding your RDBMS infrastructure and it’s becoming brittle and a nightmare to maintain. - Twitter has a good quote where they stated it used to take them 2 weeks to run an ALTER TABLE statement.
  • #8: Using Hadoop for ETL to save money by displacing ETL vendors. Using Hive to offload datasets and their corresponding queries from your EDW and lower your EDW bill.
  • #10: A great way to competitively differentiate with arbitrarily structured data
  • #11: Hadoop’s power is in its single storage repository and its support for arbitrary data structures. You have the technology to ask any question if you just have the data.
  • #13: http://escience.washington.edu/get-help-now/astronomical-image-processing-hadoop
  • #14: http://strataconf.com/stratany2013/public/schedule/detail/30810
  • #15: http://vimeo.com/16861296
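
Note #11 says Hadoop's power lies in a single storage repository that accepts arbitrary data structures. The idea behind that claim, applying structure when you read the data rather than when you load it, can be sketched in a few lines of plain Python. This is an illustration only and not Hadoop API code: the sample records, the field names, and the helper functions parse, count_by and mentions are all hypothetical, and newline-delimited JSON is assumed as the raw format.

```python
import json
from collections import Counter

# Hypothetical raw records of differing shapes, stored exactly as they arrived.
RAW_LINES = [
    '{"source": "twitter", "user": "a", "text": "hadoop is hype"}',
    '{"source": "sensor", "device": 7, "reading": 21.5}',
    '{"source": "twitter", "user": "b", "text": "hadoop saved our ETL budget"}',
    'not json at all',
]


def parse(line):
    """Apply structure at read time; return None for lines that do not parse."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def count_by(lines, field):
    """Count records per value of `field`, skipping records that lack it."""
    counts = Counter()
    for line in lines:
        record = parse(line)
        if record is not None and field in record:
            counts[record[field]] += 1
    return counts


def mentions(lines, word):
    """A second, unplanned question over the same raw data:
    which users wrote text containing `word`?"""
    users = []
    for line in lines:
        record = parse(line)
        if record is None:
            continue
        text = record.get("text")
        if isinstance(text, str) and word in text and "user" in record:
            users.append(record["user"])
    return users


if __name__ == "__main__":
    print(count_by(RAW_LINES, "source"))
    print(mentions(RAW_LINES, "ETL"))
```

Neither question required declaring a schema up front, and the malformed line is skipped rather than blocking the load. In a real cluster the same pattern would typically be expressed through a query layer such as Hive rather than hand-written loops.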