Hadoop for the disillusioned
Steve Watt, Red Hat

CC flickr rubenswieringa

@wattsteve
Wired Magazine - July 2008

Hadoop in 2013
Platform Layers and Technologies

Computational Runtimes: YARN, GiRAPH, MapReduce, HBase, Phoenix, Spark/BDAS, Drill, Impala, Stinger & more

FileSystems: Azure, CassandraFS, CephFS, CleverSafe, GlusterFS, GridGain, HDFS, Lustre, MapR FS, S3, SWIFT, Quantcast FS, Symantec VCFS & more

Infrastructures: System on a Chip, x86, Virtualization and Cloud

Distributions: Cloudera, Hortonworks, IBM, Intel, MapR, WanDisco

CC flickr lowfatbrains

Source: Gartner Hype Cycle

Your data is growing beyond your ability to manage & query it

CC flickr kakadu

Save money when asking the same questions of your data

CC flickr martijnsnels

Hadoop Customer, “Great, but now what?”
[Figure: Geoffrey Moore’s Technology Adoption Lifecycle, showing Innovators, Early Adopters, Early Majority, Late Majority and Laggards, with the CHASM marked]

Ask new questions and build data products

CC flickr cbcastro







• Ask your domain experts and LOB folks what unanswered questions they have.
• Where can you get the data you need to answer that question? (Domain experts should know where to get it.)
• Some of this data may be outside your organization (Social Media, Sensor Data, Data brokerages/Marketplaces, Web Pages) and some of it may be inside.
• If the data for the query doesn’t exist, figure out how to instrument or gather it.
• Pair your domain experts with your data engineers so they can work out how to obtain and massage the data given the types of queries desired.

CC flickr birdwatcher63

• Building data products is a similar exercise, except that it also involves typical product planning, such as identifying a market.
• This is also a great way for an organization to explore what assets it has within its data.

CC flickr syume

Mapping the night sky

CC flickr bobfamiliar

Analyzing farm soil content
to predict human conflict

CC flickr oxfam

Crisis Management for the
Chilean Earthquake

CC flickr flodigrip

Thanks for listening

Steve Watt

swatt@redhat.com

@wattsteve



Editor's Notes

  • #3: Hadoop is not new - NY Times. Source: http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
  • #4: Wired. Source: http://www.wired.com/wired/issue/16-07
  • #6: Source: Gartner Hype Cycle - http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp. Common objections: “Big Data is a fad”, “It’s just BI 2.0”, “This is all just hype”, “We can’t figure out how to use it”, “There’s nothing new here”, “It’s not ready”, “Too few support options”, “It’s too hard”
  • #7: You’re sharding your RDBMS infrastructure, and it’s becoming brittle and a nightmare to maintain. Twitter has a good quote in which they stated that it used to take them 2 weeks to run an ALTER TABLE statement.
  • #8: Use Hadoop for ETL to save money by displacing ETL vendors. Use Hive to offload datasets and their corresponding queries from your EDW and lower your EDW bill.
  • #10: A great way to competitively differentiate with arbitrarily structured data
  • #11: Hadoop’s power is in its single storage repository and its support for arbitrary data structures. You have the technology to ask any question if you just have the data.
  • #13: http://escience.washington.edu/get-help-now/astronomical-image-processing-hadoop
  • #14: http://strataconf.com/stratany2013/public/schedule/detail/30810
  • #15: http://vimeo.com/16861296
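
As a rough illustration of notes #8 and #11 (ETL on Hadoop over arbitrarily structured data), the sketch below imitates the map and reduce phases of a Hadoop job in plain Python. It is not taken from the talk: the sample records, the field names (`user`, `action`, `ms`, `referrer`) and the function names `map_phase` and `reduce_phase` are all hypothetical, and a real job would run on a cluster (for example through Hadoop Streaming or Hive) rather than in a single process.

```python
import json
from collections import defaultdict

# Hypothetical raw input: one event per line, with fields that vary by record.
# Nothing here comes from the talk; it only stands in for "arbitrarily structured data".
RAW_LINES = [
    '{"user": "a", "action": "click", "ms": 120}',
    '{"user": "b", "action": "view"}',
    'not json at all',
    '{"user": "a", "action": "click", "ms": 80, "referrer": "search"}',
]


def map_phase(lines):
    """Parse each raw line at read time and emit (action, 1) pairs.

    Lines that are not JSON objects, or that lack an "action" field,
    are skipped instead of failing the whole job.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        action = record.get("action")
        if action is not None:
            yield action, 1


def reduce_phase(pairs):
    """Sum the emitted values per key, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


counts = reduce_phase(map_phase(RAW_LINES))
print(counts)  # {'click': 2, 'view': 1}
```

The structure is imposed only when the data is read, so the same raw lines could feed a different question later (for instance, summing `ms` per user) without reloading or reshaping the stored data.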