Hadoop Turns a Corner and Sees the Future
© 2013 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written
permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on gartner.com. The information contained in this publication has been obtained
from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information.
This publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Gartner
research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders
may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced
independently by its research organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on
Independence and Objectivity."
Merv Adrian
Research Vice President, Information Management
Twitter: @merv
Blogs.gartner.com/merv-adrian
Hadoop — Entering Phase Two?
Nexus of Forces Drives Innovation
• Extreme Networking
• Pervasive Access
• Global-Class Delivery
• "Big," Rich Context
• Cameras and microphones widely deployed
• New routes to market via intelligent objects
• Content and services via connected products
• Everything has a URL
• Remote sensing of objects and environment
• Augmented reality
• Situational decision support
• Building and infrastructure management
Over 50% of Internet connections are things:
• 2011: 15+ billion permanent, 50+ billion intermittent
• 2020: 30+ billion permanent, >200 billion intermittent
Audio, GPRS, Wi-Fi, NFC, higher-resolution display, LTE, Flash
Gartner Definition of Big Data: High-volume, velocity and variety
information assets that demand cost-effective, innovative forms of
information processing for enhanced insight and decision making.
Gartner Research Circle 2013 Big Data Survey
• 687 respondents worldwide
• $3.2B mean company size
• 5,100 mean employees
• 60% mainstream adopters
• 18% focused on running/maintaining
Are They Investing?
• 30%: Have
• 31%: No plans at this time
• 19%: Plan to within the next year
• 15%: Plan to within two years
• 5%: Don't know
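The survey shares above can be checked with a little arithmetic. The Python sketch below sums the categories and converts the percentages into approximate respondent counts. The variable names are illustrative, and the counts are rounded estimates derived from the 687-respondent base, not figures reported on the slide.

```python
# Shares reported on the slide, as percentages of the 687 respondents.
RESPONDENTS = 687
shares = {
    "Have": 30,
    "No plans at this time": 31,
    "Plan to within the next year": 19,
    "Plan to within two years": 15,
    "Don't know": 5,
}

# The five categories should account for every respondent.
total = sum(shares.values())

# Respondents who have invested or plan to invest within two years.
investing_or_planning = (
    shares["Have"]
    + shares["Plan to within the next year"]
    + shares["Plan to within two years"]
)

# Approximate head counts; rounding makes these estimates only.
approx_counts = {
    label: round(RESPONDENTS * pct / 100) for label, pct in shares.items()
}

print(total, investing_or_planning, approx_counts["Have"])
```

Read this way, roughly two-thirds of respondents have invested or plan to within two years, while about a third report no plans.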
How Does That Compare to Last Year?
Note: Survey base increased from 473 in 2012 to 687 in 2013
(% of respondents)    2012   2013
Have invested           27     30
Within next year        15     19
Within two years        16     15
No plans                 -     31
Don't know              11      5
Things Are Done Differently in Silicon Valley …
Traditional IM
• Requirements based
• Top-down design
• Integration and reuse
• Technology consolidation
• World of DW and ECM
• Competence centers
• Better decisions
• Commercial software
"Big Data" Style
• Opportunity oriented
• Bottom-up experimentation
• Immediate use
• Tool proliferation
• "World of Hadoop"
• Hackathons
• Better business
• Open source
© 2013 Gartner, Inc. and/or its affiliates. All rights reserved.
Introducing: The Open-Source Car!
Apache Hadoop is a set of standard open-source software projects that provide a framework for using massive amounts of data across a distributed network.
The standards steward, the Apache Software Foundation, manages and distributes many typical components of the "Hadoop" platform.
Many distributions exist. They are built and/or marketed by pure-play specialists or major vendors, and they include additional open-source and commercial components.
Clients Ask: Which Projects Are "Hadoop"?
• Minimum set (from Apache website):
- Apache HDFS
- Apache MapReduce
- Apache YARN
• Other independent Apache projects: Ambari, Avro, Cassandra, Chukwa, HBase, Hive, Mahout, Pig, ZooKeeper
- The virtuous circle of open-source community
• Apache Hadoop is at version 1.0. Version 2.0, including YARN, is alpha.
Rich, Complex Set of Functional Choices
Ingest/Propagate
Persist
Describe, Develop
Monitor, Administer
Analytics, Machine Learning
Compute, Search
Ingest/Propagate
Apache Flume, Apache Kafka, Apache Sqoop, HDFS NFS, Informatica HParser, DBMS vendor utilities, Talend, WebHDFS
Import data into HDFS (or alternatives)
• Commercial DBMS, DI or OSS
• "Big data" ≠ Hadoop: import is not always required
− MapReduce inside DBMSs, HPCC, SAS, Splunk, others
Export data into RDBMS (or alternatives)
• NoSQL DBMS supported, or offer integration
• On same cluster (HBase), even same nodes (Hadapt)
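Of the ingest routes listed above, WebHDFS is the easiest to show in a few lines because it is a plain REST interface. The Python sketch below only builds request URLs and does not contact a cluster. The host name, port and paths are hypothetical placeholders. The `/webhdfs/v1` prefix and the `op` values (`LISTSTATUS`, `CREATE`) follow the WebHDFS REST API, but the details should be checked against the documentation for the distribution in use.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode address and file paths, for illustration only.
list_url = webhdfs_url("namenode.example.com", 50070, "/data/logs", "LISTSTATUS")
create_url = webhdfs_url(
    "namenode.example.com", 50070, "/data/logs/day1.csv", "CREATE", overwrite="true"
)

print(list_url)
print(create_url)
```

A directory listing would be requested with HTTP GET and a file creation with HTTP PUT. File creation is normally a two-step exchange in which the NameNode redirects the client to a DataNode that receives the bytes.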
Describe, Develop
Apache Crunch, Apache Hive, Apache Pig, Apache Tika, Cascading, Cloudera Hue, DataFu, Dataguise, IBM Jaql
Develop refers to coding functions, as in Pig, for execution elsewhere, such as MapReduce
Metadata (Hive, HCatalog) describes data for other stack components and external ones, e.g., DI and BI tools
Also included here: "intercept-based" data remediation
Compute, Search
Apache Blur, Apache Drill, Apache Giraph, Apache Hama, Apache Lucene, Apache MapReduce,
Apache Solr, Cloudera Impala, HP HAVEn, IBM BigSQL, IBM InfoSphere Streams, HStreaming,
Pivotal HAWQ, SQLstream, Storm, Teradata SQL-H
Runtime execution for programs created to run against HDFS or HBase data
With Apache Hadoop 2.0, MapReduce will begin to lose its exclusivity in "the basic stack" with YARN support
MapReduce was first, but others have emerged as additions/alternatives/supplements
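Since MapReduce is the reference point for everything else in this category, a small illustration of the programming model may help. The following is a single-process Python sketch of the map, shuffle and reduce phases for a word count. It does not use Hadoop's API. The function names are made up for illustration, and a real job would distribute these phases across the nodes of a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

The alternatives listed on this slide replace or supplement this batch pattern with other execution styles, such as interactive SQL, search, graph and stream processing.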
Persist
File System: Apache HDFS, IBM GPFS, Lustre, MapR Data Platform
Serialization: Apache Avro, RCFile (and ORCFile), SequenceFile, Text, Trevni
DBMS: Apache Accumulo, Apache Cassandra, Apache HBase, Google Dremel, Hadapt,
HP Vertica, IBM DB2, Kognitio, Oracle, Oracle MySQL, RainStor, Teradata Aster, Teradata, others
File system: Append only, access methods at OS level
Database: Collected and structured to facilitate storage, retrieval, modification, and deletion in online, not only batch, mode
Serialized: Format that can be stored in a database, eliminating byte ordering, adding metadata
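The "Serialized" definition on this slide (no byte-order ambiguity, metadata travels with the data) can be illustrated with a toy format. This is a sketch in plain Python, not Avro or any of the formats listed here. The layout (a length-prefixed JSON schema header followed by big-endian packed fields), the field names and the function names are all assumptions chosen for illustration.

```python
import json
import struct


def serialize(record, schema):
    """Pack a record as: 4-byte header length, JSON schema, big-endian fields."""
    header = json.dumps(schema).encode("utf-8")
    # ">" fixes the byte order, so readers on any platform agree on the layout.
    fmt = ">" + "".join(code for _, code in schema)
    body = struct.pack(fmt, *(record[name] for name, _ in schema))
    return struct.pack(">I", len(header)) + header + body


def deserialize(blob):
    """Read the embedded schema first, then use it to unpack the fields."""
    (header_len,) = struct.unpack(">I", blob[:4])
    schema = json.loads(blob[4:4 + header_len].decode("utf-8"))
    fmt = ">" + "".join(code for _, code in schema)
    values = struct.unpack(fmt, blob[4 + header_len:])
    return {name: value for (name, _), value in zip(schema, values)}


# Hypothetical schema: a 64-bit integer id and a 64-bit float reading.
schema = [("sensor_id", "q"), ("reading", "d")]
record = {"sensor_id": 7, "reading": 21.5}

blob = serialize(record, schema)
restored = deserialize(blob)
print(restored)
```

Because the schema is stored alongside the values, a reader needs no outside knowledge of the field layout, which is the property that makes such files usable by other components of the stack.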
Monitor, Administer
Apache Ambari, Apache Chukwa, Apache Falcon, Apache Oozie, Apache Whirr,
Apache ZooKeeper, Cloudera Manager, Ganglia, Nagios, Pivotal Serengeti
System health and administration
Cloud configuration and connection to resources
Virtualization and resource management
Job management and orchestration
Analytics, Machine Learning
Apache Drill, Apache Hive, Apache Mahout, Datameer, IBM BigSheets, IBM BigSQL,
Karmasphere, Microsoft Excel, Platfora, Revolution Analytics RHadoop, SAS, Skytree
This is where the future is — it's not just "a part of the stack" but why it exists
Machine learning, advanced statistical analysis, scenario modeling
"BI for Hadoop": Statistical libraries for use in programs, spreadsheets,
reporting, visualization tools
Go Ahead — Pick the Pieces You Need
Ingest/Propagate
Persist
Describe, Develop
Monitor, Administer
Analytics, Machine Learning
Compute, Search
Distribution Vendors Sort It Out for You
Megavendors: Amazon, EMC Pivotal, IBM, Intel
Megapartners: Dell, HP, NetApp, Microsoft, Oracle, Teradata
Leading pure plays: Cloudera, Hortonworks, MapR
Others: Datastax, LucidWorks, RainStor, Sqrrl, WANdisco, Zettaset
Hadoop's Great Leap Forward
Hadoop has moved to the next stage with Apache Hadoop 2.0.
• Mainstream vendors are all interested, contributing and adding value
• Skills development is ramping rapidly
From → To
• Single-stack → YARN-based multistyle environment, supporting multiple engines
• Batch-only, file-based stack → Interactive capabilities with multiple optional databases
• SQL translation with Hive → "SQL in front of Hadoop": Cloudera Impala, IBM Big SQL, Pivotal Hawq, Platfora, others
• Relatively unmanaged → Ambari-based beginnings of real management
What's Next?
• Search
• Advanced prebuilt analytic functions
• Cluster, appliance or cloud?
• Virtualization
• Graph processing
What's Still Needed?
Security
Data Warehousing Tools
Governance
Distributed Optimization
Subproject Optimization Skills
By 2015, big data demand will reach
4.4 million jobs worldwide,
but only one-third of those jobs will be filled.
[Chart: jobs by region (Americas, EMEA, APJ), scale 0 to 2,500,000, broken down by industry: Education; Wholesale Trade; Healthcare Providers; Transportation; Utilities; Retail; Insurance; Communications, Media & Services; Government; Banking & Securities; Manufacturing & Natural Resources]
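The headline forecast on this slide implies a concrete shortfall. As a rough worked example in Python, assuming "one-third" is taken literally and rounded to whole jobs (the variable names are illustrative):

```python
# Forecast from the slide: demand for big data jobs worldwide by 2015.
DEMAND = 4_400_000

# "Only one-third of those jobs will be filled", taken literally.
filled = round(DEMAND / 3)
unfilled = DEMAND - filled

print(f"filled ≈ {filled:,}, unfilled ≈ {unfilled:,}")
```

On these assumptions, roughly 1.5 million positions would be filled and about 2.9 million would remain open.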
Recommendations
• Audit your data: find "dark data" and map it to business opportunities to identify pilot projects
• Familiarize yourself with the capabilities of available Hadoop distributions
• Build skills and recruit within the organization from early experimenters for a data science lab
• Consider cloud pilots to minimize capital expenditure
Thank you!
http://www.flickr.com/photos/orinrobertjohn/3267286885/sizes/o/in/photostream/

Editor's Notes

  • #25: Economic Disruption: The Growth of Data