Vanilla Hadoop vs. the Rest
Viet-Trung Tran
Why Hadoop
2012, Worldwide Hadoop-MapReduce Ecosystem Software 2012-2016 (IDC #234294) forecasts the Hadoop ecosystem will be worth $813 million by 2016
01.2013, IDC predicts a big data market that will grow revenue at 31.7 percent a year until it hits the $23.8 billion mark in 2016
The majority of Fortune 500 companies are at least experimenting ...
Hadoop in mainstream production apps
Companies are using Hadoop to offload data warehouse-bound data. Hadoop, on average, provides at least a 10x cost savings over data warehouse solutions.
Financial institutions are using Hadoop as a critical part of their security architecture, to predict phishing behavior and payments fraud in real time and minimize their impact. They hold on to data for longer periods and run more detailed analytics and forensics.
An online advertising company provides real-time trading technology to its users and relies on Hadoop to store and analyze petabytes worth of data. 90 billion real-time ad auctions are processed each day on their Hadoop distribution.
A digital marketing intelligence provider uses Hadoop to process over 1.7 trillion Internet and mobile records per month, providing syndicated and custom digital marketing intelligence.
2012
2014
Vanilla Hadoop ecosystem
Hortonworks Data Platform
Cloudera Enterprise Data Hub
Hadoop ecosystem: Microsoft HDInsight
MapR
Google trends
Google trends [2]
Choice of Hadoop distribution
Vanilla Hadoop
Open source Hadoop + Support
Open source Hadoop + Support + proprietary management improvements
Open source Hadoop + Support + proprietary architectural improvements
Big data begs a big question: does Hadoop replace your enterprise data warehouse or augment it?
Cloudera: Revolution
First Hadoop vendor
Introducing the Enterprise Data Hub, in which Hadoop replaces the data warehouse
+ Commercial software
Hortonworks: Evolution
Partnering with leading commercial data management and analytics vendors
Open source purist
"Increasingly, our customers are not viewing the relevant comparison as Cloudera versus Hortonworks,"
"They're viewing it as Cloudera versus Hortonworks plus Teradata Aster, or, if you're talking to an IBM shop, Cloudera versus IBM BigInsights plus Netezza."
Cloudera director of product marketing
Cloudera vs. Hortonworks business model
First mover momentum
The old “Nobody got fired for buying IBM” routine
03.2014, Intel ditches its own distro, invests $740 million to buy 18% of Cloudera
Hortonworks Wants To Own Big Data Without Owning Anything
“History repeats itself”
“This is Red Hat versus Oracle and IBM”
“We are focused on not moving up the stack”
“not stepping on the toes of anyone with the capacity to crush us”
Teradata, for instance, finds itself reselling Hortonworks’ cheaper product against its own higher-margin ones, a relationship that may not be built to last
MacOS/SUSE vs. Red Hat
Business model?
Hortonworks to have steady, long-term growth
Red Hat win? Community
Red Hat contributes more to the Linux kernel than any single individual or company.
Red Hat attracts "professional developer"
The platform with the biggest community wins.
Hadoop core contributors
2012
http://hadoop.apache.org/who.html
Spark the disrupter
Cloudera Impala vs. Hortonworks Stinger vs. Spark SQL
08.2014, Hortonworks: A shared vision for Apache Spark on Hadoop
Software maturity
Cloudera Manager vs. Hortonworks Ambari
Threats
Vendor lock-in?
As for the ultimate business goal
Cloudera or XYZ (Microsoft, Oracle) buys Hortonworks
vs. Vanilla Hadoop
Apache Bigtop
Are there any reasons for using vendor-specific Hadoop distributions like Cloudera/Hortonworks instead of vanilla Apache Hadoop if I'm not using their support services?
Ask yourself, is your business to manage trillions of data objects, analyze customer behavior, device behavior or other analytic tasks, to find that strategic advantage, prevent fraud, prevent failure, and improve customer satisfaction?
Or is your mission to build and maintain dozens and dozens of open source components, troubleshoot arcane bugs, and answer urgent questions at 2am?
Which is higher value to you and your organization?
Will your organization derive more benefit from you writing that one key Hive query, or from distributing packages across a cluster of machines?
Based on the alliances that Hortonworks and Cloudera have achieved, it makes sense to use their packages to ensure compatibility and integration with those tools: Cloudera for Oracle, and Hortonworks for Teradata and SAS, to name a few. This is by no means saying that you can't use other distributions for those integrations, but the alliance ensures compatibility and integration testing you won't find with straight open source.
Discussion
Performance and Scalability
Dependability
Manageability
Data Access
References
http://www.quora.com/Which-is-the-best-distribution-of-Hadoop-Is-Cloudera-the-clear-leader-in-this
http://www.forbes.com/sites/danwoods/2014/06/30/why-google-capital-placed-its-hadoop-bet-on-m
http://www.b-eye-network.com/blogs/eckerson/archives/2014/02/the_battle_for.php
http://mail-archives.apache.org/mod_mbox/hadoop-user/201309.mbox/%3CCADPi3fjVM4NopC6RN
http://www.informationweek.com/big-data/software-platforms/cloudera-trash-talks-with-enterprise-data-hub-release/d/d-id/1113677
http://siliconangle.com/blog/2014/06/04/hadoops-horse-race-will-cloudera-hortonworks-go-the-dist
http://www.theregister.co.uk/2012/08/17/community_hadoop/
https://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future
http://www.networkworld.com/article/2369327/software/comparing-the-top-hadoo
http://www.forbes.com/sites/danwoods/2013/12/16/a-quick-guide-to-choosing-the
http://hortonworks.com/blog/reality-check-contributions-to-apache-hadoop/
http://www.quora.com/Are-there-any-reasons-for-using-vendor-specific-Hadoop-d
Team members
Hortonworks
Hortonworks (NASDAQ:HDP) opened at $24 and is now at $24.13, up 50.8% from its $16 IPO price.
Hortonworks, one of the two most prominent developers (along with Intel-backed Cloudera) of software distributions for the Hadoop big data framework, is now worth just over $1B, or ~15x gross billings from the 12 months ending Sep. 30.
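The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical, and the valuation is assumed to be exactly $1.0B (the slide says "just over $1B"), so the implied billings figure is only a rough lower bound, not a reported number.

```python
def percent_gain(start_price: float, end_price: float) -> float:
    """Percentage change from start_price to end_price."""
    return (end_price - start_price) / start_price * 100.0


def implied_billings(valuation: float, multiple: float) -> float:
    """Billings implied by a valuation quoted as a multiple of billings."""
    return valuation / multiple


# Figures taken from the slide.
ipo_price = 16.00
current_price = 24.13
gain = percent_gain(ipo_price, current_price)
print(f"Gain over IPO price: {gain:.1f}%")

# Assumption: "just over $1B" is treated as exactly $1.0B.
assumed_valuation = 1.0e9
billings = implied_billings(assumed_valuation, 15.0)
print(f"Implied trailing gross billings: about ${billings / 1e6:.0f}M")
```

Under these assumptions the gain works out to 50.8%, matching the slide, and the ~15x multiple implies trailing gross billings of roughly $67M.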
Cloudera: Customers
100 clients: AMD, eBay, Western Union
Technology
Retail e-commerce
Healthcare
Energy
Telecommunication
Financial services
263 partners: Teradata (data warehouse), Microsoft (Azure cloud service), Intel
Analytics & Business Intelligence
Applications
Database
Data integration
Cloud
Virtualization
Security
