Hadoop – Looking to the Future
Arun C. Murthy
Hortonworks Co-Founder
@acmurthy
Page 2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ACTIONABLE INTELLIGENCE
INTERNET OF ANYTHING
Hortonworks Connected Data Platforms that power modern data apps with actionable intelligence from ALL data.
PERISHABLE INSIGHTS
HISTORICAL INSIGHTS
HDP for DATA AT REST
HDF for DATA IN MOTION
MODERN DATA APPS
Looking Back…
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
[Diagram: nodes 1 … N; HDFS (Hadoop Distributed File System); MapReduce (Largely Batch Processing)]
2006
Hadoop w/ MapReduce
Traditional Hadoop allowed early adopters to deal with data at scale; however…
•  Single purpose clusters, specific data sets
•  Primarily a batch system using MapReduce
•  Difficult to natively integrate existing applications
•  Limited enterprise capabilities: Operations, Security & Governance
In the beginning…
Timeline: 2006, 2009
[Diagram: nodes 1 … N; HDFS (Hadoop Distributed File System); MapReduce (Largely Batch Processing)]
Hadoop w/ MapReduce
MAPREDUCE-279
Common data, multiple applications
•  Support multi-tenant cluster
•  Batch, interactive & real-time use cases can leverage the most appropriate engine
Architectural Center
•  Consistent security, governance & operations
•  Ecosystem applications run natively in Hadoop
Apache Hadoop 2.0 & YARN
October 23, 2013
YARN : Data Operating System
[Diagram: nodes 1 … N; HDFS (Hadoop Distributed File System); workloads on YARN: Batch, Interactive, Real-Time]
YARN : Data Operating System
[Diagram: nodes 1 … N; HDFS (Hadoop Distributed File System); Legacy MapReduce, Apache Tez (Interactive SQL), Other Engines & Workloads; Apache Hive (SQL); Business Analytics, Custom Apps]
Apache Hive and the Power of YARN
Stinger Initiative
Next generation SQL based interactive query in Hadoop
Speed: Performance increased 100x for interactive & batch use cases
Scale: Queries from GBs, to TBs to PBs
SQL: Broadest range of SQL semantics
Apache Hive Community
1,672 Jira Tickets Closed
145 Developers
44 Companies
~390,000 Lines Of Code Added… (2x)
13 Months
[Chart: Hive 10, Hive 12, Hive 13; axis labels: seconds, thousands of seconds]
Dramatically faster queries speed time to insight
Apache Hive – Interactive SQL in Hadoop
Stinger
Next generation SQL based interactive query in Hadoop
ORC
IO Improvements
Efficient processing via complex pushdown
Tez
Powerful primitives for the SQL Planner
VQP
Efficient CPU utilization in Inner Loop
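The "complex pushdown" idea on the ORC line can be illustrated with a small sketch. This is not ORC's actual reader: the `Stripe` class, its fields and `scan_greater_than` are hypothetical names, and the only assumption is that each chunk of rows carries min/max statistics so a reader can skip chunks that cannot match a filter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical stand-in for a columnar stripe holding one integer column."""
    values: List[int]

    # A real columnar format would store these statistics in file metadata;
    # here they are computed on demand to keep the sketch short.
    @property
    def min_value(self) -> int:
        return min(self.values)

    @property
    def max_value(self) -> int:
        return max(self.values)


def scan_greater_than(stripes: List[Stripe], threshold: int) -> Tuple[List[int], int]:
    """Return (matching values, number of stripes actually read)."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # Pushdown: consult the stripe statistics before touching its rows.
        if stripe.max_value <= threshold:
            continue  # no row in this stripe can satisfy value > threshold
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


stripes = [Stripe([1, 5, 9]), Stripe([12, 18, 25]), Stripe([30, 42, 57])]
result, stripes_read = scan_greater_than(stripes, 20)
print(result, stripes_read)
```

In this toy run the first stripe is skipped entirely because its maximum is below the threshold, which is the saving that pushing a predicate down to the storage layer aims for.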
Sub-Second SQL with Hive LLAP
Stinger.Next
Sub-second SQL in Hadoop via Hive/LLAP
CBO
The “right” plan executed
violently…
LLAP
Metastore
Extensive stats & scalability
YARN : Data Operating System
[Diagram: nodes 1 … N; HDFS (Hadoop Distributed File System); LLAP, Apache Tez, Other Engines & Workloads; Apache Hive (Sub-second SQL); Business Analytics, Custom Apps]
Long-lived daemon for low-latency startup, caching & CPU efficiency via JIT
Apache Atlas - Data Governance Initiative
Requirements
1.  Hadoop must snap in to the existing frameworks and openly exchange metadata
2.  Hadoop must address governance within its own stack of technologies
Engineers from a group of companies dedicated to meeting these requirements in the open
New Apache project proposal
Knowledge Store: Models, Type-System, Policy Rules, Taxonomies
Audit Store (Ranger)
Tag Based Policies
Data Lifecycle Management (Falcon)
Real-time Tag-based Access Control (Ranger)
REST API
Services: Search, Lineage, Exchange
Healthcare: HIPAA, HL7
Financial: SOX, Dodd-Frank
Energy: PPDM
Retail: PCI, PII
Other: CWM
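The tag-based policies named on this slide can be pictured with a minimal sketch. It does not use the Apache Atlas or Apache Ranger APIs; `TagPolicy`, `is_allowed`, the group names and the column names are all hypothetical, and the deny-by-default rule for tags without a policy is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: the groups allowed to read data carrying a tag."""
    tag: str
    allowed_groups: FrozenSet[str]


def is_allowed(user_groups: Set[str], tags: Set[str],
               policies: List[TagPolicy]) -> bool:
    """Allow access only if every tag on the data admits one of the user's groups."""
    by_tag: Dict[str, TagPolicy] = {p.tag: p for p in policies}
    for tag in tags:
        policy = by_tag.get(tag)
        if policy is None:
            return False  # assumption: a tag with no policy denies by default
        if not (user_groups & policy.allowed_groups):
            return False
    return True  # assumption: untagged data is readable by anyone


policies = [
    TagPolicy("PII", frozenset({"compliance"})),
    TagPolicy("PCI", frozenset({"compliance", "payments"})),
]

# Hypothetical columns and the classification tags attached to them.
column_tags: Dict[str, Set[str]] = {
    "customers.ssn": {"PII"},
    "orders.card_number": {"PCI", "PII"},
    "orders.total": set(),
}

print(is_allowed({"payments"}, column_tags["orders.card_number"], policies))
print(is_allowed({"compliance"}, column_tags["orders.card_number"], policies))
```

The point of the design is that policies attach to tags such as PII or PCI rather than to individual tables, so newly tagged data is covered without writing a new rule.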
HDFS – Tiered Storage
Looking Ahead…
Data Trends
Internet of Anything
Aggregate any and all IoAT data from sensors, machines, geolocation, clicks, files, social
Mediate secure point-to-point and bi-directional data flows
Data Trends
It is cheap to create, collect
and curate all data
Data Trends
Containerization
Virtualization
Data Trends
Modern Applications Are Data Applications
Easy to Consume & Operate
Secure
Repeatable
Assemble Modern Data Applications
YARN.NEXT
Assemble: Select Engines & Services, Wire, Secure & Operate
[Diagram: containers grouping Service, Data and Engine components, with Security and Admin]
Real-Time Cyber Security
It’s not just how quickly or how much data you ingest; it’s about ingesting and enriching data in real-time in order to provide actionable intelligence that stops cyber threats…
Ex. Real-Time Cyber Security with Hortonworks
Sources: Raw Network Stream, Network Metadata Stream, Data Stores, Syslog, Raw Application Logs, Other Streaming Telemetry
Pipeline stages: Parse and Format, Enrich, Persist
Applications and Analyst Tools (ex. Zeppelin on Spark): Log Mining and Analysis; Network Packet Mining and PCAP Reconstruction; Big Data Exploration, Predictive Modeling
Stores: SOLR, HBase, Hive
Enrichment inputs: Threat Intelligence Feeds, Enrichment Data
Store roles: Real-Time Index, Raw Packet Store, Long-Term Store
Deliver actionable insights from real-time and historical network threat alerts
Key components:
NiFi -> Kafka -> Storm
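The parse, enrich and persist stages on this slide can be sketched as chained generators. This is a plain-Python illustration only and does not use the NiFi, Kafka or Storm APIs; the log line format, the `THREAT_FEED` lookup table, the sample addresses and the function names are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical threat-intelligence feed: source IP -> label.
THREAT_FEED: Dict[str, str] = {"203.0.113.7": "known-scanner"}


def parse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse 'timestamp src_ip action' lines (an assumed format) into records."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # drop malformed lines instead of failing the stream
        timestamp, src_ip, action = parts
        yield {"timestamp": timestamp, "src_ip": src_ip, "action": action}


def enrich(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Attach a threat label looked up from the feed; 'none' if unknown."""
    for record in records:
        yield {**record, "threat": THREAT_FEED.get(record["src_ip"], "none")}


def persist(records: Iterable[Dict[str, str]], store: List[Dict[str, str]]) -> int:
    """Append records to an in-memory list standing in for an index or table."""
    count = 0
    for record in records:
        store.append(record)
        count += 1
    return count


raw = [
    "2015-01-01T10:00:00Z 203.0.113.7 DENY",
    "garbage",
    "2015-01-01T10:00:01Z 198.51.100.4 ALLOW",
]
store: List[Dict[str, str]] = []
written = persist(enrich(parse(raw)), store)
alerts = [r for r in store if r["threat"] != "none"]
print(written, alerts)
```

Because each stage consumes and yields records one at a time, an event is enriched as soon as it is parsed rather than after a batch completes, which mirrors the real-time emphasis of the slide.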
Thank You
@acmurthy

  • 19. © Hortonworks Inc. 2015. All Rights Reserved Ex. Real-Time Cyber Security with Hortonworks Raw Network Stream Network Metadata Stream Data Stores Syslog Raw Application Logs Other Streaming Telemetry ParseandFormat Enrich Persist Applications and Analyst Tools (ex. Zeppelin on Spark) Log Mining and Analysis Network Packet Mining and PCAP Reconstruction Big Data Exploration, Predictive Modeling SOLR HBase Hive Threat Intelligence Feeds Enrichment Data Real-Time Index Raw Packet Store Long-Term Store Deliver actionable insights from real-time and historical network threat alerts Key components: NiFi -> Kafka -> Storm