Vinod Kumar Vavilapalli
Apache Hadoop PMC, Co-founder of YARN project
Hortonworks Inc
A Multi-Colored YARN
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
About.html
• Apache Hadoop PMC, ASF Member
• 9 years of working only on Hadoop
– Finally, the job adverts asking for “10 years of Hadoop experience” have some validity
• ‘Rewrote’ the Hadoop processing side, which became Apache Hadoop YARN
• With me today
– Billie Rinaldi: VP Apache Accumulo, Apache Slider PMC, ASF Member
– Jayush Luniya: Apache Ambari PMC
– Vadim Vaks: Kickass field guy (Sr. Solutions Architect)
Hadoop Compute Platform Today
• Layers that enable applications and higher-order frameworks
• It’s all about data!
• Still a single-colored yarn
• Apache Hadoop YARN is pretty good at jobs, queries, and short-running apps
– We will continue doing this
• Admins and admin tools (Ambari) take care of statically provisioned services
Hadoop Compute Platform Today
[Diagram: Platform Services (Storage, Resource Management, Security, Management, Monitoring, Alerts, Governance); frameworks: MR, Tez, Spark, …]
• Run everything in a single secure, multi-tenant, elastic Hadoop YARN cluster
– An ongoing journey
• Adding new ‘stuff’ to this stack is an involved effort
Evolution of user focus
• A need for reuse, composition, and to keep building ‘upwards’
• Applications, services, and more complex combinations of them: assemblies
[Diagram labels: IoT Applications, Apache Metron]
Emerging needs of the platform
[Diagram labels: IoT Applications, Apache Metron]
• Simplified deployment of an assembly
– Ready-to-go packages
– Discovery
– Resource/capacity planning
• Management / monitoring / metrics of assemblies!
– “Start / stop” my business app end-to-end
– “Tell me what’s happening with my business application”
– “I don’t care whether HBase RegionServer is down or not, is my assembly healthy?”
• Scale up/down the entire app!
– “I got more input coming in, I don’t care how you scale individual pieces, but do scale the entire machinery”
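The “is my assembly healthy?” question can be pictured as a roll-up over per-component state. The following is only a minimal sketch of that idea, not anything the talk or YARN defines: the `Component` fields, the `min_healthy` threshold, and the three status names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    desired: int      # instances the assembly wants
    running: int      # instances currently alive
    min_healthy: int  # hypothetical floor below which the component counts as down


def assembly_health(components):
    """Roll per-component state up into one status for the whole assembly."""
    if any(c.running < c.min_healthy for c in components):
        return "DOWN"
    if any(c.running < c.desired for c in components):
        return "DEGRADED"
    return "HEALTHY"


# Hypothetical IoT assembly: one HBase RegionServer is missing,
# but the assembly as a whole is still above its floor.
iot = [
    Component("kafka", desired=3, running=3, min_healthy=2),
    Component("storm", desired=4, running=4, min_healthy=1),
    Component("hbase-regionserver", desired=5, running=4, min_healthy=3),
]
print(assembly_health(iot))
```

The point of the roll-up is that the user asks one question about the assembly and never has to inspect individual daemons.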
Why on YARN?
• Manual plumbing is very tiresome and not repeatable
• Assemblies are similar to apps & services, but N× harder (because there are N services to grapple with)
• Why not static allocations?
– Machines die
– Jobs (MapReduce, Spark) are tolerant of faults, but static services aren’t!
– Capacity has to be planned upfront
– Cannot react to hardware or utilization changes without manual intervention
– Elasticity is a manual operation
• This is fundamentally the same resource-management problem that YARN is built to address!
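To make the “machines die” argument concrete, here is a small sketch of the reconciliation a dynamic resource manager performs and a static allocation cannot: compare the desired instance count with what is still alive and ask for replacements. The function, container ids, and node names are hypothetical and are not YARN APIs.

```python
def reconcile(desired, containers, live_nodes):
    """Return (surviving containers, number of replacements to request).

    containers maps a container id to the node it runs on; live_nodes is the
    set of nodes still reporting in. All names here are hypothetical.
    """
    surviving = {cid: node for cid, node in containers.items() if node in live_nodes}
    missing = max(0, desired - len(surviving))
    return surviving, missing


# node-b has died, so one of the three service instances is gone.
containers = {"c1": "node-a", "c2": "node-b", "c3": "node-c"}
surviving, to_request = reconcile(3, containers, live_nodes={"node-a", "node-c"})
print(sorted(surviving), to_request)
```

With a static allocation, the equivalent of `to_request` is a ticket for an administrator.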
Why on YARN? Contd..
• The Apache Hadoop ecosystem knows data services best: YARN is data-first!
• Big Data use-cases don’t stop at Hadoop services and apps
– Hive for all data, with a summary in a traditional on-demand DB for driving analysts
– Extracting results from HDP and hosting report servers, interactive UIs like Apache Zeppelin
• Users don’t care about this separation
– Big Data is already a huge cluster on one side
– Asking for another infrastructure, and separate management of this other stuff, is burdensome
– Unified solution >> Silos
Hadoop Compute Platform Next
• A colorful, multi-threaded yarn
• For use-cases of various colors
• Today’s applications, done better
• Simplified long-running applications
• Bring your app easily
Image: https://www.flickr.com/photos/happyskrappy/15699919424
What is happening now?
Packaging
• Containers
– Lightweight mechanism for packaging and resource isolation
– Popularized and made accessible by Docker
– Can replace VMs in some cases
– Or more accurately, VMs got used in places where they didn’t need to be
• Native integration ++ in YARN
– Support for “Container Runtimes” in LCE (LinuxContainerExecutor): YARN-3611
– Process runtime
– Docker runtime
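The idea of pluggable container runtimes can be sketched as a dispatch on the container's launch environment: default to a plain process, and switch to Docker when asked. The environment variable names below are the ones I understand Hadoop's Docker support to use; the dispatch function itself, its return shape, and the image name are purely illustrative assumptions and not the LinuxContainerExecutor's actual code.

```python
def pick_runtime(env):
    """Choose a container runtime from a container's launch environment.

    Illustrative only: defaults to a plain process and switches to Docker
    when the environment requests it.
    """
    runtime = env.get("YARN_CONTAINER_RUNTIME_TYPE", "default")
    if runtime == "docker":
        image = env.get("YARN_CONTAINER_RUNTIME_DOCKER_IMAGE")
        if not image:
            raise ValueError("docker runtime requested without an image")
        return ("docker", image)
    return ("process", None)


# "example/web:1" is a made-up image name.
print(pick_runtime({}))
print(pick_runtime({
    "YARN_CONTAINER_RUNTIME_TYPE": "docker",
    "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": "example/web:1",
}))
```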
APIs
• Applications need simple APIs
• Need to be deployable “easily”
• Simple REST API layer fronting YARN
– https://issues.apache.org/jira/browse/YARN-4793
– [Umbrella] Simplified API layer for services and beyond
• Spawn services & manage them
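As a rough picture of what a “simple” service API could accept, the sketch below builds a JSON body that describes a service as a list of components. Every field name (`name`, `components`, `number_of_containers`, `launch_command`), the service name, and the launch command are assumptions for illustration; the talk does not specify the request format, and no HTTP call is made here.

```python
import json


def service_spec(name, components):
    """Build a JSON body describing a service as a list of components.

    components is a list of (component name, instance count, launch command)
    tuples. The field names are hypothetical.
    """
    return json.dumps(
        {
            "name": name,
            "components": [
                {"name": c, "number_of_containers": n, "launch_command": cmd}
                for c, n, cmd in components
            ],
        },
        indent=2,
        sort_keys=True,
    )


body = service_spec("holiday-web", [("webserver", 2, "./start-web.sh")])
print(body)
```

The design point is that the user declares what should run and how many copies, and leaves container negotiation to the platform.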
Platform++
• YARN itself is evolving to support services and complex apps
– https://issues.apache.org/jira/browse/YARN-4692
– [Umbrella] Simplified and first-class support for services in YARN
• Scheduling
– Application priorities: YARN-1963
– Affinity / anti-affinity: YARN-1042
– Services as first-class citizens: preemption, reservations, etc.
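Anti-affinity is easiest to see in a toy placement loop. The sketch below is a simple greedy placer, not YARN's scheduler: with `anti_affinity=True` no two instances of a service land on the same node, which is what a user wants for, say, HBase RegionServers. Node names, slot counts, and the function itself are hypothetical.

```python
def place(instances, free_slots, anti_affinity=True):
    """Greedy placement: pick the node with the most free slots each time.

    With anti_affinity=True no two instances share a node. free_slots maps a
    (hypothetical) node name to how many more containers it can hold.
    """
    slots = dict(free_slots)  # work on a copy
    placement = {}
    for inst in instances:
        candidates = [
            node
            for node, free in slots.items()
            if free > 0 and not (anti_affinity and node in placement.values())
        ]
        if not candidates:
            raise RuntimeError(f"no node satisfies the constraint for {inst}")
        # Most free slots first; the name breaks ties deterministically.
        node = min(candidates, key=lambda n: (-slots[n], n))
        slots[node] -= 1
        placement[inst] = node
    return placement


nodes = {"node-a": 4, "node-b": 1, "node-c": 2}
spread = place(["rs-0", "rs-1", "rs-2"], nodes)
packed = place(["rs-0", "rs-1", "rs-2"], nodes, anti_affinity=False)
print(spread)
print(packed)
```

Without the constraint the greedy placer stacks all three instances on the roomiest node, so a single machine failure takes out the whole service.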
Platform++ Contd
• Application & services upgrades
– “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
– YARN-4726
• Simplified discovery of services via DNS mechanisms: YARN-4757
• YARN Federation, to infinity and beyond: YARN-2915
• Easier container sizing models via resource profiles: YARN-3926
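Resource profiles, as mentioned in the last bullet, amount to naming a container size instead of spelling out every number. A minimal sketch under that assumption follows; the profile names, the `memory_mb` and `vcores` keys, and the override mechanism are all hypothetical and not taken from YARN-3926.

```python
# Hypothetical named sizes; real profile names and fields may differ.
PROFILES = {
    "small": {"memory_mb": 1024, "vcores": 1},
    "medium": {"memory_mb": 4096, "vcores": 2},
    "large": {"memory_mb": 16384, "vcores": 8},
}


def resolve(profile, **overrides):
    """Turn a profile name plus optional overrides into a concrete request."""
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile}")
    unknown = set(overrides) - set(PROFILES[profile])
    if unknown:
        raise ValueError(f"unknown resource types: {sorted(unknown)}")
    return {**PROFILES[profile], **overrides}


print(resolve("medium"))
print(resolve("medium", memory_mb=6144))
```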
Framework++
• Platform is only as good as the tools
• A native YARN framework
– https://issues.apache.org/jira/browse/YARN-4692
– [Umbrella] Native YARN framework layer for services and beyond
• Slider supporting a DAG of apps:
– https://issues.apache.org/jira/browse/SLIDER-875
User facing and operational experience
• Modern YARN web UI: YARN-3368
• Enhanced shell interfaces
• Metrics: Timeline Service V2, YARN-2928
• Application & services monitoring, integration with other systems
• First-class support for YARN-hosted services in Ambari
– https://issues.apache.org/jira/browse/AMBARI-17353
Use-cases.. Assemble!
[Diagram: Platform Services (Storage, Resource Management, Security, Service Discovery, Management, Monitoring, Alerts, Governance); frameworks: MR, Tez, Spark, …; Holiday Assembly (HBase, Web Server); IoT Assembly (Kafka, Storm, HBase, Solr)]
Take away..
Thank You
(Rest of) The demo Team
• Gour Saha
• Sidhartha Seethana
• Varun Vasudev
• Shane Kumpf
• Jaimin Jetly
• Yusaku Sako
• Yu Liu
