Apache Hadoop 3
Junping Du (jdu@hortonworks.com)
Andrew Wang (andrew.wang@cloudera.com)
Andrew Wang
● HDFS @ Cloudera
● Hadoop PMC Member
● Release Manager for Hadoop 3.0
Junping Du
● YARN @ Hortonworks
● Hadoop PMC Member
● Release Manager for Hadoop 2.8
Who We Are
An Abbreviated History of Hadoop Releases
Date Release Major Notes
2007-11-04 0.14.1 First release at the ASF
2011-12-27 1.0.0 Security, HBase support
2012-05-23 2.0.0 YARN, NameNode HA, wire compat
2014-11-18 2.6.0 HDFS encryption, rolling upgrade, node labels
2015-04-21 2.7.0 Most recent production-quality release line
Motivation for Hadoop 3
● Upgrade minimum Java version to Java 8
○ Java 7 end-of-life in April 2015
○ Many Java libraries now only support Java 8
● HDFS erasure coding
○ Major feature that refactored core pieces of HDFS
○ Too big to backport to 2.x
● YARN as a data/container cloud
○ Significant changes to support Docker and native services in YARN
● Other miscellaneous incompatible bugfixes and improvements
○ Hadoop 2.x was branched in 2011
○ 6 years of changes waiting for 3.0
Hadoop 3 status and release plan
● A series of alphas and betas leading up to GA
● Alpha4 feature complete
● Beta1 compatibility freeze
● GA by the end of the year
Release Date
3.0.0-alpha1 2016-09-03 ✔
3.0.0-alpha2 2017-01-25 ✔
3.0.0-alpha3 2017-05-16 ✔
3.0.0-alpha4 2017 Q2
3.0.0-beta1 2017 Q3
3.0.0 GA 2017 Q4
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release
HDFS & Hadoop Features
3x replication vs. Erasure coding
[Diagram: /foo.csv, a 3-block file with blocks b1, b2, b3]
● 3x replication: each of the 3 blocks is stored on 3 DataNodes
○ 3 blocks x 3 replicas = 9 total replicas
○ 6 extra replicas / 3 data blocks = 200% overhead!
● Erasure coding: 3 data blocks (b1, b2, b3) plus 2 parity blocks (p1, p2)
○ 3 + 2 = 5 replicas
○ 2 parity / 3 data = 67% overhead!
[Diagram: /bigfoo.csv, a 10-block file with blocks b1 ... b10]
● With a longer stripe: 10 data blocks plus 4 parity blocks (p1 ... p4)
○ 10 + 4 = 14 replicas
○ 4 parity / 10 data = 40% overhead!
Erasure coding (HDFS-7285)
● Motivation: improve storage efficiency of HDFS
○ ~2x the storage efficiency compared to 3x replication
○ Reduction of overhead from 200% to 40% (worked example after this list)
● Uses Reed-Solomon(k,m) erasure codes instead of replication
○ Support for multiple erasure coding policies
○ RS(3,2), RS(6,3), RS(10,4)
● Can improve data durability
○ RS(6,3) can tolerate 3 failures
○ RS(10,4) can tolerate 4 failures
● Missing blocks reconstructed from remaining blocks
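To make the overhead arithmetic concrete, here is a minimal sketch (toy math only, not an HDFS API; in Hadoop 3 a policy is actually applied per directory with the hdfs ec -setPolicy subcommand):

```java
/** Toy arithmetic only, not an HDFS API: storage overhead for
 *  3x replication vs. RS(k,m) erasure coding. */
public class EcOverhead {
  // Replication: every block is stored `replicas` times; overhead is
  // the extra copies relative to the data itself.
  static double replicationOverheadPct(int replicas) {
    return (replicas - 1) * 100.0;
  }

  // RS(k,m): each stripe of k data blocks produces m parity blocks.
  static double ecOverheadPct(int k, int m) {
    return m * 100.0 / k;
  }

  public static void main(String[] args) {
    System.out.printf("3x replication: %.0f%%%n", replicationOverheadPct(3)); // 200%
    System.out.printf("RS(3,2):  %.0f%%%n", ecOverheadPct(3, 2));             // ~67%
    System.out.printf("RS(6,3):  %.0f%%%n", ecOverheadPct(6, 3));             // 50%
    System.out.printf("RS(10,4): %.0f%%%n", ecOverheadPct(10, 4));            // 40%
  }
}
```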
EC Reconstruction
[Diagram: /foo.csv, a 3-block file stored with Reed-Solomon (3,2): data blocks b1, b2, b3 and parity blocks p1, p2]
● b3 is lost (e.g., its DataNode fails)
● Read the 3 remaining blocks
● Run RS decoding to recover b3
● New copy of b3 recovered
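HDFS's codec is Reed-Solomon over GF(2^8); as a stand-in, the sketch below uses a single XOR parity block (in effect m = 1) to show the same read-the-survivors-then-decode flow. Toy byte arrays only, not the HDFS APIs:

```java
import java.util.Arrays;

/** Toy reconstruction demo: XOR parity (single parity, i.e. m = 1).
 *  HDFS's real codec is Reed-Solomon over GF(2^8); the recovery flow
 *  (read surviving blocks, decode the missing one) is the same. */
public class XorReconstruction {
  static byte[] xor(byte[]... blocks) {
    byte[] out = new byte[blocks[0].length];
    for (byte[] b : blocks)
      for (int i = 0; i < out.length; i++) out[i] ^= b[i];
    return out;
  }

  public static void main(String[] args) {
    byte[] b1 = {1, 2, 3}, b2 = {4, 5, 6}, b3 = {7, 8, 9};
    byte[] p1 = xor(b1, b2, b3);       // encode: the parity block

    // b3 is lost; recover it from the surviving blocks (b1, b2, p1).
    byte[] recovered = xor(b1, b2, p1);
    System.out.println(Arrays.equals(recovered, b3)); // true
  }
}
```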
EC implications
● File data is striped across multiple nodes and racks
● Reads and writes are remote and cross-rack
● Reconstruction is network-intensive: it reads k surviving blocks cross-rack (3 in the RS(3,2) example above)
● Important to use Intel’s optimized ISA-L for performance
○ 1+ GB/s encode/decode speed, much faster than Java implementation
○ CPU is no longer a bottleneck
● Need to combine data into larger files to avoid an explosion in replica count
○ Bad: 1x1GB file -> RS(10,4) -> 14x100MB EC blocks (4.6x # replicas)
○ Good: 10x1GB file -> RS(10,4) -> 14x1GB EC blocks (0.46x # replicas)
● Works best for archival / cold data usecases
EC performance [benchmark charts]
Erasure coding status
● Massive development effort by the Hadoop community
○ 20+ contributors from many companies (Cloudera, Intel, Hortonworks, Huawei, Y! JP, …)
○ 100s of commits over three years (started in 2014)
● Erasure coding is feature complete!
● Solidifying some user APIs in preparation for beta1
● Current focus is on testing and integration efforts
○ Want the complete Hadoop stack to work with HDFS erasure coding enabled
○ Stress / endurance testing to ensure stability
Classpath isolation (HADOOP-11656)
● Problem: Hadoop leaks lots of dependencies onto the application’s classpath
○ Known offenders: Guava, Protobuf, Jackson, Jetty, …
○ No separate HDFS client jar means server jars are leaked
○ YARN / MR clients are not shaded
● Fixes:
○ HDFS-6200: Split the HDFS client into a separate JAR
○ HADOOP-11804: Shaded hadoop-client dependency (see the dependency sketch below)
○ YARN-6466: Shade the task umbilical for a clean YARN container environment (ongoing)
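For downstream projects, HADOOP-11804 publishes shaded client artifacts; a sketch of a consumer pom.xml (the version number is illustrative):

```xml
<!-- Shaded Hadoop client artifacts from HADOOP-11804; version illustrative -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.0.0</version>
  <scope>runtime</scope>
</dependency>
```

Compiling against the shaded api/runtime pair keeps Hadoop's Guava, Protobuf, etc. off the application's own classpath.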
Miscellaneous
● Shell script rewrite
● Support for multiple Standby NameNodes
● Intra-DataNode balancer
● Support for Microsoft Azure Data Lake and Aliyun OSS
● Move default ports out of the ephemeral range
● S3 consistency and performance improvements (ongoing)
● Tightening the Hadoop compatibility policy (ongoing)
YARN & MR Features
Apache Hadoop 3.0 - YARN Enhancements
● Built-in support for Long Running Services
● Better resource isolation and Docker!!
● YARN Scheduling Enhancements
● Re-architecture for YARN Timeline Service - ATS v2
● Better User Experiences
● Other Enhancements
YARN Native Service - Key Drivers
● Consolidation of Infrastructure
○ Hadoop clusters have a lot of compute and storage resources (some unused)
○ Can’t I use Hadoop’s resources for non-Hadoop load?
○ Other open source infra/cloud is hard to run, can I use YARN?
○ But does it support Docker? – yes, we heard you
○ Can we run Hadoop services (Hive, HBase, etc.) or related services on YARN?
■ Benefit from YARN’s elasticity and resource management
Built-in support for long-running services in YARN
● A native YARN framework - YARN-4692
○ A common framework abstraction for long-running services
■ Similar to Slider
○ Simplified API
● Recognition of long-running services
○ Affects preemption, container-reservation, and related policies
○ Auto-restart of containers
○ Long-running containers are retried on the same node to preserve local state
● Service/application upgrade support - YARN-4726
○ Services are expected to run long enough to cross versions
● Dynamic container configuration
● Service Discovery
○ Expose existing service information in YARN registry via DNS (YARN-4757)
Docker on YARN
● Why Docker?
○ Lightweight mechanism for packaging, distributing,
and isolating processes
○ Most popular containerization framework
○ Makes packaging new apps for YARN easier
■ TensorFlow, etc.
○ Focus on integration instead of container primitives
○ Mostly fits into the YARN Container model
Docker on YARN (Contd.)
● Docker support in LinuxContainerExecutor
○ YARN-3611 (Umbrella)
○ Multiple container types are supported in the same executor
○ A new Docker container runtime manages Docker containers
○ LinuxContainerExecutor can delegate to either runtime on a per-application basis
○ Clients specify which container type they want to use
■ Currently via environment variables, eventually through well-defined client APIs (sketched below)
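A minimal client-side sketch of that environment-variable mechanism, assuming the documented YARN_CONTAINER_RUNTIME_* variable names and an illustrative image name:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

/** Sketch: request the Docker runtime for a container via environment
 *  variables. Variable names follow the Hadoop Docker-on-YARN docs;
 *  the image name is illustrative. */
public class DockerContainerRequest {
  public static ContainerLaunchContext dockerContext(List<String> commands) {
    Map<String, String> env = new HashMap<>();
    env.put("YARN_CONTAINER_RUNTIME_TYPE", "docker");
    env.put("YARN_CONTAINER_RUNTIME_DOCKER_IMAGE", "library/ubuntu:16.04");
    return ContainerLaunchContext.newInstance(
        Collections.emptyMap(), // localResources
        env,
        commands,
        null,   // serviceData
        null,   // tokens
        null);  // ACLs
  }
}
```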
Docker road to Yarn on Yarn
Scheduling Enhancements
● Generic Resource Types
○ Abstract ResourceTypes to allow new resources, like: GPU, Network, etc.
○ Resource profiles for containers
■ small, medium, large etc. similar to EC2 instance types
● Global Scheduling: YARN-5139
○ Replaces heartbeat-triggered scheduling with a global scheduler running parallel threads
○ Globally optimal placement
■ Critical for long-running services: they stick to their allocation, so it had better be a good one
■ Enhanced container scheduling throughput (6 - 8x)
Scheduling Enhancements (Contd.)
● Other CapacityScheduler improvements
○ Queue Management Improvements
■ More Dynamic Queue reconfiguration
■ REST API support for queue management
○ Absolute resource configuration support
○ Priority Support in Application and Queue
○ Preemption improvements
■ Inter-Queue preemption support
● FairScheduler improvements
○ Preemption improvements
■ Preemption considers resource requirements of starved applications
○ Better defaults:
■ Assign multiple containers per heartbeat based on resource availability (config sketched below)
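A yarn-site.xml sketch of the FairScheduler knobs behind this behavior (property names from the FairScheduler docs; treat the exact defaults as an assumption):

```xml
<!-- yarn-site.xml sketch; values illustrative -->
<property>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <!-- Cap per-heartbeat assignments dynamically from available resources -->
  <name>yarn.scheduler.fair.dynamic.max.assign</name>
  <value>true</value>
</property>
```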
Application Timeline Service v2
● ATS: Captures system/application events/metrics
● v2 improvements:
○ Enhanced Data Model: first-class citizen for Flows, Config, etc.
○ Scalable backend: HBase
○ Distributed Reader/Writer
○ Others
■ Captures system metrics, e.g., memory/CPU usage per container over time
■ Efficient updates: just write a new version to the appropriate HBase cell (the HBase primitive is sketched below)
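The "new version per update" point relies on a plain HBase primitive; a sketch using generic HBase client calls (the table and column names here are made up for illustration, not ATSv2's real schema):

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Sketch of the HBase primitive ATSv2 builds on: an update is just a
 *  new cell version keyed by timestamp. Table/column names are made up
 *  for illustration. */
public class MetricUpdate {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("timeline_metrics"))) {
      Put put = new Put(Bytes.toBytes("cluster!user!flow!app!container_01"));
      // Newer version of the same cell; reads see the latest by default.
      put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("memory"),
          System.currentTimeMillis(), Bytes.toBytes(2048L));
      table.put(put);
    }
  }
}
```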
ATS v2 architecture [diagram]
YARN New WebUI
● Improved visibility into cluster usage
○ Memory, CPU
○ By queues and applications
○ Sunburst graphs for hierarchical queues
○ NodeManager heatmap
● ATSv2 integration
○ Plot container start/stop events
○ Easy to capture delays in app execution
Misc. YARN/MR improvements
● Opportunistic containers (YARN-2877 & YARN-5542)
○ Motivation: Resource utilization is typically low in most clusters
○ Solution: Run some containers at lower priority; they are preempted as needed for guaranteed containers
● YARN Federation (YARN-2915 & YARN-5597)
○ Allows YARN to scale to 100k nodes and beyond
● HA improvements
○ Better handling of transient network issues
○ ZK-store scalability: Limit number of children under a znode
● MapReduce Native Collector (MAPREDUCE-2841)
○ Native implementation of the map output collector
○ Up to 30% faster for shuffle-intensive jobs
Summary: What’s new in Hadoop 3.0?
● Storage Optimization
○ HDFS: Erasure codes
● Improved Utilization
○ YARN: Long Running Services
○ YARN: Scheduling Enhancements
● Additional Workloads
○ YARN: Docker & Isolation
● Easier to Use
○ New User Interface
● Refactor Base
○ Lots of Trunk content
○ JDK8 and newer dependent libraries
Compatibility & Testing
Compatibility
● Strong feedback from large users on the need for compatibility
● Preserves wire-compatibility with Hadoop 2 clients
○ Impossible to coordinate upgrading off-cluster Hadoop clients
● Will support rolling upgrade from Hadoop 2 to Hadoop 3
○ Can’t take downtime to upgrade a business-critical cluster
● Not fully preserving API compatibility!
○ Dependency version bumps
○ Removal of deprecated APIs and tools
○ Shell script rewrite, rework of Hadoop tools scripts
○ Incompatible bug fixes
Testing and validation
● Extended alpha → beta → GA plan designed for stabilization
● EC already has some users in production (700 nodes at Y! JP)
● Cloudera is rebasing CDH against upstream and running full test suite
○ Integration of Hadoop 3 with all components in CDH stack
○ Same integration tests used to validate CDH5
● Hortonworks is also integrating and testing Hadoop 3
● Plans for extensive HDFS EC testing by Cloudera and Intel
● Happy synergy between 2.8.x and 3.0.x lines
○ Shares much of the same code, fixes flow into both
○ Yahoo! doing scale testing of 2.8.0
Conclusion
● Expect Hadoop 3.0.0 GA by the end of the year
● Shiny new features
○ HDFS Erasure Coding
○ YARN Docker and Native Service Support
○ YARN ATSv2
○ Client classpath isolation
● Great time to get involved in testing and validation
● Come to the BoFs on Thursday at 5PM
○ YARN BoF: Room 211
○ HDFS BoF: Room 212
Thank You!