Apache Hadoop 3
Junping Du (jdu@hortonworks.com)
Andrew Wang (andrew.wang@cloudera.com)
Andrew Wang
● HDFS @ Cloudera
● Hadoop PMC Member
● Release Manager for Hadoop 3.0
Junping Du
● YARN @ Hortonworks
● Hadoop PMC Member
● Release Manager for Hadoop 2.8
Who We Are
An Abbreviated History of Hadoop Releases
Date Release Major Notes
2007-11-04 0.14.1 First release at the ASF
2011-12-27 1.0.0 Security, HBase support
2012-05-23 2.0.0 YARN, NameNode HA, wire compat
2014-11-18 2.6.0 HDFS encryption, rolling upgrade, node labels
2015-04-21 2.7.0 Most recent production-quality release line
Motivation for Hadoop 3
● Upgrade minimum Java version to Java 8
○ Java 7 end-of-life in April 2015
○ Many Java libraries now only support Java 8
● HDFS erasure coding
○ Major feature that refactored core pieces of HDFS
○ Too big to backport to 2.x
● YARN as a data/container cloud
○ Significant changes to support Docker and native services in YARN
● Other miscellaneous incompatible bugfixes and improvements
○ Hadoop 2.x was branched in 2011
○ 6 years of changes waiting for 3.0
Hadoop 3 status and release plan
● A series of alphas and betas leading up to GA
● Alpha4 feature complete
● Beta1 compatibility freeze
● GA by the end of the year
Release Date
3.0.0-alpha1 2016-09-03 ✔
3.0.0-alpha2 2017-01-25 ✔
3.0.0-alpha3 2017-05-16 ✔
3.0.0-alpha4 2017 Q2
3.0.0-beta1 2017 Q3
3.0.0 GA 2017 Q4
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release
HDFS & Hadoop Features
3x replication vs. Erasure coding
[Diagram: /foo.csv, a 3-block file with blocks b1, b2, b3]
● 3x replication: each of the 3 blocks is stored on 3 DataNodes
○ 3 blocks x 3 replicas = 9 total replicas
○ 6 extra replicas / 3 data blocks = 200% overhead!
● Erasure coding: 3 data blocks (b1, b2, b3) plus 2 parity blocks (p1, p2)
○ 3 + 2 = 5 replicas
○ 2 parity / 3 data = 67% overhead!
[Diagram: /bigfoo.csv, a 10-block file with blocks b1 ... b10]
● With a longer stripe: 10 data blocks plus 4 parity blocks (p1 ... p4)
○ 10 + 4 = 14 replicas
○ 4 parity / 10 data = 40% overhead!
Erasure coding (HDFS-7285)
● Motivation: improve storage efficiency of HDFS
○ ~2x the storage efficiency compared to 3x replication
○ Reduction of overhead from 200% to 40% (worked example after this list)
● Uses Reed-Solomon(k,m) erasure codes instead of replication
○ Support for multiple erasure coding policies
○ RS(3,2), RS(6,3), RS(10,4)
● Can improve data durability
○ RS(6,3) can tolerate 3 failures
○ RS(10,4) can tolerate 4 failures
● Missing blocks reconstructed from remaining blocks
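To make the overhead arithmetic concrete, here is a minimal sketch (toy math only, not an HDFS API; in Hadoop 3 a policy is actually applied per directory with the hdfs ec -setPolicy subcommand):

```java
/** Toy arithmetic only, not an HDFS API: storage overhead for
 *  3x replication vs. RS(k,m) erasure coding. */
public class EcOverhead {
  // Replication: every block is stored `replicas` times; overhead is
  // the extra copies relative to the data itself.
  static double replicationOverheadPct(int replicas) {
    return (replicas - 1) * 100.0;
  }

  // RS(k,m): each stripe of k data blocks produces m parity blocks.
  static double ecOverheadPct(int k, int m) {
    return m * 100.0 / k;
  }

  public static void main(String[] args) {
    System.out.printf("3x replication: %.0f%%%n", replicationOverheadPct(3)); // 200%
    System.out.printf("RS(3,2):  %.0f%%%n", ecOverheadPct(3, 2));             // ~67%
    System.out.printf("RS(6,3):  %.0f%%%n", ecOverheadPct(6, 3));             // 50%
    System.out.printf("RS(10,4): %.0f%%%n", ecOverheadPct(10, 4));            // 40%
  }
}
```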
EC Reconstruction
[Diagram: /foo.csv, a 3-block file stored with Reed-Solomon (3,2): data blocks b1, b2, b3 and parity blocks p1, p2]
● b3 is lost (e.g., its DataNode fails)
● Read the 3 remaining blocks
● Run RS decoding to recover b3
● New copy of b3 recovered
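HDFS's codec is Reed-Solomon over GF(2^8); as a stand-in, the sketch below uses a single XOR parity block (in effect m = 1) to show the same read-the-survivors-then-decode flow. Toy byte arrays only, not the HDFS APIs:

```java
import java.util.Arrays;

/** Toy reconstruction demo: XOR parity (single parity, i.e. m = 1).
 *  HDFS's real codec is Reed-Solomon over GF(2^8); the recovery flow
 *  (read surviving blocks, decode the missing one) is the same. */
public class XorReconstruction {
  static byte[] xor(byte[]... blocks) {
    byte[] out = new byte[blocks[0].length];
    for (byte[] b : blocks)
      for (int i = 0; i < out.length; i++) out[i] ^= b[i];
    return out;
  }

  public static void main(String[] args) {
    byte[] b1 = {1, 2, 3}, b2 = {4, 5, 6}, b3 = {7, 8, 9};
    byte[] p1 = xor(b1, b2, b3);       // encode: the parity block

    // b3 is lost; recover it from the surviving blocks (b1, b2, p1).
    byte[] recovered = xor(b1, b2, p1);
    System.out.println(Arrays.equals(recovered, b3)); // true
  }
}
```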
EC implications
● File data is striped across multiple nodes and racks
● Reads and writes are remote and cross-rack
● Reconstruction is network-intensive: it reads k surviving blocks cross-rack (3 in the RS(3,2) example above)
● Important to use Intel’s optimized ISA-L for performance
○ 1+ GB/s encode/decode speed, much faster than Java implementation
○ CPU is no longer a bottleneck
● Need to combine data into larger files to avoid an explosion in replica count
○ Bad: 1x1GB file -> RS(10,4) -> 14x100MB EC blocks (4.6x # replicas)
○ Good: 10x1GB file -> RS(10,4) -> 14x1GB EC blocks (0.46x # replicas)
● Works best for archival / cold data usecases
EC performance [benchmark charts]
Erasure coding status
● Massive development effort by the Hadoop community
○ 20+ contributors from many companies (Cloudera, Intel, Hortonworks, Huawei, Y! JP, …)
○ 100s of commits over three years (started in 2014)
● Erasure coding is feature complete!
● Solidifying some user APIs in preparation for beta1
● Current focus is on testing and integration efforts
○ Want the complete Hadoop stack to work with HDFS erasure coding enabled
○ Stress / endurance testing to ensure stability
Classpath isolation (HADOOP-11656)
● Problem: Hadoop leaks lots of dependencies onto the application’s classpath
○ Known offenders: Guava, Protobuf, Jackson, Jetty, …
○ No separate HDFS client jar means server jars are leaked
○ YARN / MR clients are not shaded
● Fixes:
○ HDFS-6200: Split the HDFS client into a separate JAR
○ HADOOP-11804: Shaded hadoop-client dependency (see the dependency sketch below)
○ YARN-6466: Shade the task umbilical for a clean YARN container environment (ongoing)
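For downstream projects, HADOOP-11804 publishes shaded client artifacts; a sketch of a consumer pom.xml (the version number is illustrative):

```xml
<!-- Shaded Hadoop client artifacts from HADOOP-11804; version illustrative -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.0.0</version>
  <scope>runtime</scope>
</dependency>
```

Compiling against the shaded api/runtime pair keeps Hadoop's Guava, Protobuf, etc. off the application's own classpath.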
Miscellaneous
● Shell script rewrite
● Support for multiple Standby NameNodes
● Intra-DataNode balancer
● Support for Microsoft Azure Data Lake and Aliyun OSS
● Move default ports out of the ephemeral range
● S3 consistency and performance improvements (ongoing)
● Tightening the Hadoop compatibility policy (ongoing)
YARN & MR Features
Apache Hadoop 3.0 - YARN Enhancements
● Built-in support for Long Running Services
● Better resource isolation and Docker!!
● YARN Scheduling Enhancements
● Re-architecture for YARN Timeline Service - ATS v2
● Better User Experiences
● Other Enhancements
YARN Native Service - Key Drivers
● Consolidation of Infrastructure
○ Hadoop clusters have a lot of compute and storage resources (some unused)
○ Can’t I use Hadoop’s resources for non-Hadoop load?
○ Other open source infra/cloud is hard to run, can I use YARN?
○ But does it support Docker? – yes, we heard you
○ Can we run Hadoop services (Hive, HBase, etc.) or related services on YARN?
■ Benefit from YARN’s elasticity and resource management
Built-in support for long-running services in YARN
● A native YARN framework - YARN-4692
○ A common framework abstraction for long-running services
■ Similar to Slider
○ Simplified API
● Recognition of long-running services
○ Affects preemption, container-reservation, and related policies
○ Auto-restart of containers
○ Long-running containers are retried on the same node to preserve local state
● Service/application upgrade support - YARN-4726
○ Services are expected to run long enough to cross versions
● Dynamic container configuration
● Service Discovery
○ Expose existing service information in YARN registry via DNS (YARN-4757)
Docker on YARN
● Why Docker?
○ Lightweight mechanism for packaging, distributing,
and isolating processes
○ Most popular containerization framework
○ Makes packaging new apps for YARN easier
■ TensorFlow, etc.
○ Focus on integration instead of container primitives
○ Mostly fits into the YARN Container model
Docker on YARN (Contd.)
● Docker support in LinuxContainerExecutor
○ YARN-3611 (Umbrella)
○ Multiple container types are supported in the same executor
○ A new Docker container runtime manages Docker containers
○ LinuxContainerExecutor can delegate to either runtime on a per-application basis
○ Clients specify which container type they want to use
■ Currently via environment variables, eventually through well-defined client APIs (sketched below)
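A minimal client-side sketch of that environment-variable mechanism, assuming the documented YARN_CONTAINER_RUNTIME_* variable names and an illustrative image name:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

/** Sketch: request the Docker runtime for a container via environment
 *  variables. Variable names follow the Hadoop Docker-on-YARN docs;
 *  the image name is illustrative. */
public class DockerContainerRequest {
  public static ContainerLaunchContext dockerContext(List<String> commands) {
    Map<String, String> env = new HashMap<>();
    env.put("YARN_CONTAINER_RUNTIME_TYPE", "docker");
    env.put("YARN_CONTAINER_RUNTIME_DOCKER_IMAGE", "library/ubuntu:16.04");
    return ContainerLaunchContext.newInstance(
        Collections.emptyMap(), // localResources
        env,
        commands,
        null,   // serviceData
        null,   // tokens
        null);  // ACLs
  }
}
```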
Docker road to Yarn on Yarn
Scheduling Enhancements
● Generic Resource Types
○ Abstract ResourceTypes to allow new resources, like: GPU, Network, etc.
○ Resource profiles for containers
■ small, medium, large etc. similar to EC2 instance types
● Global Scheduling: YARN-5139
○ Replaces heartbeat-triggered scheduling with a global scheduler running parallel threads
○ Globally optimal placement
■ Critical for long-running services: they stick to their allocation, so it had better be a good one
■ Enhanced container scheduling throughput (6 - 8x)
Scheduling Enhancements (Contd.)
● Other CapacityScheduler improvements
○ Queue Management Improvements
■ More Dynamic Queue reconfiguration
■ REST API support for queue management
○ Absolute resource configuration support
○ Priority Support in Application and Queue
○ Preemption improvements
■ Inter-Queue preemption support
● FairScheduler improvements
○ Preemption improvements
■ Preemption considers resource requirements of starved applications
○ Better defaults:
■ Assign multiple containers per heartbeat based on resource availability (config sketched below)
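A yarn-site.xml sketch of the FairScheduler knobs behind this behavior (property names from the FairScheduler docs; treat the exact defaults as an assumption):

```xml
<!-- yarn-site.xml sketch; values illustrative -->
<property>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <!-- Cap per-heartbeat assignments dynamically from available resources -->
  <name>yarn.scheduler.fair.dynamic.max.assign</name>
  <value>true</value>
</property>
```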
Application Timeline Service v2
● ATS: Captures system/application events/metrics
● v2 improvements:
○ Enhanced Data Model: first-class citizen for Flows, Config, etc.
○ Scalable backend: HBase
○ Distributed Reader/Writer
○ Others
■ Captures system metrics, e.g., memory/CPU usage per container over time
■ Efficient updates: just write a new version to the appropriate HBase cell (the HBase primitive is sketched below)
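The "new version per update" point relies on a plain HBase primitive; a sketch using generic HBase client calls (the table and column names here are made up for illustration, not ATSv2's real schema):

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Sketch of the HBase primitive ATSv2 builds on: an update is just a
 *  new cell version keyed by timestamp. Table/column names are made up
 *  for illustration. */
public class MetricUpdate {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("timeline_metrics"))) {
      Put put = new Put(Bytes.toBytes("cluster!user!flow!app!container_01"));
      // Newer version of the same cell; reads see the latest by default.
      put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("memory"),
          System.currentTimeMillis(), Bytes.toBytes(2048L));
      table.put(put);
    }
  }
}
```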
ATS v2 architecture [diagram]
YARN New WebUI
● Improved visibility into cluster usage
○ Memory, CPU
○ By queues and applications
○ Sunburst graphs for hierarchical queues
○ NodeManager heatmap
● ATSv2 integration
○ Plot container start/stop events
○ Easy to capture delays in app execution
Misc. YARN/MR improvements
● Opportunistic containers (YARN-2877 & YARN-5542)
○ Motivation: Resource utilization is typically low in most clusters
○ Solution: Run some containers at lower priority; they are preempted as needed for guaranteed containers
● YARN Federation (YARN-2915 & YARN-5597)
○ Allows YARN to scale to 100k nodes and beyond
● HA improvements
○ Better handling of transient network issues
○ ZK-store scalability: Limit number of children under a znode
● MapReduce Native Collector (MAPREDUCE-2841)
○ Native implementation of the map output collector
○ Up to 30% faster for shuffle-intensive jobs
Summary: What’s new in Hadoop 3.0?
● Storage Optimization
○ HDFS: Erasure codes
● Improved Utilization
○ YARN: Long Running Services
○ YARN: Scheduling Enhancements
● Additional Workloads
○ YARN: Docker & Isolation
● Easier to Use
○ New User Interface
● Refactor Base
○ Lots of Trunk content
○ JDK8 and newer dependent libraries
Compatibility & Testing
Compatibility
● Strong feedback from large users on the need for compatibility
● Preserves wire-compatibility with Hadoop 2 clients
○ Impossible to coordinate upgrading off-cluster Hadoop clients
● Will support rolling upgrade from Hadoop 2 to Hadoop 3
○ Can’t take downtime to upgrade a business-critical cluster
● Not fully preserving API compatibility!
○ Dependency version bumps
○ Removal of deprecated APIs and tools
○ Shell script rewrite, rework of Hadoop tools scripts
○ Incompatible bug fixes
Testing and validation
● Extended alpha → beta → GA plan designed for stabilization
● EC already has some users in production (700 nodes at Y! JP)
● Cloudera is rebasing CDH against upstream and running full test suite
○ Integration of Hadoop 3 with all components in CDH stack
○ Same integration tests used to validate CDH5
● Hortonworks is also integrating and testing Hadoop 3
● Plans for extensive HDFS EC testing by Cloudera and Intel
● Happy synergy between 2.8.x and 3.0.x lines
○ Shares much of the same code, fixes flow into both
○ Yahoo! doing scale testing of 2.8.0
Conclusion
● Expect Hadoop 3.0.0 GA by the end of the year
● Shiny new features
○ HDFS Erasure Coding
○ YARN Docker and Native Service Support
○ YARN ATSv2
○ Client classpath isolation
● Great time to get involved in testing and validation
● Come to the BoFs on Thursday at 5PM
○ YARN BoF: Room 211
○ HDFS BoF: Room 212
Thank You!