Moving the Oath Grid to Docker
Eric Badger, Oath
Motivations
● RHEL 7 adoption
● Security
● Isolation
*Note: Running arbitrary, user-supplied Docker images on YARN was not a motivation.
Architecture
● Tasks run in Docker containers
● Daemons run on bare-metal RHEL 7
Security
● Seccomp profile
● Linux capabilities
● Run with --no-new-privileges
● Read-only containers and limited mounts
● Enter container as user UID:GID
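As a rough illustration of how these controls map onto Docker CLI options, the sketch below assembles a `docker run` argument list. It is not the command YARN's container-executor actually generates; the function name, image name, seccomp profile path, and retained capabilities are hypothetical placeholders. The options themselves (`--security-opt seccomp=...`, `--security-opt no-new-privileges`, `--cap-drop`/`--cap-add`, `--read-only`, `--user`) are standard `docker run` flags.

```python
from typing import List, Sequence


def hardened_docker_args(
    image: str,
    uid: int,
    gid: int,
    seccomp_profile: str,
    keep_caps: Sequence[str] = (),
) -> List[str]:
    """Assemble a hypothetical `docker run` argv applying the controls above."""
    args = [
        "docker", "run",
        # Filter the system calls the task may make (seccomp profile is a JSON file).
        "--security-opt", f"seccomp={seccomp_profile}",
        # Prevent privilege escalation through setuid binaries or file capabilities.
        "--security-opt", "no-new-privileges",
        # Start from zero Linux capabilities...
        "--cap-drop", "ALL",
    ]
    # ...and add back only those the workload needs (the set is an assumption).
    for cap in keep_caps:
        args += ["--cap-add", cap]
    args += [
        # Root filesystem is read-only; writable space comes only from bind-mounts.
        "--read-only",
        # Enter the container as the submitting user's numeric UID:GID, not root.
        "--user", f"{uid}:{gid}",
        image,
    ]
    return args


if __name__ == "__main__":
    # Hypothetical image, profile path, and capability list.
    print(" ".join(hardened_docker_args(
        "registry.example.com/hadoop/rhel7:latest", 1001, 100,
        "/etc/hadoop/seccomp.json", ["CHOWN"])))
```

Dropping all capabilities and adding back an explicit list keeps the granted set visible in one place, which is easier to audit than removing capabilities one by one.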
Bind-mounts
Bind-mount                        RO/RW  Reason
Hadoop release                    RO     Hadoop jars and confs
Usercache/Filecache directories   RO     Distributed cache
NSCD socket                       RW     Use host cache for user lookups
HDFS short-circuit socket         RW     HDFS short-circuit reads/writes
Container log directories         RW     Write logs where the NodeManager can see them for aggregation
Application local directories     RW     Application-specific temporary space, /tmp, /var/tmp
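The table translates directly into `docker run -v SOURCE:TARGET:ro|rw` options. The sketch below assumes that most directories are mounted at the same path inside the container and that a subdirectory of the application local directory backs `/tmp` and `/var/tmp`. Every host path, the `BindMount` type, and the `mount_args` helper are hypothetical, not Oath's actual layout or YARN's code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BindMount:
    source: str      # path on the host
    target: str      # path inside the container
    read_only: bool  # True -> "ro", False -> "rw"
    reason: str


def mount_args(mounts: Iterable[BindMount]) -> List[str]:
    """Translate bind-mount records into `docker run -v SRC:DST:MODE` options."""
    args: List[str] = []
    for m in mounts:
        mode = "ro" if m.read_only else "rw"
        args += ["-v", f"{m.source}:{m.target}:{mode}"]
    return args


# All host paths below are hypothetical examples.
APP_DIR = "/grid/0/yarn/local/usercache/alice/appcache/application_1"
FILECACHE = "/grid/0/yarn/local/usercache/alice/filecache"
LOG_DIR = "/grid/0/yarn/logs/application_1/container_1"
MOUNTS = [
    BindMount("/opt/hadoop", "/opt/hadoop", True, "Hadoop jars and confs"),
    BindMount(FILECACHE, FILECACHE, True, "Distributed cache"),
    BindMount("/var/run/nscd", "/var/run/nscd", False, "Host cache for user lookups"),
    BindMount("/var/run/hdfs-sockets", "/var/run/hdfs-sockets", False,
              "HDFS short-circuit reads/writes"),
    BindMount(LOG_DIR, LOG_DIR, False, "Logs the NodeManager aggregates"),
    BindMount(APP_DIR, APP_DIR, False, "Application-specific temporary space"),
    BindMount(f"{APP_DIR}/tmp", "/tmp", False, "Per-application /tmp"),
    BindMount(f"{APP_DIR}/tmp", "/var/tmp", False, "Per-application /var/tmp"),
]

if __name__ == "__main__":
    print(" ".join(mount_args(MOUNTS)))
```

Because `/tmp` is backed by a per-application directory, two tasks from different applications no longer see each other's `/tmp`.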
Grid Migration
Requirements
● No cluster downtime
● No changes to current jobs
● No more than 5% performance degradation
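The 5% budget is a relative slowdown against the bare-metal baseline. The slide does not say how jobs were timed, so the helper names and the sample timings below are illustrative assumptions, not measurements from the grid.

```python
def degradation_pct(baseline_secs: float, docker_secs: float) -> float:
    """Percent slowdown of the Docker run relative to the bare-metal baseline.

    Negative values mean the Docker run was faster.
    """
    if baseline_secs <= 0:
        raise ValueError("baseline_secs must be positive")
    return (docker_secs - baseline_secs) / baseline_secs * 100.0


def within_budget(baseline_secs: float, docker_secs: float,
                  budget_pct: float = 5.0) -> bool:
    """True if the slowdown stays at or under the budget (5% by default)."""
    return degradation_pct(baseline_secs, docker_secs) <= budget_pct


if __name__ == "__main__":
    # Illustrative numbers only.
    print(round(degradation_pct(600.0, 624.0), 2))  # 4.0
    print(within_budget(600.0, 624.0))              # True
    print(within_budget(600.0, 636.0))              # False
```

In other words, a job that takes 600 s on bare metal may take at most 630 s in Docker.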
Node-Specific Runtime
● Docker or Linux runtime chosen per node
- Based on RHEL version
- See YARN-6456
● During migration, jobs run as a mix of bare-metal processes and Docker containers
● User-transparent
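A per-node choice keyed on the RHEL version could be derived from `/etc/redhat-release`. This sketch assumes RHEL 7 and later nodes use YARN's `docker` runtime while older nodes keep the `default` (bare-metal) runtime. The helper names are hypothetical, and the NodeManager configuration that actually applies the choice (see YARN-6456) is not shown.

```python
import os
import re
from typing import Optional


def rhel_major_version(release_text: str) -> Optional[int]:
    """Parse the major version from text like '... release 7.5 (Maipo)'."""
    match = re.search(r"release\s+(\d+)", release_text)
    return int(match.group(1)) if match else None


def runtime_for_node(release_text: str) -> str:
    """Pick a YARN Linux runtime name for this node (assumed policy:
    RHEL 7+ runs tasks in Docker, anything else stays bare-metal)."""
    major = rhel_major_version(release_text)
    return "docker" if major is not None and major >= 7 else "default"


if __name__ == "__main__":
    path = "/etc/redhat-release"
    if os.path.exists(path):
        with open(path) as f:
            print(runtime_for_node(f.read()))
```

Falling back to `default` when the release string cannot be parsed keeps an unrecognized node on the bare-metal path rather than failing its tasks.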
Image Management
● Preload images on nodes
- Avoid thundering herd
- Avoid task timeouts
● Force jobs to use a Docker image
● Allow users to pick a different image from a small set of allowed images
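The image policy can be sketched as two small functions: one that forces a default image unless the user picks another from an allowlist, and one that lists the `docker pull` commands a node would run ahead of time. The image names, constants, and helpers are hypothetical; this is not how YARN itself enforces image restrictions.

```python
from typing import AbstractSet, List, Optional

# Hypothetical image names; the real default and allowed set are site-specific.
DEFAULT_IMAGE = "registry.example.com/hadoop/rhel7:current"
ALLOWED_IMAGES = frozenset({
    DEFAULT_IMAGE,
    "registry.example.com/hadoop/rhel7-python3:current",
})


def resolve_image(requested: Optional[str],
                  default_image: str = DEFAULT_IMAGE,
                  allowed: AbstractSet[str] = ALLOWED_IMAGES) -> str:
    """Return the forced default unless the user asked for another allowed image."""
    if not requested:
        return default_image
    if requested not in allowed:
        raise ValueError(f"image {requested!r} is not in the allowed set")
    return requested


def preload_commands(allowed: AbstractSet[str] = ALLOWED_IMAGES) -> List[List[str]]:
    """`docker pull` commands to run on each node before tasks land there, so the
    first tasks neither stampede the registry nor time out waiting for a pull."""
    return [["docker", "pull", image] for image in sorted(allowed)]


if __name__ == "__main__":
    print(resolve_image(None))
    for cmd in preload_commands():
        print(" ".join(cmd))
```

Keeping the allowed set small is what makes preloading practical: every node can hold every permitted image.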
Challenges
● System call overhead due to seccomp
● Docker losing track of containers in high-memory situations
● System PID reuse issue causing Docker to restart
● Debugging tasks inside containers
● Tasks can’t talk to each other through /tmp anymore
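One way to gauge seccomp overhead is to time a cheap system call in a tight loop, once on bare metal and once inside a container running with the seccomp profile. The sketch below uses `os.getppid()`, which issues a real `getppid` system call each time. The loop also measures Python interpreter overhead, which is roughly the same in both environments, so only the difference between the two runs is meaningful. The helper name is hypothetical.

```python
import os
import time


def mean_syscall_ns(iterations: int = 1_000_000) -> float:
    """Average wall-clock nanoseconds per os.getppid() call, loop overhead included."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.getppid()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    # Run once on the host and once inside the container, then compare.
    print(f"{mean_syscall_ns():.1f} ns per call")
```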
Future Improvements
● User-specific layers
- Harder than it seems
● Podman
● Kata containers
Docker in Apache Hadoop
Phase 1: YARN-3611 - Support Docker Containers in LinuxContainerExecutor
- 92 subtasks, all resolved
Phase 2: YARN-8274 - YARN Container Phase 2
- 26 subtasks, 7 resolved
Acknowledgements
Many thanks to those who contributed to this work, including the following:
Shane Kumpf, Eric Yang, Miklos Szegedi, Billie Rinaldi, Chandni Singh, Craig Condit, Jason Lowe, Jim Brennan, Nathan Roberts, Dheeraj Kapur
