Elastic resource scheduling for Netflix's scalable container cloud
Sharma Podila, Andrew Spyker, Tomasz Bak
Feb 7th 2017
Topics
● Motivations for containers on AWS EC2
● Scheduling using Apache Mesos
● Fenzo deep dive
● Future plans
Containers add to our VM infrastructure
Already, in VMs, we have a microservice-driven, cloud-native, CI/CD DevOps-enabled, resilient, elastically scalable environment.
Containers Provide Innovation Velocity
● Iterative local development, deploy when ready
● Manage app and dependencies easily and completely
● Simpler way to express resources, let system manage
Sampling of container usage (service and batch)
● Media Encoding
● Digital Watermarking
● NodeJS UI Services
● Operations and General
● Stream Processing
● Reporting
Sampling of realized container benefits
● Media Encoding: encoding research development time
○ VM platform to container platform: 1 month vs. 1 week
● Continuous Integration Testing
○ Build all Netflix codebases in hours
○ Saves hundreds of development hours of debugging
● Netflix API Re-architecture using NodeJS
○ Focus returns to app development
○ Provided reliable smaller instances
○ Simplifies, speeds test and deployment
Scheduling use cases

Reactive stream processing: Mantis
[Diagram: events flow from the Zuul and API clusters into Mantis stream processing jobs such as anomaly detection]
A cloud native service:
● Configurable message delivery guarantees
● Heterogeneous workloads
○ Real-time dashboarding, alerting
○ Anomaly detection, metric generation
○ Interactive exploration of streaming data
Current Mantis usage
● At peak: 2,300 m3.2xlarge EC2 instances
● Peak of 900 concurrent jobs
● Peak of 5,200 concurrent containers
○ Trough of 4,000 containers
○ Job sizes range from 1 to 500 containers
● Mix of perpetual and interactive exploratory jobs
● Peak of 13 million events / sec
Container deployment: Titus
[Diagram: Titus Job Control in an EC2 VPC; VMs host app and batch containers on the cloud platform (metrics, IPC, health), integrating with Eureka, Edda, and Atlas & Insight]
Current Titus usage
#Containers (tasks) for the week of 11/7 in one of the regions
● Peak of ~1,800 instances
○ Mix of m4.4xl, r3.8xl, p2.8xl
○ ~800 instances at trough
● Mix of batch, stream processing, and some microservices
Core architectural components
[Diagram: layered stack with AWS EC2 at the bottom, Apache Mesos above it, and the Titus/Mantis framework (Fenzo plus batch and service job managers) on top]
Fenzo at https://github.com/Netflix/Fenzo
Apache Mesos at http://mesos.apache.org/
Scheduling using Apache Mesos
Mesos Architecture
[Diagram: Mesos architecture with masters, agents, and framework schedulers]
Motivation for a new Mesos scheduler
● Cloud native (cluster autoscaling)
● Customizable task placement optimizations
○ Mix of service, batch, and stream topologies
What does a Mesos scheduler do?
● API for users to interact
● Mesos interaction via the driver (see the sketch below)
● Compute resource assignments for tasks
○ NetflixOSS Fenzo: https://github.com/Netflix/Fenzo
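Concretely, on the Mesos side the driver delivers resource offers to the framework's callback. Below is a minimal sketch of that intake path, assuming Fenzo's VMLeaseObject adapter for Mesos offers; the OfferIntake class and its queue are illustrative, not Titus code.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.mesos.Protos;

import com.netflix.fenzo.VirtualMachineLease;
import com.netflix.fenzo.plugins.VMLeaseObject;

// Offer-intake half of a Mesos scheduler: offers arriving via the driver's
// resourceOffers() callback are wrapped as Fenzo leases and queued for the
// scheduling loop to consume.
public class OfferIntake {
    private final BlockingQueue<VirtualMachineLease> leaseQueue =
            new LinkedBlockingQueue<>();

    // Call this from org.apache.mesos.Scheduler#resourceOffers(driver, offers).
    public void onOffers(List<Protos.Offer> offers) {
        for (Protos.Offer offer : offers) {
            leaseQueue.offer(new VMLeaseObject(offer)); // Fenzo's Mesos offer adapter
        }
    }

    public BlockingQueue<VirtualMachineLease> leases() {
        return leaseQueue;
    }
}
```

The scheduling loop then drains this queue and passes the accumulated leases to Fenzo together with the pending tasks.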
Fenzo deep dive
Scheduling optimizations
Real world trade-offs: speed (first fit assignment) vs. accuracy (optimal assignment)
[Diagram: balancing the urgency of pending tasks against the fitness of assignments]

Scheduling problem
N tasks to assign from M possible agents
Scheduling optimizations: resource assignments serve multiple stakeholders

DC/Cloud operator:
● Bin packing
○ By resource usage
○ By job types
● Ease deployment of new agent AMIs
● Ease server maintenance and upgrades

Application owner:
● Task locality, anti-locality (noisy neighbors?, etc.)
● Resource affinity
● Task balancing across racks/AZs/hosts

Cost:
● Save cloud footprint costs
● Right instance types
● Save power, cooling costs
● Does everything need to run right away?

Security:
● Security aspects of multi-tenant applications on a host

Net effect: proceed quickly in the generally right direction, adapting to changes.
Fenzo goals
● Extensible
● Cloud native
● Ease of experimentation
● Scheduling decisions visibility
Fenzo scheduling strategy
For each (ordered) task:
    On each available host:
        Validate hard constraints
        Score fitness and soft constraints
    Until score is good enough, and a minimum #hosts evaluated
    Pick the host with the highest score
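In code, this strategy is exercised through Fenzo's TaskScheduler. A minimal sketch of building one and running an iteration, with option names following Fenzo's public builder API; the decline action, fitness calculator choice, and 0.9 threshold are illustrative:

```java
import java.util.List;

import com.netflix.fenzo.SchedulingResult;
import com.netflix.fenzo.TaskRequest;
import com.netflix.fenzo.TaskScheduler;
import com.netflix.fenzo.VirtualMachineLease;
import com.netflix.fenzo.plugins.BinPackingFitnessCalculators;

public class SchedulingLoop {
    private final TaskScheduler scheduler = new TaskScheduler.Builder()
            .withLeaseOfferExpirySecs(10)                // hold unused offers only briefly
            .withLeaseRejectAction(lease ->
                    System.out.println("declining offer on " + lease.hostname()))
            .withFitnessCalculator(BinPackingFitnessCalculators.cpuMemBinPacker)
            .withFitnessGoodEnoughFunction(f -> f > 0.9) // stop early on a good score
            .build();

    // One iteration: assign pending tasks against newly received leases.
    public SchedulingResult scheduleOnce(List<? extends TaskRequest> pendingTasks,
                                         List<VirtualMachineLease> newLeases) {
        return scheduler.scheduleOnce(pendingTasks, newLeases);
    }
}
```

The withFitnessGoodEnoughFunction hook is what lets Fenzo stop evaluating hosts once one scores well enough, trading placement optimality for speed.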
Experimentation with Fenzo
● Abstractions of tasks and servers (VMs)
● Create various strategies with custom fitness functions and constraints
○ For example, dynamic task anti-locality
● “Good enough” can be dynamic
○ Based on pending task set size, task type, etc.
● Ordering of servers for allocation based on task type
Experimentation with Fenzo
[Charts: task runtime bin packing sample results and resource bin packing sample results]
Fitness functions vs. constraints
● Fitness: site policies
○ Bin packing for utilization, reduce fragmentation
○ Segregate hosts by task types, e.g., service vs. batch
● Constraints: user preferences
○ Resource affinity
○ Task locality
○ Balance tasks across racks or availability zones
Fitness evaluation
● Degree of fitness, score of 0.0 - 1.0
● Composable
○ Multiple weighted fitness functions
● Extensible
○ Combine existing ones with custom plugins
CPU bin packing fitness function
fitness = usedCPUs / totalCPUs
Fitness for Host1-Host5: 0.25, 0.5, 0.75, 1.0, 0.0
The host with the highest score, Host4 (fitness 1.0), is selected ✔
Current fitness evaluator in Titus
Combines resource request bin packing with task type bin packing:
resBinpack = (cpuFit + memFit + networkFit) / 3.0
taskTypePack = numSameType / totTasks
fitness = resBinpack * 0.4 + taskTypePack * 0.6
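As a sketch, such a combined evaluator could be written as a Fenzo fitness plugin like the one below. The resource accounting is simplified, and treating TaskRequest.taskGroupName() as the "task type" is an assumption made for illustration:

```java
import com.netflix.fenzo.TaskAssignmentResult;
import com.netflix.fenzo.TaskRequest;
import com.netflix.fenzo.TaskTrackerState;
import com.netflix.fenzo.VMTaskFitnessCalculator;
import com.netflix.fenzo.VirtualMachineCurrentState;
import com.netflix.fenzo.VirtualMachineLease;

public class CombinedBinPackingFitness implements VMTaskFitnessCalculator {
    @Override
    public String getName() {
        return "combinedBinPacker";
    }

    @Override
    public double calculateFitness(TaskRequest request,
                                   VirtualMachineCurrentState vm,
                                   TaskTrackerState trackerState) {
        double usedCpu = 0, usedMem = 0, usedNet = 0;
        int sameType = 0, totTasks = 0;
        // Account for tasks already running plus tasks assigned earlier in this round.
        for (TaskRequest t : vm.getRunningTasks()) {
            usedCpu += t.getCPUs(); usedMem += t.getMemory(); usedNet += t.getNetworkMbps();
            totTasks++;
            if (t.taskGroupName().equals(request.taskGroupName())) sameType++;
        }
        for (TaskAssignmentResult r : vm.getTasksCurrentlyAssigned()) {
            TaskRequest t = r.getRequest();
            usedCpu += t.getCPUs(); usedMem += t.getMemory(); usedNet += t.getNetworkMbps();
            totTasks++;
            if (t.taskGroupName().equals(request.taskGroupName())) sameType++;
        }
        // Host totals = already used + still available (hard constraints have
        // already ensured the request fits, so denominators are non-zero here).
        VirtualMachineLease avail = vm.getCurrAvailableResources();
        double cpuFit = (usedCpu + request.getCPUs()) / (usedCpu + avail.cpuCores());
        double memFit = (usedMem + request.getMemory()) / (usedMem + avail.memoryMB());
        double networkFit = (usedNet + request.getNetworkMbps()) / (usedNet + avail.networkMbps());
        double resBinpack = (cpuFit + memFit + networkFit) / 3.0;
        double taskTypePack = (sameType + 1.0) / (totTasks + 1.0); // include this task
        return resBinpack * 0.4 + taskTypePack * 0.6;
    }
}
```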
Fenzo constraints
● Common constraints built-in
○ Host attribute value
○ Host with unique attribute value
○ Balance across hosts’ unique attribute value
● Can be used as “soft” or “hard” constraint
○ Soft evaluates to 0.0 - 1.0
○ Hard evaluates to true/false
● Additional custom plugins
○ Global constraint to send only GPU-requiring tasks to GPU hosts
○ Global constraint to limit EC2 instance types to certain tasks
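For example, a built-in constraint can be applied in its hard form or wrapped as a soft one. A minimal sketch assuming Fenzo's plugin classes; the co-task lookup and the availabilityZone attribute name are hypothetical placeholders:

```java
import java.util.Collections;
import java.util.Set;

import com.netflix.fenzo.AsSoftConstraint;
import com.netflix.fenzo.ConstraintEvaluator;
import com.netflix.fenzo.VMTaskFitnessCalculator;
import com.netflix.fenzo.functions.Func1;
import com.netflix.fenzo.plugins.UniqueHostAttrConstraint;

public class ZoneSpreadConstraints {
    // Hypothetical: look up the ids of a task's sibling tasks in the same job.
    private static final Func1<String, Set<String>> CO_TASKS_OF_JOB =
            taskId -> Collections.emptySet();

    // Hard form: reject a host whose zone attribute value already hosts a sibling task.
    public static ConstraintEvaluator hardZoneSpread() {
        return new UniqueHostAttrConstraint(CO_TASKS_OF_JOB, "availabilityZone");
    }

    // Soft form: the same rule contributes a 0.0 - 1.0 score instead of a veto.
    public static VMTaskFitnessCalculator softZoneSpread() {
        return AsSoftConstraint.get(hardZoneSpread());
    }
}
```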
Fenzo supported resources
● CPU
● Memory
● Disk
● Ports
● Network bandwidth
● Scalar (used for GPU)
● Security groups and IP per container
Why is a task failing to launch?
Fenzo cluster autoscaling
[Diagram: tasks spread thinly across Hosts 1-4 vs. packed onto Hosts 1-2, leaving Hosts 3-4 idle and eligible for termination]
● Threshold based
● Shortfall analysis based
Autoscaling multiple agent clusters
Grouping agents by instance type (e.g., m4.4xlarge and r3.8xlarge clusters) lets Titus autoscale each cluster independently, with its own min, desired, and max size.
Threshold based autoscaling
● Set up rules per agent attribute value
● Sample:
[Chart: #idle hosts over time; dropping below min triggers a scale up, exceeding max triggers a scale down]

Cluster Name    Min Idle    Max Idle    Cooldown Secs
MemosyClstr     2           5           360
ComputeClstr    5           10          300
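As a sketch, the ComputeClstr row above could be expressed as a Fenzo autoscale rule; the method set follows Fenzo's AutoScaleRule interface as documented, and the "too small" thresholds are illustrative:

```java
import com.netflix.fenzo.AutoScaleRule;
import com.netflix.fenzo.VirtualMachineLease;

public class ComputeClusterRule implements AutoScaleRule {
    @Override
    public String getRuleName() {
        return "ComputeClstr"; // matches the agent attribute value for this cluster
    }

    @Override
    public int getMinIdleHostsToKeep() {
        return 5; // fewer idle hosts than this triggers a scale up
    }

    @Override
    public int getMaxIdleHostsToKeep() {
        return 10; // more idle hosts than this triggers a scale down
    }

    @Override
    public long getCoolDownSecs() {
        return 300; // wait between successive scaling actions
    }

    @Override
    public boolean idleMachineTooSmall(VirtualMachineLease lease) {
        // Don't count hosts too fragmented to fit a typical task as "idle".
        return lease.cpuCores() < 1.0 || lease.memoryMB() < 1024.0;
    }
}
```

Rules like this are registered on the TaskScheduler builder and matched against an agent attribute value, with the resulting scale up/down actions delivered to an autoscaler callback.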
Shortfall analysis based scale up
● Rule-based scale up has a cool down period
○ What if there’s a surge of incoming requests?
● Pending requests trigger shortfall analysis
○ Scale up happens regardless of cool down period
○ Remembers which tasks have already been covered
● Shortcoming: scale can be too aggressive for short periods of time
Capacity guarantees
● Guarantee capacity for timely job starts
○ Mesos supports quotas, but they are inadequate at this time
● Generally, optimize throughput for batch jobs and start latency for service jobs
● Categorize by expected behavior
○ For example, some service style jobs may be less important
● Critical versus Flex (flexible) scheduling requirements
Capacity guarantees
[Diagram: two ways to realize Critical and Flex tiers: separate Quotas per tier vs. Priorities, a resource allocation order placing Critical ahead of Flex]
Capacity guarantees: hybrid view
[Diagram: resource allocation order with Critical tier apps AppC1 … AppCN ahead of Flex tier apps AppF1 … AppFN]
Capacity guarantees: hybrid view
Tier Capacity = SUM(App1-cap + App2-cap + … + AppN-cap) + BUFFER
BUFFER:
● Accommodates some new or ad hoc jobs with no guarantees
● Red-black pushes of apps temporarily double app capacity
Capacity guarantees: hybrid view
● Fenzo supports multi-tiered task queues
● Can have an arbitrary number of tiers (e.g., Tier 0, Tier 1)
● Per-tier DRF across multiple queues (see the sketch below)
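DRF (Dominant Resource Fairness) picks the next queue by its dominant share: the largest fraction of any single resource the queue consumes. A small self-contained illustration of that ordering (plain Java, not Fenzo's queue API; totals and usage numbers are made up):

```java
import java.util.Comparator;
import java.util.List;

// Illustration of Dominant Resource Fairness ordering: the next allocation
// goes to the queue whose dominant share (max over resources of used/total)
// is currently smallest.
public class DrfOrdering {
    static final double TOTAL_CPUS = 1000, TOTAL_MEM_GB = 4000;

    record QueueUsage(String name, double cpus, double memGb) {
        double dominantShare() {
            return Math.max(cpus / TOTAL_CPUS, memGb / TOTAL_MEM_GB);
        }
    }

    public static void main(String[] args) {
        List<QueueUsage> queues = List.of(
                new QueueUsage("teamA", 300, 400),  // dominant share 0.30 (CPU)
                new QueueUsage("teamB", 100, 800)); // dominant share 0.20 (memory)
        queues.stream()
                .sorted(Comparator.comparingDouble(QueueUsage::dominantShare))
                .forEach(q -> System.out.printf("%s -> %.2f%n", q.name(), q.dominantShare()));
    }
}
```

Here teamB's dominant share (0.20, memory) is lower than teamA's (0.30, CPU), so teamB's queue is served first.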
Sizing clusters for capacity guarantees
Tier 0: used capacity plus idle capacity; autoscaled between the cluster min size (the guaranteed capacity) and the cluster max size
Tier 1: used capacity with idle size kept near zero; autoscaled between the cluster desired size and the cluster max size
Netflix container execution values
● Consistent cloud infrastructure with VMs
○ Virtualize and deeply re-use AWS features
● User and operator tooling common to VMs
○ IPC and service discovery, telemetry and monitoring
○ Spinnaker integration for CI/CD
● Unique Features
○ Deep Amazon and Netflix infrastructure integration
○ VPC IP per container
○ Advanced security (sec groups, IAM Roles)
Elastic Network Interfaces (ENI)
[Diagram: an AWS EC2 instance with ENI0, ENI1, and ENI2, each carrying four IPs (IP0 … IP11)]
● Each EC2 instance in VPC has 2 or more ENIs
● Each ENI can have 2 or more IPs
● Security Groups are set on the ENI
Network bandwidth isolation
● Each container gets an IP on one of the ENIs
● Linux tc policies are applied on the virtual Ethernet device, for both incoming and outgoing traffic
● Bandwidth is limited to the requested value; no bursting into unused bandwidth
GPU Enablement
Personalization and recommendations:
● Deep learning with neural nets / mini-batch
● Makes model training infrastructure self-service
Executor takes the scheduler's resource definition:
● Maps p2.8xl GPUs using nvidia-docker-plugin
● Mounts drivers and devices into the container
Ongoing and future scheduling work
● Fine grain capacity guarantees
○ DRF adapted to elastic clusters
○ Preemptions to improve resource usage efficiency
○ Hierarchical sharing policies via h-DRF
○ Leveraging the “internal spot market”, aka the trough
● Onboarding new applications
○ Scale continues to grow
Elastic resource scheduling for Netflix's scalable container cloud
Sharma Podila, Andrew Spyker, Tomasz Bak
@podila @aspyker @tomaszbak1974
Questions?