MIGRATORY WORKLOADS
ACROSS CLOUDS WITH NOMAD
Phil Watts

DevOps Artificer
PROBLEM STATEMENT
“FLEXING BETWEEN THE CLOUDS”
▸ Goals of Virtualization seem universally applicable
▸ !(Vendor Lock-in)
▸ Not all workloads are valued equally
=>=>
IT Magic Anywhere
SUCCESS CRITERIA
WIN CONDITIONS
‣ Availability of compute resources is independent of the cloud provider
‣ Batch jobs can be allocated based on point-in-time cost metrics
‣ Work can be segregated based on compliance qualifications
TOOLCHAIN
MY CURRENT “FAVORITE” TOYS
Resources
Image Creation
Infrastructure Provisioning
Service Discovery
Scheduler
Driver
DEFINITIONS: RESOURCE CONTEXT
THE BANE OF TECHNICAL UNDERSTANDING (AKA WORDS):
▸ Region: The isolation boundary of a Nomad Cluster
▸ Datacenter: Low latency, high bandwidth, private network
▸ Resources: The available capacity provided by a node
Common / Comfortable Pattern
Platform   Region        Datacenter
AWS        Continental   AWS_Region
GCE        Continental   GCE_Region
Azure      Location      Location

Ideal Pattern
Platform   Region        Datacenter
AWS        Global        AWS_Region
GCE        Global        GCE_Region
Azure      Global        Sets of Locations
NOMAD ARCHITECTURE - SINGLE REGION VIEW
BDFL FOR WORKLOAD DECISIONS
‣ In Nomad, Datacenters can speak to Region-aware Servers
‣ Datacenters don’t need to be the same platform
‣ Default Region is “global”
ARCHITECTURE OF SOLUTION
▸ Nomad Clients potentially provide Resources for Jobs
▸ Communication between Datacenters may need to be secured
▸ Nodes run a Consul Agent and Nomad Client
▸ Nomad Servers “Bin Pack” tasks onto nodes
THREE PICTURES OF THE SAME THING
Single Region / Multi-Datacenter (different Clouds)
DEFINITIONS: TASK CONTEXT
WORDS: THE SEQUEL
▸ Task: Desired state declaration of workload
▸ Constraints: Rules limiting where a job can run
▸ Evaluations: Queued requests to compare the desired and present state of work over the region
  ▸ Caused by a state change event:
    ▸ Job Completion
    ▸ Node Addition/Subtraction
    ▸ Job Scheduled
▸ Allocations: Mapping of tasks to resources within constraints
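The vocabulary above reads like a reconciliation loop: a state change queues an evaluation, and the evaluation compares desired with present state. A minimal Python sketch of that comparison follows. It is an illustration, not Nomad's implementation; `reconcile` and the task names are hypothetical.

```python
from collections import Counter

def reconcile(desired, running):
    """Compare desired task counts with running task counts.

    Returns (to_start, to_stop): how many instances of each task must be
    started or stopped to reach the desired state.
    """
    desired_counts = Counter(desired)
    running_counts = Counter(running)
    # Counter subtraction keeps only the positive differences.
    to_start = desired_counts - running_counts
    to_stop = running_counts - desired_counts
    return dict(to_start), dict(to_stop)

# State change event: a node was lost and took one "web" instance with it,
# while a leftover "cron" task is no longer part of the desired state.
desired = {"web": 3, "log-shipper": 1}
running = {"web": 2, "log-shipper": 1, "cron": 1}

to_start, to_stop = reconcile(desired, running)
print("start:", to_start, "stop:", to_stop)
```

The result of such a comparison is what an allocation would then map onto nodes, within the job's constraints.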
JOB TYPES: SERVICE
KEEPING THE SITE UP
▸ Long running jobs that should always be available
▸ Scheduling decisions favor QoS
▸ Example: Ensuring a front end web service is always
available
JOB TYPES: BATCH
WHAT TO DO WITH ALL THIS DATA?
▸ A set of work spanning a few minutes to a few days
▸ Based on the Berkeley Sparrow “Two Choices” model
▸ http://people.eecs.berkeley.edu/~keo/publications/sosp13-final17.pdf
▸ Probes a set of nodes which meet constraints and sends work
to the "least loaded" nodes
▸ Example: Tasks to manipulate a queue of data when present
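The "two choices" idea can be sketched in a few lines of Python: rather than ranking every feasible node, sample two and send the work to the less loaded one. This is a toy model under stated assumptions. Load is a simple task count, and `pick_node`, `place_batch`, and the node names are hypothetical rather than taken from Nomad or Sparrow.

```python
import random

def pick_node(loads, feasible, rng=random):
    """Power of two choices: sample two feasible nodes, return the less loaded."""
    candidates = rng.sample(feasible, k=min(2, len(feasible)))
    return min(candidates, key=lambda node: loads[node])

def place_batch(tasks, loads, feasible, rng=random):
    """Place each task in turn, bumping the load of the node it lands on."""
    placement = {}
    for task in tasks:
        node = pick_node(loads, feasible, rng)
        loads[node] += 1
        placement[task] = node
    return placement

loads = {"aws-1": 0, "aws-2": 0, "gce-1": 0, "gce-2": 0}
# Nodes that already passed the constraint checks; "gce-2" did not.
feasible = ["aws-1", "aws-2", "gce-1"]

tasks = [f"task-{i}" for i in range(30)]
placement = place_batch(tasks, loads, feasible, random.Random(7))
print(loads)
```

Sampling keeps each placement decision cheap, which suits short-lived batch work. The sketch assumes at least one feasible node, since `min` raises on an empty candidate list.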
JOB TYPES: SYSTEM
KEEPING THE LIGHTS ON
▸ A unique job type used to declare jobs which should run on
every node which meets the job constraints
▸ Are re-evaluated whenever a node joins the cluster
▸ Example: distributing common tasks, which can benefit from
rolling updates, job updates, service discovery
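The re-evaluation on node join can be pictured as a set difference: nodes that meet the constraints, minus nodes already running the job. The Python sketch below is illustrative only. `system_job_diff` and the node attributes are hypothetical, and constraints are reduced to exact key/value matches.

```python
def system_job_diff(constraints, nodes, running_on):
    """Return nodes that satisfy the constraints but do not yet run the job."""
    eligible = {
        name
        for name, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in constraints.items())
    }
    return sorted(eligible - set(running_on))

nodes = {
    "aws-1": {"kernel": "linux"},
    "gce-1": {"kernel": "linux"},
    "win-1": {"kernel": "windows"},
}
constraints = {"kernel": "linux"}

# First evaluation: the job is not running anywhere yet.
first = system_job_diff(constraints, nodes, running_on=[])

# A node joins the cluster, which triggers a re-evaluation.
nodes["azure-1"] = {"kernel": "linux"}
second = system_job_diff(constraints, nodes, running_on=first)
print(first, second)
```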
NOMAD SCHEDULING INTERNALS
GETTING FROM WORK AND RESOURCES TO
ACCOMPLISHMENTS
▸ Evaluations read the Job Specification and find constraints
▸ Evaluation Brokers maintain the pending queue, priority, and at-least-once delivery
▸ Schedulers submit an Allocation Plan, which is evaluated for feasibility and then by priority
▸ Allocations set jobs against resources
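The broker's three duties (pending queue, priority, at-least-once delivery) can be sketched with a heap plus a table of unacknowledged deliveries. This is a toy model, not Nomad's broker. The class `EvalBroker`, its method names, and the evaluation IDs are assumptions made for illustration.

```python
import heapq
import itertools

class EvalBroker:
    """Toy broker: highest priority first, redelivery until acknowledged."""

    def __init__(self):
        self._heap = []                # entries: (-priority, sequence, eval_id)
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority
        self._unacked = {}             # eval_id -> priority, delivered but not acked

    def enqueue(self, eval_id, priority):
        heapq.heappush(self._heap, (-priority, next(self._seq), eval_id))

    def dequeue(self):
        """Hand the next evaluation to a scheduler, or None if nothing is pending."""
        if not self._heap:
            return None
        neg_priority, _, eval_id = heapq.heappop(self._heap)
        self._unacked[eval_id] = -neg_priority
        return eval_id

    def ack(self, eval_id):
        """Scheduler finished: forget the evaluation."""
        del self._unacked[eval_id]

    def nack(self, eval_id):
        """Scheduler failed: requeue the evaluation so it is delivered again."""
        self.enqueue(eval_id, self._unacked.pop(eval_id))

broker = EvalBroker()
broker.enqueue("batch-report", priority=20)
broker.enqueue("web-frontend", priority=80)

first = broker.dequeue()        # higher priority wins
broker.nack(first)              # simulated scheduler failure
redelivered = broker.dequeue()  # same evaluation comes back
broker.ack(redelivered)
last = broker.dequeue()
print(first, redelivered, last)
```

Because an evaluation stays in the unacknowledged table until `ack`, a failed scheduler cannot silently drop work; the cost is that a scheduler may see the same evaluation more than once.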
LIKE TETRIS FOR WORKLOADS
▸ Tasks require resources
▸ Nodes have “dimensions” of
resources
▸ Allocation fits Tasks inside Nodes
BIN PACKING
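A best-fit heuristic is one simple way to picture the Tetris analogy: choose the feasible node that is left with the least spare capacity. The Python sketch below is an assumption-laden illustration, not Nomad's scoring function. It adds CPU and memory leftovers together, which is crude, and every name in it is hypothetical.

```python
def fits(free, need):
    """True if every requested dimension fits in the node's free capacity."""
    return all(free[dim] >= amount for dim, amount in need.items())

def best_fit(nodes, need):
    """Pick the feasible node with the least spare capacity after placement."""
    feasible = [name for name, free in nodes.items() if fits(free, need)]
    if not feasible:
        return None

    def leftover(name):
        return sum(nodes[name][dim] - amount for dim, amount in need.items())

    return min(feasible, key=leftover)

def allocate(nodes, tasks):
    """Place tasks one at a time, deducting resources from the chosen node."""
    allocations = {}
    for task, need in tasks.items():
        node = best_fit(nodes, need)
        allocations[task] = node
        if node is not None:
            for dim, amount in need.items():
                nodes[node][dim] -= amount
    return allocations

nodes = {
    "small": {"cpu": 1000, "memory": 1024},
    "large": {"cpu": 4000, "memory": 8192},
}
tasks = {
    "cache": {"cpu": 500, "memory": 512},
    "web": {"cpu": 500, "memory": 512},
    "etl": {"cpu": 3000, "memory": 4096},
    "huge": {"cpu": 8000, "memory": 1024},  # fits nowhere
}
allocations = allocate(nodes, tasks)
print(allocations)
```

Packing the small node full first leaves the large node free for the one task that needs it.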
TASK GROUPS
PREVENTING TASK SEPARATION ANXIETY
▸ Task Groups allow multiple Tasks to require that they are scheduled on the same node
▸ Are created implicitly for single tasks in isolation
▸ Can be used to enforce compliance elements required to run
together
▸ Example: Requiring log shipping co-processes
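Co-scheduling means the group is placed as a unit, so a node has to fit the sum of its tasks. A minimal Python sketch of that check follows, with hypothetical names (`group_demand`, `place_group`) and made-up resource figures; it takes the first node that fits rather than scoring candidates.

```python
def group_demand(tasks):
    """Sum the resource needs of every task in the group, per dimension."""
    total = {}
    for need in tasks.values():
        for dim, amount in need.items():
            total[dim] = total.get(dim, 0) + amount
    return total

def place_group(nodes, tasks):
    """Return the first node that can hold the whole group, or None."""
    demand = group_demand(tasks)
    for name, free in nodes.items():
        if all(free.get(dim, 0) >= amount for dim, amount in demand.items()):
            return name
    return None

# A web service and its log-shipping co-process must land together.
group = {
    "webservice": {"cpu": 500, "memory": 256},
    "log-shipper": {"cpu": 100, "memory": 64},
}
nodes = {
    "node-a": {"cpu": 550, "memory": 1024},  # fits the web service alone, not the pair
    "node-b": {"cpu": 1000, "memory": 512},
}
chosen = place_group(nodes, group)
print(chosen)
```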
CONSTRAINTS
JUST BECAUSE YOU CAN, DOESN’T MEAN YOU SHOULD
▸ Job Constraints limit the nodes eligible to run a particular job or task group
▸ Constraints can map workloads directly to Customized
Hardware such as AWS Placement Groups
CONSTRAINTS AND COMPLIANCE
SATISFYING COMPLIANCE REQUIREMENTS
▸ Constraints on datacenter can be used for Data Isolation inside National Boundaries.
  ▸ Healthcare workload that must stay within the EU
▸ Metadata attributes can allow for custom declarations.
  ▸ E.g. PCI DSS Compliance:
    ▸ Maintain a network firewall
    ▸ Run Anti-Malware/Anti-Virus protection
    ▸ Monitor and log access
    ▸ Regularly test security systems and procedures.
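job "sample_service" {
  ...
  # Job-level metadata, exposed to the job's tasks.
  meta {
    pci_dss = true
  }
  group "webservice" {
    # Only place this group on nodes whose client metadata declares pci_dss.
    constraint {
      attribute = "${meta.pci_dss}"
      value     = "true"
    }
  }
}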
Constraint Snippet
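How such a constraint narrows the candidate set can be sketched in Python. The node names, metadata keys, and the `eligible_nodes` helper below are hypothetical, and matching is simplified to string equality on every key.

```python
def eligible_nodes(nodes, constraints):
    """Keep nodes whose metadata matches every (key, value) constraint."""
    return sorted(
        name
        for name, meta in nodes.items()
        if all(meta.get(key) == value for key, value in constraints.items())
    )

nodes = {
    "aws-eu-1": {"pci_dss": "true", "datacenter": "aws-eu-west-1"},
    "aws-us-1": {"pci_dss": "true", "datacenter": "aws-us-east-1"},
    "gce-eu-1": {"datacenter": "gce-europe-west1"},  # no PCI DSS declaration
}

# Compliance only: any node that declares PCI DSS.
pci_only = eligible_nodes(nodes, {"pci_dss": "true"})

# Compliance plus data isolation: PCI DSS nodes in one EU datacenter.
pci_in_eu = eligible_nodes(
    nodes, {"pci_dss": "true", "datacenter": "aws-eu-west-1"}
)
print(pci_only, pci_in_eu)
```

A node that never declares the attribute is simply filtered out, so compliance becomes opt-in at the node level.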
CONSTRAINTS: SATISFYING SPECIAL NEEDS
DIFFERENT THINGS ARE DIFFERENT
▸ Not all platforms are created equal
▸ Platform attributes for specifying Cloud Platforms
job "sample_service" {
  ...
  # Restrict the whole job to nodes reporting the AWS platform.
  constraint {
    attribute = "${attr.platform}"
    value     = "aws"
  }
}
▸ ${attr.platform} = aws may be relevant if you need floating-point (GPU) processing, which AWS offers and GCE doesn’t
RAW EXECS
CHEKHOV’S TASK DRIVER
▸ Unconstrained, Un-isolated, Disabled by Default
▸ Runs as the user Nomad is running as
client {
  options = {
    # Opt in to the raw_exec driver on this client (it is off by default).
    "driver.raw_exec.enable" = "1"
  }
}
“IT SEEMS TO BE A DEEP INSTINCT IN HUMAN BEINGS FOR MAKING EVERYTHING COMPULSORY THAT ISN'T FORBIDDEN”
~Robert A. Heinlein
OPERATOR INTERACTION
RELIABLE MAGIC = OPERATIONS
$ nomad run -address=$nomad_server jobfile.nomad
‣ Operators schedule jobs against a server
‣ Nomad figures out how/where/when to run tasks
‣ Complex solution through iteration
Phil Watts

DevOps Artificer @ REĀN Cloud
@pwattstbd
github.com/marsupermammal
phil@reancloud.com
www.reancloud.com
import "os"
func presentation() {
os.Exit(0)
}
