HASHICORP
Taming the modern public clouds with Nomad
Diptanu Gon Choudhury
@diptanu
Atmosphere 2016 - Diptanu Choudhury - Taming the public clouds with nomad

Evolution of compute infrastructure
[Timeline: 1995, 2000, 2015]
[Diagram: a global public cloud made up of AWS US-West-2, AWS US-East-1 and GCP US-Central-1, alongside private clouds]

Challenges of the modern cloud
Tens of thousands of compute nodes to manage
Compute clusters are spread across the globe
Static, offline partitioning of clusters is no longer efficient
Heterogeneous APIs for accessing compute infrastructure
Heterogeneous primitives for managing network, secrets, etc.

Evolution of application architecture
SOA and microservices are replacing monoliths
Distributed systems are the new normal

Challenges in running modern services
Orchestrated deployment and rollback strategies
More modes of failure

[Diagram sequence: an operator manually places RUBY, PYTHON, GOLANG and NODE applications on four named hosts in a datacenter (Skywalker, Vader, Leia, Solo) and tracks the hosts in a table of names, IP addresses (192.168.1.4, 192.168.1.5, 192.168.1.7, 192.168.1.253), MAC addresses and notes. Vader is annotated "Randomly kills applications" and drops out of the host row, with three PYTHON applications shown outside the remaining hosts. Vader later returns with the note "Rebuilt on 04/20/2016", and the table's second address changes from 192.168.1.5 to 192.168.1.9.]
This does not scale

Cluster Schedulers to the rescue
Decouple Work from Resources
Better Quality of Service
Higher Resource Utilization

Nomad
Multi-Datacenter
Multi-Region
Flexible Workloads
Job Priorities
Bin Packing
Large Scale
Operationally Simple

Nomad as the Cluster Scheduler
Higher Resource Utilization: Bin Packing, Job Queueing, Over-Subscription
Decouple Work from Resources: Abstraction, API Contracts, Standardization
Better Quality of Service: Priorities, Resource Isolation, Pre-emption
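
The bin-packing idea behind "Higher Resource Utilization" can be sketched in a few lines. This is an illustrative best-fit placement, not Nomad's actual scoring function: the `Node` class, the `place` helper and the averaged CPU/memory score are all assumptions made for the example. The units (MHz, MB) mirror the resources block in the deck's example.nomad.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of a client node's capacity and current usage."""
    name: str
    cpu_mhz: int      # total CPU capacity
    memory_mb: int    # total memory capacity
    used_cpu: int = 0
    used_mem: int = 0

    def fits(self, cpu: int, mem: int) -> bool:
        return (self.used_cpu + cpu <= self.cpu_mhz
                and self.used_mem + mem <= self.memory_mb)

    def utilization_after(self, cpu: int, mem: int) -> float:
        # Average of CPU and memory utilization if the task were placed here.
        cpu_frac = (self.used_cpu + cpu) / self.cpu_mhz
        mem_frac = (self.used_mem + mem) / self.memory_mb
        return (cpu_frac + mem_frac) / 2


def place(nodes: List[Node], cpu: int, mem: int) -> Optional[Node]:
    """Best-fit placement: pick the feasible node that ends up fullest."""
    feasible = [n for n in nodes if n.fits(cpu, mem)]
    if not feasible:
        return None  # a real scheduler could queue the job instead
    best = max(feasible, key=lambda n: n.utilization_after(cpu, mem))
    best.used_cpu += cpu
    best.used_mem += mem
    return best
```

Placing three 500 MHz / 256 MB tasks on two empty, identical nodes packs all three onto the first node, which leaves the second node free for a larger task.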

Job Specification
Declares what to run

example.nomad
# Define our simple redis job
job "redis" {
  # Run only in us-east-1
  datacenters = ["us-east-1"]
  # Define the single redis task using Docker
  task "redis" {
    driver = "docker"
    config {
      image = "redis:latest"
    }
    resources {
      cpu    = 500 # MHz
      memory = 256 # MB
      network {
        mbits         = 10
        dynamic_ports = ["redis"]
      }
    }
  }
}

Job Specification
Nomad determines where and manages how to run
Abstract work from resources

Supports multiple Clouds, DCs and Regions
Resources across DCs are presented as a single pool
Developers can target multiple datacenters in the same job file
Unified interface for developers across clouds

Unified interface across hybrid clouds
[Diagram: a single job spec goes through Nomad to AWS, GCP, Azure and an on-prem DC]

Single Region Architecture
[Diagram: three servers, one leader and two followers, linked by replication and forwarding; clients in DC1, DC2 and DC3 talk to the servers over RPC]

Multi Region Architecture
[Diagram: Region A and Region B each run three servers, one leader and two followers, with replication and forwarding inside each region; the two regions are linked by gossip and region forwarding]

Nomad
Region is Isolation Domain
1-N Datacenters Per Region
Flexibility to do 1:1 (Consul)
Scheduling Boundary

Data Model
Job, Node, Allocation, Evaluation
Evaluation ~= State Change

Evaluations
Create / Update / Delete Job
Node Up / Node Down
Allocation Failed

Evaluations
Scheduler: func(Evaluation) => []AllocationUpdates
Scheduler types: Service, Batch, System
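
The `func(Evaluation) => []AllocationUpdates` shape can be made concrete with a small sketch. Everything here is a simplified assumption for illustration: the field names, the `ClusterState` snapshot, the one-instance-per-node rule and the dispatch table are hypothetical and are not taken from Nomad's source. The only parts from the slides are the idea that an evaluation goes in, a list of allocation updates comes out, and that there are service, batch and system schedulers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Evaluation:
    job_id: str
    job_type: str       # "service", "batch" or "system"
    trigger: str        # e.g. "job-register", "node-down", "alloc-failed"
    desired_count: int  # hypothetical: how many instances the job wants


@dataclass(frozen=True)
class AllocationUpdate:
    job_id: str
    node: str
    action: str         # "place" or "stop"


@dataclass
class ClusterState:
    ready_nodes: List[str]
    running: Dict[str, List[str]]  # job_id -> nodes currently running it


def generic_scheduler(ev: Evaluation, state: ClusterState) -> List[AllocationUpdate]:
    """Reconcile the desired count against allocations on ready nodes."""
    placed = state.running.get(ev.job_id, [])
    healthy = [n for n in placed if n in state.ready_nodes]
    lost = [n for n in placed if n not in state.ready_nodes]
    updates = [AllocationUpdate(ev.job_id, n, "stop") for n in lost]
    missing = max(ev.desired_count - len(healthy), 0)
    free = [n for n in state.ready_nodes if n not in healthy]
    updates += [AllocationUpdate(ev.job_id, n, "place") for n in free[:missing]]
    return updates


def system_scheduler(ev: Evaluation, state: ClusterState) -> List[AllocationUpdate]:
    """Place the job on every ready node that is not already running it."""
    placed = state.running.get(ev.job_id, [])
    return [AllocationUpdate(ev.job_id, n, "place")
            for n in state.ready_nodes if n not in placed]


SCHEDULERS: Dict[str, Callable[[Evaluation, ClusterState], List[AllocationUpdate]]] = {
    "service": generic_scheduler,
    "batch": generic_scheduler,
    "system": system_scheduler,
}


def process(ev: Evaluation, state: ClusterState) -> List[AllocationUpdate]:
    """Dispatch an evaluation to the scheduler registered for its job type."""
    return SCHEDULERS[ev.job_type](ev, state)
```

A "node-down" evaluation for a service job that wants two instances, with one of them on a node that is no longer ready, yields one "stop" for the lost allocation and one "place" on a free node.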

Scheduler Architecture
Concurrent and optimistic scheduling
Event-driven invocation of schedulers
No head-of-line blocking for different types of workloads
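
"Concurrent and optimistic scheduling" can be read as: schedulers plan against a snapshot without taking locks, and a single commit point re-checks each plan and rejects the ones that conflict. The sketch below illustrates that pattern only. The `Plan` and `PlanApplier` names, the CPU-only capacity model and the retry loop are assumptions for the example and do not describe Nomad's internal API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Plan:
    job_id: str
    node: str
    cpu: int


class PlanApplier:
    """Single commit point; schedulers may build plans concurrently."""

    def __init__(self, capacity: Dict[str, int]):
        self.free_cpu = dict(capacity)
        self.index = 0  # bumped on every committed plan

    def snapshot(self) -> Tuple[int, Dict[str, int]]:
        # Schedulers work on a copy, so planning never blocks commits.
        return self.index, dict(self.free_cpu)

    def submit(self, plan: Plan) -> bool:
        # Re-validate against the latest state; the snapshot may be stale.
        if self.free_cpu.get(plan.node, 0) < plan.cpu:
            return False  # conflict: the caller re-plans on a fresh snapshot
        self.free_cpu[plan.node] -= plan.cpu
        self.index += 1
        return True


def schedule(applier: PlanApplier, job_id: str, cpu: int,
             max_attempts: int = 3) -> Optional[Plan]:
    """Plan optimistically and retry on a fresh snapshot if rejected."""
    for _ in range(max_attempts):
        _, free = applier.snapshot()
        candidates = [n for n, c in free.items() if c >= cpu]
        if not candidates:
            return None
        node = min(candidates, key=lambda n: free[n])  # tightest fit
        plan = Plan(job_id, node, cpu)
        if applier.submit(plan):
            return plan
    return None
```

If two plans built from the same snapshot both target the last free capacity on one node, the first commit wins and the second is rejected, so the losing scheduler re-plans and lands on another node.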

Client Architecture
Broad OS Support
Host Fingerprinting
Pluggable Drivers

Drivers
Execute Tasks
Provide Resource Isolation
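
One way to picture pluggable drivers together with host fingerprinting is a small interface that each driver implements, plus a registry built from whatever the host supports. The sketch below is an assumption for illustration: the `Driver` contract, the driver names and the `available_drivers` helper are hypothetical, and the standalone driver here provides no resource isolation at all, which a real driver would be expected to add.

```python
import shutil
import subprocess
import sys
from abc import ABC, abstractmethod
from typing import Dict, List


class Driver(ABC):
    """Hypothetical driver contract: detect support, then execute a task."""

    name: str

    @abstractmethod
    def fingerprint(self) -> bool:
        """Return True if this host can run tasks with this driver."""

    @abstractmethod
    def start(self, command: List[str]) -> int:
        """Run the task to completion and return its exit code."""


class StaticBinaryDriver(Driver):
    name = "static_binary"  # illustrative name; runs with no isolation

    def fingerprint(self) -> bool:
        return True

    def start(self, command: List[str]) -> int:
        return subprocess.run(command).returncode


class DockerDriver(Driver):
    name = "docker"

    def fingerprint(self) -> bool:
        # Only advertise the driver when a docker binary is on PATH.
        return shutil.which("docker") is not None

    def start(self, command: List[str]) -> int:
        # command is the image name followed by optional arguments.
        return subprocess.run(["docker", "run", "--rm", *command]).returncode


def available_drivers(drivers: List[Driver]) -> Dict[str, Driver]:
    """Fingerprint the host and keep only the drivers it supports."""
    return {d.name: d for d in drivers if d.fingerprint()}


if __name__ == "__main__":
    found = available_drivers([StaticBinaryDriver(), DockerDriver()])
    code = found["static_binary"].start(
        [sys.executable, "-c", "print('hello from a task')"])
    print(sorted(found), code)
```

A scheduler would then only place a task on nodes whose fingerprint advertises the driver the task asks for.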

Containerized: Docker, Rocket
Virtualized: Qemu / KVM
Standalone: Java Jar, Static Binaries
[The next build of the slide adds Windows Server Containers (containerized), Hyper-V and Xen (virtualized), and C# (standalone).]

Maintenance Primitives
First-class support for doing maintenance on nodes
Drain allocations running on a node
    nomad node-drain -enable 149cc920
    Are you sure you want to enable drain mode for node "149cc920"? [y/N]
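
Conceptually, draining a node means two things: stop scheduling new work onto it, and trigger re-evaluation of every job that currently has allocations there. The sketch below models only that bookkeeping. The `Cluster` class and `enable_drain` function are hypothetical names for this example, and the second node ID is made up; the first is the one shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Cluster:
    allocations: Dict[str, List[str]]  # node id -> job ids running there
    draining: Set[str] = field(default_factory=set)

    def eligible_nodes(self) -> List[str]:
        """Nodes that may still receive new allocations."""
        return sorted(n for n in self.allocations if n not in self.draining)


def enable_drain(cluster: Cluster, node_id: str) -> List[str]:
    """Mark a node as draining and return the jobs needing re-evaluation."""
    if node_id not in cluster.allocations:
        raise KeyError(f"unknown node {node_id}")
    cluster.draining.add(node_id)
    # One re-evaluation per affected job, even if it has several
    # allocations on the drained node.
    return sorted(set(cluster.allocations[node_id]))
```

For example, draining "149cc920" while it runs two redis allocations and one web allocation returns `["redis", "web"]` and removes the node from the eligible list.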

Service Discovery Aware
Allows developers to define services exposed by a job
Keep services and checks synced

example.nomad
job "redis" {
  task "redis" {
    ………
    service {
      name = "binstore"
      tags = ["env:staging", "stack:beta"]
      port = "http"
      check {
        name     = "binstore-http"
        type     = "http"
        path     = "/status"
        interval = "30s"
        timeout  = "2s"
      }
    }
    …………
  }
}

System Job Scheduler
Runs a job on every node in the cluster
Great for running monitoring, logging, auditing software

Log Management
Takes care of rotating logs of services
Log forwarding coming soon

Thanks!
https://github.com/hashicorp/nomad
https://www.nomadproject.io/
