Self-scaling multi-cloud Nomad workloads
Bram Vogelaar
Cloud Engineer @ Seaplane.io
@attachmentgenie
We have all been there

What would it take to keep the platform running (no matter what)?
And, more importantly…
Vertical vs Horizontal Scaling
Absorb additional pressure and pay for what you need
The aim becomes 99.999% uptime, not 100%
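To make "five nines" concrete, the arithmetic below converts an availability target into a yearly downtime budget. It is a minimal sketch that assumes a 365-day year (525,600 minutes); at 99.999% that leaves roughly 5.26 minutes of downtime per year.

```python
# Minimal sketch: convert an availability target into an allowed-downtime budget.
# Assumes a 365-day year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability_pct: float) -> float:
    """Return the minutes of downtime per year permitted by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.9, 99.99, 99.999, 100.0):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} min/year")
```

Each extra nine divides the budget by ten, which is why the deck trades the unreachable 100% for a measurable 99.999%.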
Task Scheduler
Spec Sheet
Open-source tool for dynamic workload scheduling
Batch, containerized, and non-containerized applications
Jobs written in (H)ashiCorp (C)onfiguration (L)anguage
https://www.nomadproject.io/
Let's build an awesome product
job "lorem-ipsum" {
  group "frontend" {
    network {
      port "http" {
        to = 3000
      }
    }

    service {
      name = "lorem"
      port = "http"
    }

    task "server" {
      driver = "docker"

      config {
        image = "cicero/lorem-ipsum:v1.0.0"
        ports = ["http"]
      }
    }
  }
}
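The `network` block only pins the container side of the port (`to = 3000`). The host side is a dynamic port that Nomad picks at placement time, which is why the Traefik file provider on the next slide has to hard-code a high port such as 26489. The sketch below models that split. The 20000 to 32000 range is Nomad's documented default for the client's `min_dynamic_port`/`max_dynamic_port`; the random choice and the helper names are illustrative assumptions, not Nomad's actual allocator.

```python
import random
from typing import Dict, Set

# Nomad's documented defaults for the client's min_dynamic_port / max_dynamic_port.
MIN_DYNAMIC_PORT = 20000
MAX_DYNAMIC_PORT = 32000


def allocate_dynamic_port(used: Set[int], rng: random.Random) -> int:
    """Pick a free host port from the dynamic range (simplified, not Nomad's code)."""
    free = [p for p in range(MIN_DYNAMIC_PORT, MAX_DYNAMIC_PORT + 1) if p not in used]
    if not free:
        raise RuntimeError("no dynamic ports left on this node")
    port = rng.choice(free)
    used.add(port)  # reserve it so the next allocation on this node gets another port
    return port


def map_port(label: str, to: int, used: Set[int], rng: random.Random) -> Dict[str, object]:
    """Hypothetical record of a network.port block: dynamic host port -> container port `to`."""
    return {"label": label, "host_port": allocate_dynamic_port(used, rng), "container_port": to}


used_ports: Set[int] = set()
rng = random.Random(42)
first = map_port("http", 3000, used_ports, rng)
second = map_port("http", 3000, used_ports, rng)
print(first, second)
```

Because the host port changes with every placement, a static proxy config goes stale on each deploy; the later slides replace it with service discovery.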
Expose it to the world
## traefik.yml
entryPoints:
  web:
    address: ":80"
providers:
  file:
    filename: /etc/traefik.d/dynamic.yml
    watch: true
Expose it to the world
## dynamic.yml
http:
  routers:
    lorem-ipsum:
      rule: "Host(`lorem-ipsum.io`)"
      service: lorem-ipsum
  services:
    lorem-ipsum:
      loadBalancer:
        servers:
          - url: http://localhost:26489
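The dynamic.yml above wires three things together: a router matches the `Host` header, points at a service, and the service load-balances over its server URLs. The toy model below shows that chain. It is a sketch, not Traefik's code: the class name is hypothetical, weights are ignored (Traefik's default strategy is weighted round robin), hostnames are compared case-insensitively, and the second server URL is a hypothetical extra allocation added only to show rotation.

```python
from itertools import cycle
from typing import Dict, List, Optional


class HostRouter:
    """Toy model of dynamic.yml: Host rule -> service -> round-robin servers."""

    def __init__(self, routers: Dict[str, Dict[str, str]], services: Dict[str, List[str]]):
        self.routers = routers
        # One endless iterator per service, handing out servers in turn.
        self.balancers = {name: cycle(urls) for name, urls in services.items()}

    def route(self, host_header: str) -> Optional[str]:
        """Return the backend URL for a request, or None (Traefik would answer 404)."""
        host = host_header.split(":", 1)[0].lower()  # drop any :port suffix
        for router in self.routers.values():
            if router["host"] == host:
                return next(self.balancers[router["service"]])
        return None


router = HostRouter(
    routers={"lorem-ipsum": {"host": "lorem-ipsum.io", "service": "lorem-ipsum"}},
    # The second server is a hypothetical extra allocation, added to show rotation.
    services={"lorem-ipsum": ["http://localhost:26489", "http://localhost:26490"]},
)
print(router.route("lorem-ipsum.io"), router.route("lorem-ipsum.io:80"))
```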
Sleep++ step 1
job "lorem-ipsum" {
  group "frontend" {
    count = 2

    constraint {
      operator = "distinct_hosts"
      value    = "true"
    }

    # network, service and task blocks as on the previous slide
  }
}
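With `count = 2` and `distinct_hosts`, the two copies of the group must land on different nodes, so losing one machine leaves the other copy serving. The sketch below captures only that rule; Nomad's real scheduler also scores nodes (bin packing, affinities, spread), which is ignored here, and the function name is hypothetical. If there are fewer eligible nodes than `count`, the remaining allocations stay unplaced until capacity appears.

```python
from typing import List, Tuple


def place_distinct(count: int, nodes: List[str], existing: List[str]) -> Tuple[List[str], int]:
    """Return (nodes hosting an allocation, number of allocations left unplaced)."""
    placements = list(existing)  # nodes that already run an allocation of this group
    for node in nodes:
        if len(placements) >= count:
            break
        if node not in placements:  # distinct_hosts: never reuse a node
            placements.append(node)
    return placements, max(0, count - len(placements))


print(place_distinct(2, ["node-a", "node-b", "node-c"], existing=[]))
print(place_distinct(2, ["node-a"], existing=[]))
```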
Your scheduler should not be an expensive systemd
Service Discovery + Service Mesh + Scheduler
Service Discovery + Service Mesh
Spec Sheet
Open-source service discovery tool
Built-in KV store
Service mesh tool
Uses watchers to provide near-instant feedback loops
https://www.consul.io/
Exposing as a service
job "lorem-ipsum" {
  group "frontend" {
    service {
      name = "www"
      tags = ["metrics", "traefik.enable=true"]
      port = "http"

      check {
        name     = "alive"
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }
  }
}
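The `check` block asks Consul to call `/health` every `interval = "10s"` and give up after `timeout = "2s"`. Consul documents the mapping for HTTP checks as: any 2xx response is passing, 429 Too Many Requests is warning, anything else (including a timeout) is critical. The sketch below reproduces a single probe with the standard library; it is an illustration of those semantics, not Consul's implementation, and the function names are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def status_for(code: int) -> str:
    """Map an HTTP status code to a Consul check state."""
    if 200 <= code < 300:
        return "passing"
    if code == 429:  # Too Many Requests
        return "warning"
    return "critical"


def probe(url: str, timeout: float = 2.0) -> str:
    """Run one HTTP check against `url` (e.g. the allocation's /health endpoint)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return status_for(resp.status)
    except urllib.error.HTTPError as err:  # non-2xx responses arrive as exceptions
        return status_for(err.code)
    except (OSError, http.client.HTTPException):  # refused, DNS failure, timeout, bad reply
        return "critical"


print(status_for(200), status_for(429), status_for(503))
```

Only instances whose checks pass are returned by Consul's health API and DNS, which is what lets Traefik and Prometheus on the following slides follow healthy allocations automatically.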
Expose it to the world
## dynamic.yml
http:
  routers:
    lorem-ipsum:
      rule: "Host(`lorem-ipsum.io`)"
      service: lorem-ipsum
  services:
    lorem-ipsum:
      loadBalancer:
        servers:
          - url: http://www.service.consul:26489
Sleep++ lazy++
## traefik.yml
entryPoints:
  web:
    address: ":80"
providers:
  consulCatalog:
    defaultRule: "Host(`{{ .Name }}.lorem-ipsum.io`)"
    endpoint:
      address: 'http://localhost:8500'
    exposedByDefault: false
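With `exposedByDefault: false`, Traefik's Consul Catalog provider only builds routers for services tagged `traefik.enable=true` (as the `www` service is), and `defaultRule` fills in the service name. The sketch below mimics that decision; Python's `str.format` stands in for the Go template `{{ .Name }}`, the function name is hypothetical, and per-service rule overrides via tags are left out.

```python
from typing import List, Optional

# Python stand-in for the Go template in defaultRule: Host(`{{ .Name }}.lorem-ipsum.io`)
DEFAULT_RULE = "Host(`{name}.lorem-ipsum.io`)"


def router_rule(service_name: str, tags: List[str], exposed_by_default: bool = False) -> Optional[str]:
    """Return the generated router rule for a catalog service, or None if it stays hidden."""
    enabled = exposed_by_default
    for tag in tags:
        if tag == "traefik.enable=true":
            enabled = True
        elif tag == "traefik.enable=false":
            enabled = False
    return DEFAULT_RULE.format(name=service_name) if enabled else None


print(router_rule("www", ["metrics", "traefik.enable=true"]))
print(router_rule("db", ["metrics"]))
```

The opt-in default keeps internal services (databases, exporters) off the public entrypoint unless a job explicitly asks to be exposed.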
Dynamic Metric Scraping
# prometheus.yml
scrape_configs:
  - job_name: nomad_workload
    scrape_interval: 60s
    consul_sd_configs:
      - server: localhost:8500
        tags:
          - metrics
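In `consul_sd_configs`, the `tags` list is an AND filter: a service instance becomes a scrape target only if it carries every listed tag, so the `metrics` tag on the `www` service is what opts it in. Prometheus builds the target address from the service address, falling back to the node address when the service address is empty. The sketch below models that selection over a hypothetical catalog snapshot; the data shape and function name are assumptions, not Prometheus's internals.

```python
from typing import Dict, List


def scrape_targets(catalog: Dict[str, List[Dict[str, object]]], required_tags: List[str]) -> List[str]:
    """Return host:port targets for every instance that carries all `required_tags`."""
    targets = []
    for instances in catalog.values():
        for inst in instances:
            if not set(required_tags) <= set(inst["tags"]):
                continue  # consul_sd `tags` is an AND filter
            # Prefer the service address; fall back to the node address when it is empty.
            host = inst.get("service_address") or inst["node_address"]
            targets.append(f"{host}:{inst['port']}")
    return sorted(targets)


# Hypothetical catalog snapshot: two "www" allocations and one untagged service.
catalog = {
    "www": [
        {"tags": ["metrics", "traefik.enable=true"], "service_address": "10.0.0.5",
         "node_address": "10.0.0.5", "port": 26489},
        {"tags": ["metrics", "traefik.enable=true"], "service_address": "",
         "node_address": "10.0.0.6", "port": 21034},
    ],
    "db": [{"tags": [], "service_address": "10.0.0.7", "node_address": "10.0.0.7", "port": 5432}],
}
print(scrape_targets(catalog, ["metrics"]))
```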
Sleep++ Autoscaling
job "lorem-ipsum" {
  group "frontend" {
    scaling {
      enabled = true
      min     = 1
      max     = 20

      policy {
        cooldown = "20s"

        check "avg_sessions" {
          source = "prometheus"
          query  = "sum(traefik_entrypoint_open_connections{entrypoint=\"lorem-ipsum\"})"

          strategy "target-value" {
            target = 5
          }
        }
      }
    }
  }
}
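As I understand the Nomad Autoscaler's `target-value` strategy, it multiplies the current count by `metric / target` and rounds up, and the scaling block's `min`/`max` clamp the result; `cooldown = "20s"` then blocks further actions for a while after each change. The sketch below approximates that calculation and deliberately ignores the plugin's threshold (dead band) and its special case for a count of zero. Note the implication: with this formula the query should return a per-allocation value (as the check name `avg_sessions` suggests), because a total that grows with the count would keep pushing the count toward `max`.

```python
import math


def target_value_count(current: int, metric: float, target: float,
                       min_count: int = 1, max_count: int = 20) -> int:
    """Approximate the new count for the target-value strategy, clamped to min/max.

    Simplification: ignores the plugin's threshold and the current == 0 special case.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    factor = metric / target  # > 1 means each instance is above the target
    desired = math.ceil(current * factor)
    return max(min_count, min(max_count, desired))


# 2 allocations averaging 10 open connections each, target 5 per allocation.
print(target_value_count(current=2, metric=10, target=5))
```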
Data where your users are
Nomad Federation
server {
  enabled              = true
  bootstrap_expect     = 3
  authoritative_region = "mgmt"

  server_join {
    retry_join = [
      "1.1.1.1", "1.1.1.2", "1.1.1.3",
      "2.2.2.1", "2.2.2.2", "2.2.2.3",
    ]
    retry_max      = 3
    retry_interval = "15s"
  }
}
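`retry_join` lists servers in both regions; `retry_max = 3` caps the number of join attempts (Nomad documents `0` as "retry forever") and `retry_interval = "15s"` is the pause between attempts. The loop below sketches those semantics. The `join` callable is a hypothetical stand-in for the agent's real join logic, and injecting `sleep` is only there so the loop can be exercised without waiting.

```python
import time
from typing import Callable, List


def retry_join(addresses: List[str], join: Callable[[List[str]], int],
               retry_max: int = 3, retry_interval: float = 15.0,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `join` until it reaches at least one server; retry_max=0 means retry forever."""
    attempt = 0
    while True:
        attempt += 1
        joined = join(addresses)  # hypothetical: returns how many servers were contacted
        if joined > 0:
            return joined
        if retry_max and attempt >= retry_max:
            raise RuntimeError(f"failed to join after {attempt} attempts")
        sleep(retry_interval)


servers = ["1.1.1.1", "1.1.1.2", "1.1.1.3", "2.2.2.1", "2.2.2.2", "2.2.2.3"]
print(retry_join(servers, join=lambda addrs: len(addrs), sleep=lambda s: None))
```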
Sleep++ Separate mgmt from workload
server {
  enabled              = true
  bootstrap_expect     = 3
  authoritative_region = "mgmt"

  server_join {
    retry_join = [
      "1.1.1.1", "1.1.1.2", "1.1.1.3",
      "2.2.2.1", "2.2.2.2", "2.2.2.3",
    ]
    retry_max      = 3
    retry_interval = "15s"
  }
}
Sleep++ Ingress
resource "ns1_record" "www" {
  zone   = "lorem-ipsum.io"
  domain = "www.lorem-ipsum.io"
  type   = "CNAME"
  ttl    = 60

  regions {
    name = "usa"
    meta = {
      country = "US"
    }
  }

  answers {
    answer = "usa.lorem-ipsum.io"
    region = "usa"
  }
}
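The record tags the `usa.lorem-ipsum.io` answer with a region whose metadata says `country = "US"`, so US clients can be steered to the US ingress. In NS1 that metadata is acted on by the record's filter chain (for example a geotarget filter), which the snippet does not show. The function below is a deliberately simplified model of the intended outcome, not NS1's filter logic; `eu.lorem-ipsum.io` is a hypothetical fallback answer for clients outside any listed region.

```python
from typing import Dict, List


def pick_answer(client_country: str, regions: Dict[str, Dict[str, str]],
                answers: List[Dict[str, str]], fallback: str) -> str:
    """Return the first answer whose region's country metadata matches the client."""
    for ans in answers:
        region_meta = regions.get(ans["region"], {})
        if region_meta.get("country") == client_country:
            return ans["answer"]
    return fallback  # hypothetical catch-all target for unmatched clients


regions = {"usa": {"country": "US"}}
answers = [{"answer": "usa.lorem-ipsum.io", "region": "usa"}]
print(pick_answer("US", regions, answers, fallback="eu.lorem-ipsum.io"))
print(pick_answer("NL", regions, answers, fallback="eu.lorem-ipsum.io"))
```

The short `ttl = 60` matters here: it bounds how long resolvers keep sending users to a region after its answer changes.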
Identify workload region
global:
  external_labels:
    datacenter: "%{::trusted.extensions.pp_datacenter}"
    region: "%{::trusted.extensions.pp_region}" # ${IATA_CODES}
    env: "%{::trusted.extensions.pp_environment}"
What would it take to keep it running while losing an entire region?
Limit the blast radius
Spread your risks!
US: AWS, GCP, DO, AZ
EU: AWS, GCP, DO, AZ
Not all vendors are created equal
VPCs vs. network mesh, base images, etc.
Lazy++ Redirect to other regions
## traefik.yml
entryPoints:
  web:
    address: ":80"
providers:
  consul:
    endpoints:
      - "localhost:8500"

consul kv put traefik/http/services/www/loadbalancer/servers/0/url http://usa.lorem-ipsum.io
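Traefik's KV providers read the same configuration tree as the file provider, flattened into slash-separated keys under the root key `traefik`, with list entries addressed by their index (hence `servers/0/url`). The helper below is a hypothetical sketch that flattens a nested config into such keys and prints the matching `consul kv put` commands, mirroring the key used on the slide.

```python
import shlex
from typing import Any, Dict


def flatten_to_kv(prefix: str, value: Any) -> Dict[str, str]:
    """Flatten nested dicts/lists into slash-separated keys; list items use their index."""
    pairs: Dict[str, str] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            pairs.update(flatten_to_kv(f"{prefix}/{key}", child))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            pairs.update(flatten_to_kv(f"{prefix}/{index}", child))
    else:
        pairs[prefix] = str(value)
    return pairs


config = {"http": {"services": {"www": {"loadbalancer": {
    "servers": [{"url": "http://usa.lorem-ipsum.io"}]}}}}}
for key, val in flatten_to_kv("traefik", config).items():
    print("consul kv put", shlex.quote(key), shlex.quote(val))
```

Because Traefik watches these keys, rewriting one value is enough to point the `www` service at another region during an outage.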
Sleep++ Redirect to other regions
curl http://127.0.0.1:8500/v1/query \
  --request POST \
  --data @- << EOF
{
  "Name": "www",
  "Service": {
    "Service": "www",
    "Failover": {
      "Datacenters": ["us", "eu"]
    }
  }
}
EOF
Sleep++ Redirect to other regions
curl http://127.0.0.1:8500/v1/query \
  --request POST \
  --data @- << EOF
{
  "Name": "www",
  "Service": {
    "Service": "www",
    "Failover": {
      "NearestN": 2,
      "Datacenters": ["us", "eu"]
    }
  }
}
EOF
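Consul's prepared-query documentation describes the failover order: the query fails over only when the local datacenter has no healthy instances; it then tries up to `NearestN` other datacenters ranked by estimated round-trip time (network coordinates), followed by the `Datacenters` list in the given order, and queries each datacenter at most once. The sketch below reproduces that ordering. Datacenter names other than `us` and `eu` (such as `ap` and `sa`) are hypothetical, and the RTT ranking is passed in rather than computed.

```python
from typing import Dict, List, Optional


def failover_order(local: str, nearest_by_rtt: List[str], nearest_n: int,
                   datacenters: List[str]) -> List[str]:
    """Remote datacenters to try, in order: NearestN by RTT first, then the fixed list."""
    nearest = [dc for dc in nearest_by_rtt if dc != local][:nearest_n]
    order: List[str] = []
    for dc in nearest + datacenters:
        if dc != local and dc not in order:  # each datacenter is queried at most once
            order.append(dc)
    return order


def resolve(local: str, healthy: Dict[str, int], nearest_by_rtt: List[str],
            nearest_n: int, datacenters: List[str]) -> Optional[str]:
    """Return the datacenter that would answer the query, or None if none is healthy."""
    if healthy.get(local, 0) > 0:
        return local  # failover only kicks in when the local datacenter has no healthy instances
    for dc in failover_order(local, nearest_by_rtt, nearest_n, datacenters):
        if healthy.get(dc, 0) > 0:
            return dc
    return None


print(failover_order("ap", ["eu", "sa", "us"], 2, ["us", "eu"]))
```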
Follow the sun?
Follow the moon?
Discover nearest?
Sleep++ Consul watchers
Thank You
bram@attachmentgenie.com | @attachmentgenie | https://www.slideshare.net/attachmentgenie
