© 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019
Kubernetes - 7 lessons learned from 7 data centers in 7 months
Kubernetes for Developers Meetup – May 13, 2019
Mike Tougeron – Senior Site Reliability Engineer @ Adobe
$ whoami && id | grep Adobe
- Mike Tougeron
- Senior Site Reliability Engineer @ Adobe
- Twitter: @mtougeron
- Started using Kubernetes in 2015
Agenda
- Quick introduction to Adobe Advertising Cloud's Kubernetes infrastructure
- Lesson 1: Communication, Teamwork & Training
- Lesson 2: Code to production pipelines
- Lesson 3: The ABCs of production apps
- Lesson 4: Multi-cloud challenges
- Lesson 5: Knowing your applications
- Lesson 6: Metrics-based monitoring
- Lesson 7: Take a deep breath
- High traffic: 350 billion requests a day
- Latency: <50 ms at the 95th percentile
- Huge datasets: billions of objects to store
Adobe Advertising Cloud's Kubernetes Overview
- ~225 worker nodes; growing to ~300 in May/June
- 6 OpenStack data centers across 4 regions
  - Running on VMs
  - No persistent storage
  - No autoscaling; "fixed" footprint
- 3 AWS clusters in us-east-1
  - Smaller but growing
  - Running on m5d.12xlarge EC2 instances
  - EBS volumes for persistent storage
  - Uses cluster-autoscaler; autoscaling events many times per hour
  - Prometheus for monitoring
  - Dozens of machine learning workloads, the reason for the frequent autoscaling events
- Cluster updates are done via a new image and a rolling update of the existing nodes
- Updates are deployed approximately every 4-6 weeks
Lesson 1: Communication, Teamwork & Training
Communication: Reaching large, distributed teams
Teamwork: Who's responsible for what?
Abstraction vs Experts
- Teams need to understand the core resources, but also need easy onboarding
- Pair-programming training sessions
- Remove the need for boilerplate
- Don't duplicate effort by avoiding abstraction
- Don't abstract to the point where you're no longer using Kubernetes
- kubectl should *not* be your entrypoint
Lesson 2: Code to production pipelines
[Pipeline diagram with stages: Dev, Pull Request, master, Unit testing, merge, Deploy bot, Production, Integration testing. Caption: "Insert your steps here!"]
Tools to help build application resources
- Helm (templating and/or Tiller)
- Kustomize
- Kapitan
- and more…
- We use Helm templating for infrastructure and third-party apps, and Kustomize for application teams
```shell
$> helm template --name opa --namespace opa \
     --values ./values/globals.yaml \
     --values ./values/mgmt/cluster.yaml \
     --values ./values/mgmt/adcloud-opa/values.yaml \
     --output-dir ../../../cloud/opa/mgmt \
     charts/adcloud-opa
```
versus
```shell
$> ./build.py --chart adcloud-opa --cluster mgmt
```
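The actual build.py is not shown, so this is only a hypothetical sketch of what such a wrapper could do: derive the long Helm 2 `helm template` invocation from just a chart name and a cluster name. The conventions (stripping an "adcloud-" prefix to get the release name, using the release name as the namespace, and the values/output directory layout) are assumptions inferred from the example command above.

```python
import shlex
from typing import List, Optional


def helm_template_args(chart: str, cluster: str,
                       release: Optional[str] = None,
                       namespace: Optional[str] = None) -> List[str]:
    """Build the Helm 2 `helm template` argv for one chart and cluster.

    Hypothetical conventions, inferred from the example command: the release
    name defaults to the chart name without an "adcloud-" prefix, the namespace
    defaults to the release name, and values files are layered from most
    generic (globals) to most specific (per-cluster, per-chart). Helm lets
    later --values files override earlier ones.
    """
    prefix = "adcloud-"
    if release is None:
        release = chart[len(prefix):] if chart.startswith(prefix) else chart
    if namespace is None:
        namespace = release
    return [
        "helm", "template",
        "--name", release,
        "--namespace", namespace,
        "--values", "./values/globals.yaml",
        "--values", f"./values/{cluster}/cluster.yaml",
        "--values", f"./values/{cluster}/{chart}/values.yaml",
        "--output-dir", f"../../../cloud/{release}/{cluster}",
        f"charts/{chart}",
    ]


def helm_template_command(chart: str, cluster: str) -> str:
    """Shell-quoted command line, handy for logging or a dry-run mode."""
    return shlex.join(helm_template_args(chart, cluster))
```

A wrapper script would parse `--chart` and `--cluster` (e.g. with argparse) and pass `helm_template_args(...)` to `subprocess.run(..., check=True)`. Encoding the directory conventions once is what removes the copy-and-paste errors the long form invites.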
Lesson 3: The ABCs of Production
- HorizontalPodAutoscaler
- PodDisruptionBudget
- "DevOps"
- Cluster Upgrades
HorizontalPodAutoscaler
- Easily scale on CPU or memory usage
- Can also scale on custom metrics, such as http_requests from Ingress resources
- Don't set replicas in your Deployment: every time the manifest is applied, Kubernetes resets the Pod count to that value and overrides what the HPA chose
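The Kubernetes documentation describes the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping the change when the ratio is within a tolerance (0.1 by default) of 1.0. The sketch below is a simplified model of that rule for a single metric; the min/max defaults are illustrative, and the real controller also handles missing metrics, not-yet-ready Pods and a scale-down stabilization window.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified single-metric HorizontalPodAutoscaler scaling rule."""
    ratio = current_metric / target_metric
    if abs(1.0 - ratio) <= tolerance:
        # Close enough to the target: keep the current size to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Always clamp to the HPA's configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 Pods averaging 90% CPU against a 60% target give ceil(4 × 1.5) = 6 replicas. If the Deployment manifest still pins `replicas: 4`, the next apply drops it back to 4 until the HPA scales it up again.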
PodDisruptionBudget
- Not the same thing as a Deployment's rollout strategy
- Helps prevent taking down so many Pods at once that the application is overwhelmed
- Can be set with minAvailable or maxUnavailable, as a number or a percentage
- Good for helping keep quorum
- Only applies to voluntary evictions (e.g. kubectl drain); it doesn't apply to manual deletions
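During `kubectl drain`, the Eviction API refuses any eviction that would exceed the budget. This is a simplified sketch of how minAvailable/maxUnavailable translate into "disruptions allowed", assuming percentages are rounded up as the Kubernetes docs describe; it ignores details such as Pods without a controller.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]


def _scaled(value: IntOrPercent, total: int) -> int:
    """Resolve an int-or-percent field against `total`; percentages round up."""
    if isinstance(value, int):
        return value
    if not value.endswith("%"):
        raise ValueError(f"expected an int or a percentage, got {value!r}")
    pct = int(value[:-1])
    return -(-total * pct // 100)  # integer ceiling division, no float error


def disruptions_allowed(expected_pods: int, healthy_pods: int,
                        min_available: Optional[IntOrPercent] = None,
                        max_unavailable: Optional[IntOrPercent] = None) -> int:
    """How many more Pods may be evicted right now without breaking the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = _scaled(min_available, expected_pods)
    else:
        unavailable = _scaled(max_unavailable, expected_pods)
        desired_healthy = max(0, expected_pods - unavailable)
    return max(0, healthy_pods - desired_healthy)
```

For a 3-member quorum-based service, `minAvailable: 2` allows one eviction while all members are healthy and none while one is already down, which is exactly the "keep quorum" behavior.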
DevOps
- Expertise/specialists
- But also empowerment & speed
- Things get lost in the shuffle
- When everyone can do everything, don't forget your guardrails
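```rego
# Assumes existing Ingresses are replicated into OPA (e.g. by kube-mgmt)
# under data.kubernetes.ingresses[namespace][name]; package name is illustrative.
package kubernetes.admission

import data.kubernetes.ingresses

# Validating rule: reject a new Ingress whose host is already claimed
# by an Ingress in a different namespace.
deny[msg] {
    input.request.kind.kind == "Ingress"
    input.request.operation == "CREATE"
    host := input.request.object.spec.rules[_].host
    ingress := ingresses[other_ns][other_ingress]
    other_ns != input.request.namespace
    ingress.spec.rules[_].host == host
    msg := sprintf("invalid ingress host %q (conflicts with %v/%v)", [host, other_ns, other_ingress])
}

# Mutating rule: default the ingress class to nginx-internal when unset.
# isCreateOrUpdate, hasAnnotation and makeAnnotationPatch are helper rules
# defined elsewhere in the policy (not shown on the slide).
patch[patchCode] {
    isCreateOrUpdate
    input.request.kind.kind == "Ingress"
    not hasAnnotation(input.request.object, "kubernetes.io/ingress.class")
    patchCode := makeAnnotationPatch("add", "kubernetes.io/ingress.class", "nginx-internal", "")
}
```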
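The `makeAnnotationPatch` helper is not shown, so here is a hypothetical Python equivalent of the JSON Patch (RFC 6902) that such a mutating webhook would return. It shows two details that are easy to get wrong: the "/" inside the annotation key must be escaped as "~1" in the JSON Pointer path, and "add" cannot create a missing annotations map implicitly. The function names are illustrative.

```python
import base64
import json
from typing import Any, Dict, List

INGRESS_CLASS_KEY = "kubernetes.io/ingress.class"


def escape_pointer_token(token: str) -> str:
    """Escape one JSON Pointer token (RFC 6901): "~" -> "~0", then "/" -> "~1"."""
    return token.replace("~", "~0").replace("/", "~1")


def ingress_class_patch(ingress: Dict[str, Any],
                        default_class: str = "nginx-internal") -> List[Dict[str, Any]]:
    """JSON Patch ops that default the ingress class when it is missing."""
    annotations = ingress.get("metadata", {}).get("annotations")
    if annotations is not None and INGRESS_CLASS_KEY in annotations:
        return []  # already set: keep the team's explicit choice
    if annotations is None:
        # "add" requires the parent to exist, so create the whole map at once.
        return [{"op": "add", "path": "/metadata/annotations",
                 "value": {INGRESS_CLASS_KEY: default_class}}]
    path = "/metadata/annotations/" + escape_pointer_token(INGRESS_CLASS_KEY)
    return [{"op": "add", "path": path, "value": default_class}]


def encode_patch(ops: List[Dict[str, Any]]) -> str:
    """AdmissionReview responses carry the patch as base64-encoded JSON."""
    return base64.b64encode(json.dumps(ops).encode("utf-8")).decode("ascii")
```

Returning an empty patch when the annotation is present is the guardrail in action: teams get a sane default without losing the ability to choose another class.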
Cluster Upgrades - Blue/Green or Canary?
- Who really has the hardware to run a second full Kubernetes cluster in their data center?
- Public cloud is easier, but you still have cost considerations
- Can the application team(s) handle deploying to a second, mirrored cluster?
- Does it make more sense to run N workers with a different version/config for a period of time?
- Do you have enough visibility into the clusters to know how one performs vs. the other?
Lesson 4: Multi-Cloud Challenges
Multiple code-bases but consistent infrastructure
- Packer – shared modular code base, different builders
- Terraform – separate but closely aligned code bases
- Puppet – same code base
- Helm – same modular code base
- Leverage templating to build the same deployments for different (and future) clouds
- Re-use, re-use, re-use!
- Lab environments in all clouds
- OSSIA (OpenStack Simple Inventory API) for HV/rack metadata, used for region/zone
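OSSIA's API is not described here, so the inventory record shape below (a dict with "region" and "rack") is hypothetical. The sketch shows the general idea: map physical location onto the well-known topology node labels (in 2019-era clusters these were `failure-domain.beta.kubernetes.io/region` and `/zone`; newer clusters use `topology.kubernetes.io/*`), then measure how evenly one workload is spread across zones.

```python
from collections import Counter
from typing import Dict, List

REGION_LABEL = "topology.kubernetes.io/region"
ZONE_LABEL = "topology.kubernetes.io/zone"


def node_labels(record: Dict[str, str]) -> Dict[str, str]:
    """Map a hypothetical inventory record ({"region", "rack", ...}) to topology labels."""
    return {REGION_LABEL: record["region"], ZONE_LABEL: record["rack"]}


def zone_skew(pod_nodes: List[str], labels_by_node: Dict[str, Dict[str, str]]) -> int:
    """Pods in the busiest zone minus Pods in the emptiest zone, for one workload.

    Every zone that has at least one node counts, so an empty zone
    contributes 0 and raises the skew.
    """
    zones = {labels[ZONE_LABEL] for labels in labels_by_node.values()}
    if not zones:
        return 0
    counts = Counter(labels_by_node[node][ZONE_LABEL] for node in pod_nodes)
    per_zone = [counts.get(zone, 0) for zone in zones]
    return max(per_zone) - min(per_zone)
```

Once nodes carry these labels, the scheduler's anti-affinity (or, in later Kubernetes versions, topology spread constraints) can keep the skew low without each team knowing the rack layout.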
Lesson 5: Knowing your applications
- Seems like an obvious statement, but it's easy to forget to think about
- Kubernetes brings advantages, but not all the ones that bare metal and virtual machines provide out of the box
- Think about how your app actually functions:
  - Service discovery
  - Persistent storage
  - Shared storage (e.g. replication, sharding)
  - Scheduling / restarting
  - Networking ingress / egress
- Think about how your app is going to handle the way Kubernetes does things
https://imgur.com/gallery/B4D7Lf1
Elasticsearch as Deployment (What We Did)
https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
Oops… yeah, Touge, I think something is wrong…
Elasticsearch as StatefulSet (What We Should Have Done)
https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
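Part of why a StatefulSet fits Elasticsearch masters is that each Pod gets a stable ordinal name and DNS record behind a headless Service, so the members can find each other after rescheduling. This sketch computes those names plus the strict-majority quorum (what Elasticsearch 6.x's `discovery.zen.minimum_master_nodes` should be set to; 7.x manages this itself). The StatefulSet/Service name "es-master", the namespace "logging" and the default cluster domain are assumptions for illustration.

```python
from typing import List


def statefulset_pod_dns(statefulset: str, service: str, namespace: str,
                        replicas: int, cluster_domain: str = "cluster.local") -> List[str]:
    """Stable per-Pod DNS names: <statefulset>-<ordinal>.<service>.<namespace>.svc.<domain>.

    `service` must be the headless Service named in the StatefulSet's serviceName.
    """
    return [f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
            for ordinal in range(replicas)]


def master_quorum(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes needed to elect a master."""
    return master_eligible // 2 + 1


def max_unavailable_for_quorum(master_eligible: int) -> int:
    """Largest PodDisruptionBudget maxUnavailable that still preserves quorum."""
    return master_eligible - master_quorum(master_eligible)
```

With three masters, the quorum is 2, so a PodDisruptionBudget with `maxUnavailable: 1` on the es-master StatefulSet keeps node drains from ever taking out two masters at once.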
Lesson 6: Metrics-Based Monitoring
Lesson 7: Take a deep breath
- We're the same team, so we all learn & fix together
- The experience has been enlightening, and engineers have had fun
- Teams already onboarded are moving faster than before
- The dev cycle to production gets faster as we integrate more automated testing
Thanks!
Slides: https://touge.me/k8s-7lessons-meetup
Mike Tougeron
Email: tougeron@adobe.com
Twitter: @mtougeron
Images from https://stock.adobe.com


Editor's Notes

  • High Traffic / Latency / Huge Datasets: Adobe Advertising Cloud allows you to manage video, display, and search advertising across traditional TV and digital formats.
  • Image slide after the overview: ./deploy-ami.py master --context aws-lab
  • Communication: Repeat, repeat, repeat. There's always a medium that someone doesn't read, even if they are supposed to. Shout it from the mountaintop. It still drives me nuts.
  • Teamwork: Deploybot deploys YAML after it is committed to git. Team A wrote the app, Team X had a failure: who gets the alerts? Assumptions were made by all parties involved. Same type of problem with the registry server. It all boils down to a lack of communication.
  • Abstraction vs Experts: We don't have a good answer for everyone. Balance is key to success.
  • Lesson 2 (pipelines): Crucial to success. A slow pipeline slows down adoption and creates friction. An easy pipeline prompts the "that's it?" question far too often :)
  • Cluster Upgrades: We chose canary; app teams are not yet far enough along to support cross-cluster load balancing.
  • Lesson 4 (multi-cloud): Most data warehousing and analytics processing happens in AWS. Bidding and ad serving then happen via one of our six OpenStack regions throughout the world. This gives us the best of both worlds: burstable compute and storage when we need it, and cheap, fast, low-latency compute for the majority of our workload.
  • Multiple code-bases: We re-used much of the AWS code and adapted it to be modular based on the target cloud. Consistency across clusters and clouds. Write once, target. OSSIA (OpenStack Simple Inventory API) was written in-house by Mykola Moglyenko. It allows us to tag pods by their physical location in the cage and make decisions that spread workloads out evenly. Adobe will be open-sourcing this tool this spring.
  • Lesson 5: Does a fixed hostname make a difference (for example, ZooKeeper)? How does the app/service save its state: in memory or on disk? What about cluster data: is it sharded? Replicated? How well does it handle rescheduling? How do other applications or teams access the app/service?
  • Elasticsearch as Deployment: How many people have run an Elasticsearch cluster, or at least know about Elasticsearch? We followed a blog post to set it up in K8s. Not a bad thing! We just didn't think in a Kubernetes way. It looked like this. It lived in our AWS cluster, where our ML jobs cause a lot of autoscaling up and down, so there was a fair amount of volatility. When we first deployed it, it worked! Then we upgraded our nodes, which meant draining and replacing them one at a time: lots of app rescheduling and lots of autoscaler activity.
  • Oops: While deploying new worker images to our nodes, we noticed this happening to Elasticsearch. Everything was suddenly in CrashLoopBackOff (CLBO), with unassigned primaries and replicas. When we got things back up, we found we had lost 7% of our data (this was in dev).
  • Elasticsearch as StatefulSet: We converted the es-master Deployment to a StatefulSet. This makes sure master nodes are gracefully removed and re-added without impacting quorum. We also adjusted the cluster deployment scripts: respect the PodDisruptionBudget with longer timeouts, pre-cordon nodes, increase the size of the cluster before draining nodes, and disable the cluster-autoscaler (so the cluster stays inflated).
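The adjusted deployment-script steps in the last note can be sketched as an ordered plan. This is a hypothetical illustration, not the team's actual scripts: it only builds a list of steps (nothing is executed), assumes the cluster-autoscaler runs as a Deployment named cluster-autoscaler in kube-system, and leaves the cloud-specific node-group scaling as a placeholder comment.

```python
from typing import List


def node_rotation_plan(old_nodes: List[str], surge: int,
                       drain_timeout: str = "900s") -> List[str]:
    """Ordered steps for replacing worker nodes; "#" entries are manual/cloud steps."""
    steps = [
        # Stop the autoscaler so it doesn't remove the extra capacity mid-rotation.
        "kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0",
        f"# scale the node group up by {surge} new-image nodes (cloud-specific)",
    ]
    # Pre-cordon every old node so evicted Pods only land on new-image nodes
    # instead of bouncing onto a node that is about to be drained next.
    steps += [f"kubectl cordon {node}" for node in old_nodes]
    # Drain one node at a time (waiting for Pods to be Ready in between).
    # drain uses the Eviction API, so PodDisruptionBudgets are honored, and a
    # long timeout gives StatefulSet members time to rejoin their cluster.
    steps += [f"kubectl drain {node} --ignore-daemonsets --timeout={drain_timeout}"
              for node in old_nodes]
    steps.append("# terminate the drained nodes, then re-enable cluster-autoscaler")
    return steps
```

Cordoning everything up front is the key ordering choice: without it, Pods evicted from the first node could be rescheduled onto the second, only to be evicted again moments later.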