You got a couple Microservices, now what?
Adding SRE to DevOps
Gonzalo Maldonado - MustWin
Microservice Honeymoon
^ Your microservice saved your homepage
^ Everyone loves working on the microservice
2 months later
^ Is it still a microservice?
^ Why are we adding new stuff to the monolith?
Can we get rid of ticket driven deployments?
^ What makes deploying a microservice so hard?
^ Where can we run this?
^ Monoliths seemed easier to maintain!
^ Datacenter 4.0
^ Dude, where's my container?
^ The promised land
Sysadmin -> DevOps -> SRE
^ The SRE-cret Sauce
^ Resource & Container Management (Schedulers)
^ Service Discovery (Consul, Skydns & Etcd)
TOC
—Sysadmin -> DevOps -> SRE
—Microservice Honeymoon
—2 months later
—Meanwhile your team is doing ticket driven
deployments.
—The SRE-cret Sauce
—References
Sysadmin -> DevOps -> SRE
—SysAdmin: Manages 1 or 2 services manually.
—DevOps Team: Manages ~10 services semi-
programmatically.
—SRE Team: Manages 100-1K services fully
programmatically.
Sysadmin -> DevOps -> SRE (Tech Stack)
—SysAdmin: Bash, Perl or Python Scripts
—DevOps Team: Chef, Puppet
—SRE Team: Mesos, Swarm, Kubernetes, Consul,
Vault
We don't have 100 services, why should we care
about the SRE tech stack?
Because this stack:
—Saves your team time configuring and deploying a
service
—Allows your engineering team to grow (a single
engineer will be able to manage
a couple dozen services)
—Prevents having to rewrite your infrastructure code as your app scales
—Gives you elastic resources (saves you money on AWS)
—Makes deploying microservices as easy as getting a Heroku app up (and you used to love microservices)
When doing Microservices gets
hard
The Microservice
Honeymoon: how a
microservice saved your
homepage
Microservices are awesome.
—Your page load time dropped from 3 seconds to 20 ms (Go is so fast!)
—Hacker News spikes are no longer a big deal (we're elastic!)
—Everyone loves working on the microservice (it's only 500 lines!)
2 months later...
—If it has 2K lines of code, is it still a microservice?
—Why are people still adding stuff to the monolith?
—The code is already there and they didn't want to rewrite it (duh.)
—Debugging things is getting harder (you need to test in multiple places)
—Getting a new microservice to prod is hard! (This.)
Why is creating new Microservices so hard now?
(monoliths felt easier)
"Awesome analogy by @timallenwagner: monolithic
architecture=carrying a 7ft beach ball,
microservice=carrying 200 loose marbles"
—Configuration Management (You have to repeat
recipes)
—Service-inter-dependency-updates (You can't
change a service address
or port without affecting other services)
—Credentials cannot be shared
—Snowflake Runtime Environments (Can't run
node.js code on the JVM box)
Meanwhile, your team is doing ticket driven
deployments
—Deploys have become more complicated. When there was only a monolith, you had one deploy and one box.
—It has gotten to the point where your team has decided they "need a ticket" for each deploy.
Where can we run this? Your Sys Admin asks...
—If you're typing apt-get to get a new environment
up, you're doing
something wrong.
—Chef, Puppet, Ansible are good replacements, but
there's something
better you probably already use on your dev
machine.
Your Datacenter has to
change
Datacenter 1.0[^1]
"How do we use these machines?"
"Can we automate?"
"How can we integrate?"
Datacenter 2.0[^1]
"We need bigger computers"
"We need a microservice"
"We need a SysAdmin"
Datacenter 3.0[^1]
"We need some VMs"
"We need microservices"
"We need IT"
Datacenter 3.5[^1]
"We have a lot of VMs"
"We have lots of microservices"
"We need DevOps"
Datacenter 3.5[^1]
"We need to manage our VMs"
"We need to manage our microservices"
"We need SREs"
[^1]: http://www.slideshare.net/SebastianWeigand/containers-and-customers-55262844
Dude, where is my container?
Virtual Machines vs Docker
You have already heard about Docker and why containers that share OS resources are more efficient than full virtual machines. But what else does Docker give you?
* Contained instances (you can run multiple runtimes on one box)
* Incremental images (you can use an existing image as a base)
* Immutable instances (your images are stateless)
And this gets us to The Lean Staging
$ git commit -am "The new cool feature"
$ git push
Running CI ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
CI done. Your branch is available at http://super-tesla.thunderdomes.co
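One small piece of the flow above is turning a branch name into a hostname for its own staging environment. The following is a minimal sketch in Python, not the author's tooling: the base domain `staging.example.com` and the naming rule are assumptions made for illustration, and the talk does not say how names such as `super-tesla` are actually generated.

```python
import re

BASE_DOMAIN = "staging.example.com"  # hypothetical domain, not from the talk


def staging_host(branch: str, base_domain: str = BASE_DOMAIN) -> str:
    """Map a git branch name to a DNS-safe hostname for its staging environment."""
    # Lowercase, then collapse every run of characters that is not a letter,
    # digit, or hyphen into a single hyphen.
    label = re.sub(r"[^a-z0-9-]+", "-", branch.lower()).strip("-")
    # A single DNS label is limited to 63 characters.
    label = label[:63].rstrip("-")
    if not label:
        raise ValueError(f"branch {branch!r} has no DNS-safe characters")
    return f"{label}.{base_domain}"
```

A CI job could call `staging_host` with the pushed branch name and print the resulting URL once the container for that branch is deployed.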
What do we need to get there?
We need Site Reliability Engineers.
And what are those SREs going to build to achieve that?
The SRE-cret sauce:
—Code (we already have that)
—Servers or the cloud (pick AWS, GCP or Azure)
—A CI service (pick Jenkins, Travis or CircleCI)
—Deployment & monitoring systems (let's focus on these)
The SRE-cret sauce.
Those Deployment systems will do the following
a. Container Management
b. Service Discovery
c. Configuration Management
d. Authentication & Authorization
Container Management
This is what you want: now that you have discovered Docker, you want to use it in production. You could run all your containers on a single box, but that would prevent you from scaling horizontally, and you would need downtime to add more memory to that box.
Container Management: Enter the scheduler
As with many things in the tech world, Google was one of the early adopters of schedulers. Schedulers are systems in charge of managing cluster resources by telling applications when to run.
[Figure: architectures presented in the white paper on Google's Omega scheduler.]
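To make the idea of a scheduler concrete, here is a toy sketch in Python. It uses plain first-fit placement, which is an assumption chosen for brevity: Mesos, Swarm, Kubernetes and Nomad each use more sophisticated policies, and the `Node`, `Task` and `schedule` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: float  # cores still unallocated
    free_mem: int    # MiB still unallocated
    tasks: list = field(default_factory=list)


@dataclass
class Task:
    name: str
    cpu: float
    mem: int


def schedule(tasks, nodes):
    """Place each task on the first node with enough free CPU and memory.

    Returns a dict of task name -> node name, with None for tasks
    that do not fit on any node.
    """
    placement = {}
    for task in tasks:
        placement[task.name] = None
        for node in nodes:
            if node.free_cpu >= task.cpu and node.free_mem >= task.mem:
                # Reserve the resources so later tasks see the reduced capacity.
                node.free_cpu -= task.cpu
                node.free_mem -= task.mem
                node.tasks.append(task.name)
                placement[task.name] = node.name
                break
    return placement
```

The point of the sketch is the division of labour: applications declare what they need, and the scheduler decides where they run.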
Container Management: Scheduler Options
—Mesosphere DCOS (Based on Apache Mesos)
—Docker Swarm
—Kubernetes
—Nomad
Each scheduler option has its own pros and cons, and you will need to pick the one that best fits your team's needs.
Container Management: Scheduler Options[^2]
More info here:
https://medium.com/@mustwin/a-handy-guide-to-the-mesos-kubernetes-swarm-jungle-ad6bc086c736#.6ji95fm7e
Service Discovery
Service discovery is a mechanism by which, when a new service instance is added, the rest of the services detect the change automatically.
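The mechanism can be sketched as a registry that instances write to and clients read from. This is a toy in-memory version in Python, assuming a TTL-plus-heartbeat scheme; it is not the API of Consul, etcd or SkyDNS, and the class and method names are hypothetical.

```python
import time


class ServiceRegistry:
    """Toy in-memory registry: instances register with a TTL and must heartbeat."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._instances = {}  # (service, address) -> expiry timestamp

    def register(self, service, address):
        """Add an instance, or refresh it if already registered (heartbeat)."""
        self._instances[(service, address)] = self._clock() + self._ttl

    def deregister(self, service, address):
        """Remove an instance explicitly, for example on clean shutdown."""
        self._instances.pop((service, address), None)

    def lookup(self, service):
        """Return the addresses of all instances whose TTL has not expired."""
        now = self._clock()
        return sorted(
            addr
            for (svc, addr), expiry in self._instances.items()
            if svc == service and expiry > now
        )
```

Because callers ask the registry for `lookup("api")` instead of hard-coding an address, an instance can move or be added without touching the services that depend on it.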
Service Discovery: Options
Load balancer + Highly available Storage.
Using a load balancer like NGINX/HAProxy + etcd
you can update service
registrations dynamically. The Load Balancer takes
care of DNS
resolutions.
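With this option, something has to turn the registrations stored in etcd into load balancer configuration. The sketch below shows only that rendering step for HAProxy, in Python; reading from etcd and reloading HAProxy are left out, and the function name and the choice of `balance roundrobin` are assumptions for illustration.

```python
def render_backend(service, addresses):
    """Render an HAProxy `backend` section for the given instance addresses."""
    lines = [f"backend {service}", "    balance roundrobin"]
    for index, address in enumerate(sorted(addresses)):
        # `check` asks HAProxy to health-check the server before routing to it.
        lines.append(f"    server {service}-{index} {address} check")
    return "\n".join(lines) + "\n"
```

Regenerating this section whenever the set of registered instances changes is what makes the registrations "dynamic" from the load balancer's point of view.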
Service Discovery: Options
Etcd + SkyDNS
SkyDNS performance is comparable to HAProxy's. It is easier to set up, although not as powerful.
Service Discovery: Options
Consul
Consul is a key/value & service registry with built in
DNS support.
Service Discovery: How to pick?
a. Pick a scheduler
* Kubernetes currently only supports etcd.
* Mesos can use Etcd, Zookeeper or Consul.
b. If you're using Consul you're done.
c. For etcd:
* Use HAProxy if you're already using it
* Otherwise just use Skydns and call it a day
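The decision steps above can be written down as a small function. This Python sketch only encodes the slide's advice as stated at the time of the talk; the function name, the parameter names and the fallback for ZooKeeper (for which the slide gives no further advice) are assumptions.

```python
# Stores each scheduler can work with, as listed on the slide.
STORES_BY_SCHEDULER = {
    "kubernetes": ["etcd"],
    "mesos": ["etcd", "zookeeper", "consul"],
}


def pick_service_discovery(scheduler, store=None, uses_haproxy=False):
    """Return the service discovery setup the slide suggests."""
    supported = STORES_BY_SCHEDULER.get(scheduler.lower())
    if supported is None:
        raise ValueError(f"no advice recorded for scheduler {scheduler!r}")
    # Default to the first store listed for the scheduler.
    store = (store or supported[0]).lower()
    if store not in supported:
        raise ValueError(f"{scheduler} does not support {store}")
    if store == "consul":
        return "consul"  # Consul ships its own DNS, so you're done
    if store == "etcd":
        return "etcd + haproxy" if uses_haproxy else "etcd + skydns"
    return store  # the slide gives no further advice for this store
```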
Configuration Management
We're going to assume your microservices are already 12 Factor apps[^3], where:
* Service configuration happens in environment variables
* Backing services are attached resources (Service Discovery FTW)
[^3]: https://12factor.net/
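As a minimal sketch of what configuration in environment variables looks like in practice, the Python snippet below builds a config object from the environment only. The variable names `DATABASE_URL`, `PORT` and `DEBUG`, and the defaults, are assumptions for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    database_url: str
    port: int
    debug: bool


def load_config(env=None):
    """Build the service configuration from environment variables only."""
    env = os.environ if env is None else env
    try:
        # The backing service is an attached resource: only its URL is known here.
        database_url = env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Config(
        database_url=database_url,
        port=int(env.get("PORT", "8080")),
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

The same image can then run in staging and production unchanged, with the scheduler injecting different values for each environment.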
Configuration Management (Options)
Most schedulers support this out of the box, with
the caveat that most
don't provide Secret management out of the box
(K8s does).
Secret Management (Vault)
For secret management we cannot recommend Vault enough, because it provides:
—Secure secret storage
—Dynamic Secrets
—Leasing and Renewal
—Revocation
—Auditing
—Etc.
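To illustrate what dynamic secrets, leasing, renewal, revocation and auditing mean, here is a toy broker in Python. It is explicitly not Vault's API or implementation: the class, its methods and the in-memory storage are all hypothetical, and a real deployment would talk to Vault itself.

```python
import secrets
import time


class LeasedSecrets:
    """Toy secret broker: every credential is generated on demand and leased."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}    # lease_id -> [secret, expiry timestamp]
        self.audit_log = []  # (event, lease_id) tuples

    def issue(self, ttl_seconds):
        """Dynamic secret: create a fresh credential valid for ttl_seconds."""
        lease_id = secrets.token_hex(8)
        secret = secrets.token_urlsafe(24)
        self._leases[lease_id] = [secret, self._clock() + ttl_seconds]
        self.audit_log.append(("issue", lease_id))
        return lease_id, secret

    def is_valid(self, lease_id):
        """A lease is valid if it exists and has not yet expired."""
        lease = self._leases.get(lease_id)
        return lease is not None and lease[1] > self._clock()

    def renew(self, lease_id, ttl_seconds):
        """Extend a valid lease; expired or revoked leases raise KeyError."""
        if not self.is_valid(lease_id):
            raise KeyError(lease_id)
        self._leases[lease_id][1] = self._clock() + ttl_seconds
        self.audit_log.append(("renew", lease_id))

    def revoke(self, lease_id):
        """Invalidate a lease immediately, regardless of its remaining TTL."""
        self._leases.pop(lease_id, None)
        self.audit_log.append(("revoke", lease_id))
```

The design choice worth noticing is that credentials expire by default: a service that stops renewing its lease loses access without anyone having to remember to clean up.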
Other things you will need
—Monitoring: (Prometheus, Nagios, InfluxDB,
Grafana)
—An authentication Service or provider
To recap, to build The Lean Staging we will need to:
—Set up a scheduler (Kubernetes)
—Set up a CI system (Drone, Jenkins or Travis)
—Hook your GitHub/GitLab to that CI
—Change the CI configuration to trigger a container build & deploy
—Have fun!
GitLab made a really good proof of concept of it:
https://about.gitlab.com/2016/11/14/idea-to-production/
Recommended reading for SRE Teams:
Distributed Systems fundamentals:
—Notes on Distributed Systems for Young Bloods -
Jeff Hodges
—You Can’t Sacrifice Partition Tolerance - Coda
Hale
—The Raft Consensus Algorithm - Diego Ongaro
Microservices
—Building Microservices - Sam Newman
SRE
—Site Reliability Engineering - Beyer, et al.
—Continuous Delivery - Jez Humble
—The Principles of Product Development Flow -
Reinertsen
References
—https://medium.com/@mustwin/a-handy-guide-to-the-mesos-kubernetes-swarm-jungle-ad6bc086c736#.a2mymzvsi
—https://medium.com/@ArmandGrillet/comparison-of-container-schedulers-c427f4f7421#.uxtk80w35
—https://about.gitlab.com/2016/11/14/idea-to-production/
—https://about.gitlab.com/2016/09/14/gitlab-live-event-recap/
—https://signalfx.com/library/slides-operationalizing-docker-scale-microservices-orchestration-zenefits/
—https://medium.com/@mattheath/a-long-journey-into-a-microservice-world-a714992d2841#.jluhzvs34
—https://engineering.zenefits.com/2016/09/sauron-ci-automation-at-zenefits/
—https://news.ycombinator.com/item?id=12880917
—http://patrobinson.github.io/2016/11/05/docker-in-production/
—https://thehftguy.wordpress.com/2016/11/01/docker-in-production-an-history-of-failure/
—https://medium.com/google-cloud/a-survival-guide-for-containerizing-your-infrastructure-part-1-why-switch-8e8dee9fc66#.sr5nct3p3
—https://www.youtube.com/watch?v=WiCru2zIWWs
—https://speakerdeck.com/mattheath/microservices-and-go-goto-copenhagen-2016
Questions?
Slides will be posted at
medium.com/@mustwin
