You got a couple Microservices, now what? - Adding SRE to DevOps

Gonzalo Maldonado - MustWin

Microservice Honeymoon
^ Your microservice saved your homepage
^ Everyone loves working on the microservice
2 months later
^ Is it still a microservice?
^ Why are we adding new stuff to the monolith?
Can we get rid of ticket driven deployments?
^ What makes deploying a microservice so hard?
^ Where can we run this?
^ Monoliths seemed easier to maintain!
^ Datacenter 4.0
^ Dude, where's my container?
^ The promised land
Sysadmin -> DevOps -> SRE
^ The SRE-cret Sauce
^ Resource & Container Management (Schedulers)
^ Service Discovery (Consul, Skydns & Etcd)
TOC
—Sysadmin -> DevOps -> SRE
—Microservice Honeymoon
—2 months later
—Meanwhile your team is doing ticket driven
deployments.
—The SRE-cret Sauce
—References

Sysadmin -> DevOps -> SRE
—SysAdmin: Manages 1 or 2 services manually.
—DevOps Team: Manages ~10 services semi-
programmatically.
—SRE Team: Manages 100-1K services fully
programmatically.

Sysadmin -> DevOps -> SRE (Tech Stack)
—SysAdmin: Bash, Perl or Python Scripts
—DevOps Team: Chef, Puppet
—SRE Team: Mesos, Swarm, Kubernetes, Consul,
Vault

We don't have 100 services, why should we care
about the SRE tech stack?
Because this stack:
—Saves your team time configuring and deploying a service
—Allows your engineering team to grow (a single engineer will be able to manage a couple dozen services)
—Prevents having to rewrite your infrastructure code as your app scales
—Gives you elastic resources (saves you money on AWS)
—Makes deploying Microservices as easy as getting a Heroku app up (and you used to love microservices)

When doing Microservices gets
hard

The Microservice Honeymoon: how a microservice saved your homepage
Microservices are awesome.
—Your page load time dropped from 3 seconds to 20 ms (Go is so fast!)
—Hacker News spikes are no longer a big deal (we're elastic!)
—Everyone loves working on the Microservice (It's only 500 lines!)

2 months later...
—If it has 2K lines of code, is it still a microservice?
—Why are people still adding stuff to the monolith?
—The code is already there and they didn't want to rewrite it (duh.)
—Debugging things is getting harder (You need to test in multiple places)
—Getting a new microservice to prod is hard! (This.)

Why is creating new Microservices so hard now?
(monoliths felt easier)
"Awesome analogy by @timallenwagner: monolithic
architecture=carrying a 7ft beach ball,
microservice=carrying 200 loose marbles"

—Configuration Management (You have to repeat
recipes)
—Service-inter-dependency-updates (You can't
change a service address
or port without affecting other services)
—Credentials cannot be shared
—Snowflake Runtime Environments (Can't run
node.js code on the JVM box)

Meanwhile, your team is doing ticket driven deployments
—Deploys have become more complicated. When there was only a Monolith, you only had one deploy and one box.
—It has gotten to the point where your team has decided they "need a ticket" for each deploy.

Where can we run this? Your Sys Admin asks...
—If you're typing apt-get to get a new environment
up, you're doing
something wrong.
—Chef, Puppet, Ansible are good replacements, but
there's something
better you probably already use on your dev
machine.

Datacenter 1.0 [^1]
"How do we use these machines?"
"Can we automate?"
"How can we integrate?"

Datacenter 2.0 [^1]
"We need bigger computers"
"We need a microservice"
"We need a SysAdmin"

Datacenter 3.0 [^1]
"We need some VMs"
"We need microservices"
"We need IT"

Datacenter 3.5 [^1]
"We have a lot of VMs"
"We have lots of microservices"
"We need DevOps"

Datacenter 4.0 [^1]
"We need to manage our VMs"
"We need to manage our microservices"
"We need SREs"

[^1] http://www.slideshare.net/SebastianWeigand/containers-and-customers-55262844

Dude, where is my container?
Virtual Machines vs Docker
You have already heard about Docker and why containers that share OS resources are more efficient than full virtual machines. But what else does Docker give you?

Dude, where is my container?
What else does docker give you?
* Contained instances (You can run multiple
runtimes on one box)
* Incremental images. (You can use an existing
image as a base)
* Immutable Instances (Your images are stateless)

And this gets us to The Lean Staging
$ git commit -am "The new cool feature"
$ git push
Running CI ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
CI done. Your branch is available at http://super-tesla.thunderdomes.co

What do we need to get there?
We need Site Reliability Engineers.
And what are those SREs going to build to achieve that?

What do we need to get there?
What's the SRE-cret sauce?
—Code
—Servers or The Cloud
—A CI service
—A deployment system

What do we need to get there?
Aka. The SRE-cret sauce
—Code (We already have that)
—Servers or The Cloud (Pick AWS, GCP or Azure)
—A CI service (Pick Jenkins, Travis or CircleCI)
—Deployment & Monitoring systems (Let's focus on this)

The SRE-cret sauce.
Those Deployment systems will do the following
a. Container Management
b. Service Discovery
c. Configuration Management
d. Authentication & Authorization

Container Management
This is what you want. Now that you have discovered Docker, you want to use it in production. While you could run all your containers on a single box, this would prevent you from scaling horizontally, and you would need downtime to add more memory to that box.

Container Management: Enter the scheduler
As with many things in the tech world, Google was one of the early adopters of schedulers. Schedulers are systems in charge of managing cluster resources by telling applications when to run.
(Figure: architectures presented in the white paper on Google's Omega scheduler.)
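To make "managing cluster resources" a bit more concrete, here is a minimal sketch of first-fit placement in Python. The `Node` and `Task` types and the `schedule` function are hypothetical names invented for illustration; real schedulers such as Omega, Mesos or Kubernetes use much richer resource models and placement policies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A machine in the cluster with some free resources (hypothetical model)."""
    name: str
    free_cpu: float
    free_mem_mb: int
    tasks: List[str] = field(default_factory=list)


@dataclass
class Task:
    """A container that asks for CPU and memory (hypothetical model)."""
    name: str
    cpu: float
    mem_mb: int


def schedule(tasks: List[Task], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """First-fit placement: put each task on the first node with enough room.

    Returns a mapping of task name -> node name, or None if the task must wait.
    """
    placement: Dict[str, Optional[str]] = {}
    for task in tasks:
        placement[task.name] = None  # stays None when no node can host the task
        for node in nodes:
            if node.free_cpu >= task.cpu and node.free_mem_mb >= task.mem_mb:
                # Reserve the resources so later tasks see the reduced capacity.
                node.free_cpu -= task.cpu
                node.free_mem_mb -= task.mem_mb
                node.tasks.append(task.name)
                placement[task.name] = node.name
                break
    return placement


nodes = [
    Node("box-1", free_cpu=2.0, free_mem_mb=2048),
    Node("box-2", free_cpu=4.0, free_mem_mb=8192),
]
tasks = [Task("homepage", 1.5, 1024), Task("search", 1.0, 2048), Task("batch", 8.0, 512)]
placement = schedule(tasks, nodes)
print(placement)
```

In this toy run the "batch" task asks for more CPU than any box has, so it stays unplaced. That is the point at which a real scheduler would queue the task or ask for more machines.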

Container Management: Scheduler Options
—Mesosphere DCOS (Based on Apache Mesos)
—Docker Swarm
—Kubernetes
—Nomad

Each scheduler option has its own pros and cons, and you will need to pick the one that best fits your team's needs.

Container Management: Scheduler Options[^2]
More info here:
https://medium.com/@mustwin/a-handy-guide-to-the-mesos-kubernetes-swarm-jungle-ad6bc086c736#.6ji95fm7e

Service Discovery
Service discovery is a mechanism by which, when a new service instance is added, the rest of the services detect the change automatically.
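As a rough illustration of the idea, the sketch below models a registry in memory: instances register themselves with a heartbeat, and lookups return only instances seen recently. The `ServiceRegistry` class and its methods are hypothetical and do not mirror the API of etcd, Consul or SkyDNS. The explicit `now` argument is only there to keep the example deterministic.

```python
import time
from typing import Dict, List, Optional, Tuple


class ServiceRegistry:
    """In-memory stand-in for a registry such as etcd or Consul (hypothetical API)."""

    def __init__(self, ttl_seconds: float = 10.0) -> None:
        self.ttl_seconds = ttl_seconds
        # service name -> {(host, port): time of last heartbeat}
        self._instances: Dict[str, Dict[Tuple[str, int], float]] = {}

    def register(self, service: str, host: str, port: int,
                 now: Optional[float] = None) -> None:
        """Add an instance, or refresh its heartbeat if it is already known."""
        now = time.monotonic() if now is None else now
        self._instances.setdefault(service, {})[(host, port)] = now

    def lookup(self, service: str, now: Optional[float] = None) -> List[Tuple[str, int]]:
        """Return the instances whose last heartbeat is within the TTL."""
        now = time.monotonic() if now is None else now
        known = self._instances.get(service, {})
        return sorted(addr for addr, seen in known.items()
                      if now - seen <= self.ttl_seconds)


registry = ServiceRegistry(ttl_seconds=10.0)
registry.register("search", "10.0.0.5", 8080, now=0.0)
registry.register("search", "10.0.0.6", 8080, now=5.0)
fresh = registry.lookup("search", now=8.0)    # both heartbeats are within the TTL
later = registry.lookup("search", now=12.0)   # the first heartbeat has expired
print(fresh)
print(later)
```

Callers ask the registry for "search" instead of hard-coding an address or port, so an instance can move or disappear without other services needing a config change.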

Service Discovery: Options
Load balancer + highly available storage.
Using a load balancer like NGINX or HAProxy together with etcd, you can update service registrations dynamically. The load balancer takes care of DNS resolution.
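One way to picture the "load balancer + storage" option: whenever the registered instances change, something regenerates the load balancer's backend list. The sketch below renders an HAProxy-style `backend` stanza from a list of addresses. The `render_backend` helper is a hypothetical name, and watching etcd and reloading HAProxy are assumed to happen elsewhere.

```python
from typing import List, Tuple


def render_backend(service: str, instances: List[Tuple[str, int]]) -> str:
    """Render an HAProxy-style backend section for the given instances.

    Illustrative only: a real setup would also need frontend/defaults sections
    and a process that reloads the load balancer after the file changes.
    """
    lines = [f"backend {service}"]
    for index, (host, port) in enumerate(sorted(instances), start=1):
        # "check" asks the load balancer to health-check the server.
        lines.append(f"    server {service}-{index} {host}:{port} check")
    return "\n".join(lines) + "\n"


backend = render_backend("search", [("10.0.0.6", 8080), ("10.0.0.5", 8080)])
print(backend)
```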

Etcd + SkyDNS
SkyDNS performance is comparable to HAProxy, and it is easier to set up, although not as powerful.

Consul
Consul is a key/value store and service registry with built-in DNS support.

Service Discovery: How to pick?
a. Pick a scheduler
* Kubernetes currently only supports etcd.
* Mesos can use Etcd, Zookeeper or Consul.
b. If you're using Consul you're done.
c. For etcd:
* Use HAProxy if you're already using it
* Otherwise just use SkyDNS and call it a day

Configuration Management
We're going to assume your Microservices are already 12 Factor apps[^3], where:
* Service configuration happens in environment variables
* Backing services are attached resources (Service Discovery FTW)
[^3] https://12factor.net/
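A minimal sketch of what "configuration in environment variables" can look like in a service. The variable names (`PORT`, `DATABASE_URL`, `DEBUG`) and the `Config` type are assumptions made for this example, not something the 12 Factor methodology prescribes.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Config:
    port: int
    database_url: str
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build the service config from environment variables only."""
    return Config(
        port=int(env.get("PORT", "8080")),
        # The backing service is an attached resource: its address is required,
        # so a missing variable fails fast with a KeyError.
        database_url=env["DATABASE_URL"],
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )


# Passing a plain dict instead of os.environ keeps the example self-contained.
config = load_config({"PORT": "9000", "DATABASE_URL": "postgres://db.internal:5432/app"})
print(config)
```

Because nothing is baked into the image, the same container can run in staging and production with only the environment changed by the scheduler.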

Configuration Management (Options)
Most schedulers support this out of the box, with the caveat that most don't provide secret management (K8s does).

Secret Management (Vault)
For secret management we cannot recommend Vault enough, because it provides:
—Secure secret storage
—Dynamic Secrets
—Leasing and Renewal
—Revocation
—Auditing
—Etc.
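The ideas of dynamic secrets, leasing, renewal and revocation can be sketched with a toy in-memory model. This is not Vault's API. `LeaseStore` and its methods are hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    secret: str
    expires_at: float


class LeaseStore:
    """Toy model of dynamic secrets with leases; this is NOT Vault's API."""

    def __init__(self) -> None:
        self._leases: Dict[str, Lease] = {}

    def issue(self, now: float, ttl: float) -> str:
        """Generate a fresh secret on demand and return its lease id."""
        lease_id = secrets.token_hex(8)
        self._leases[lease_id] = Lease(secret=secrets.token_urlsafe(16),
                                       expires_at=now + ttl)
        return lease_id

    def read(self, lease_id: str, now: float) -> Optional[str]:
        """Return the secret while the lease is valid, else None."""
        lease = self._leases.get(lease_id)
        if lease is None or now >= lease.expires_at:
            return None
        return lease.secret

    def renew(self, lease_id: str, now: float, ttl: float) -> bool:
        """Extend a lease that has not expired or been revoked."""
        if self.read(lease_id, now) is None:
            return False
        self._leases[lease_id].expires_at = now + ttl
        return True

    def revoke(self, lease_id: str) -> None:
        """Invalidate the secret immediately."""
        self._leases.pop(lease_id, None)


store = LeaseStore()
lease_id = store.issue(now=0.0, ttl=60.0)
still_valid = store.read(lease_id, now=30.0) is not None
renewed = store.renew(lease_id, now=50.0, ttl=60.0)   # lease now runs until t=110
valid_after_renewal = store.read(lease_id, now=100.0) is not None
store.revoke(lease_id)
valid_after_revoke = store.read(lease_id, now=100.0) is not None
print(still_valid, renewed, valid_after_renewal, valid_after_revoke)
```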

Other things you will need
—Monitoring (Prometheus, Nagios, InfluxDB, Grafana)
—An authentication service or provider

To recap, to build The Lean Staging we will need to:
—Set up a scheduler (Kubernetes)
—Set up a CI system (Drone, Jenkins or Travis)
—Hook your GitHub/GitLab to that CI
—Change the CI configuration to trigger a container build & deploy
—Have fun!
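The "container build & deploy" step could look roughly like the sketch below, which only assembles the commands a CI job might run once tests pass. The registry host, service name and the `pipeline_commands` helper are assumptions for illustration, and the deploy step assumes a Kubernetes Deployment whose container is named after the service.

```python
from typing import List


def pipeline_commands(registry: str, service: str, commit_sha: str) -> List[List[str]]:
    """Return the commands a CI job might run after tests pass (illustrative only)."""
    image = f"{registry}/{service}:{commit_sha}"
    return [
        # Build an image tagged with the commit, so every build is traceable.
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
        # Assumes a Deployment and a container both named after the service.
        ["kubectl", "set", "image", f"deployment/{service}", f"{service}={image}"],
    ]


commands = pipeline_commands("registry.example.com", "homepage", "abc1234")
for command in commands:
    print(" ".join(command))
```

Returning the commands instead of running them keeps the sketch safe to execute anywhere. A real job would pass each list to something like `subprocess.run(command, check=True)` and stop at the first failure.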

GitLab made a really good proof of concept of it:
https://about.gitlab.com/2016/11/14/idea-to-production/

Recommended reading for SRE Teams:
Distributed Systems fundamentals:
—Notes on Distributed Systems for Young Bloods - Jeff Hodges
—You Can’t Sacrifice Partition Tolerance - Coda Hale
—The Raft Consensus Algorithm - Diego Ongaro

Recommended reading for SRE Teams:
Microservices
—Building Microservices - Sam Newman
SRE
—Site Reliability Engineering - Beyer, et al.
—Continuous Delivery - Jez Humble
—The Principles of Product Development Flow -
Reinertsen

References
—https://medium.com/@mustwin/a-handy-guide-to-the-mesos-kubernetes-swarm-jungle-ad6bc086c736#.a2mymzvsi
—https://medium.com/@ArmandGrillet/comparison-of-container-schedulers-c427f4f7421#.uxtk80w35
—https://about.gitlab.com/2016/11/14/idea-to-production/
—https://about.gitlab.com/2016/09/14/gitlab-live-event-recap/
—https://signalfx.com/library/slides-operationalizing-docker-scale-microservices-orchestration-zenefits/
—https://medium.com/@mattheath/a-long-journey-into-a-microservice-world-a714992d2841#.jluhzvs34
—https://engineering.zenefits.com/2016/09/sauron-ci-automation-at-zenefits/
—https://news.ycombinator.com/item?id=12880917
—http://patrobinson.github.io/2016/11/05/docker-in-production/
—https://thehftguy.wordpress.com/2016/11/01/docker-in-production-an-history-of-failure/
—https://medium.com/google-cloud/a-survival-guide-for-containerizing-your-infrastructure-part-1-why-switch-8e8dee9fc66#.sr5nct3p3
—https://www.youtube.com/watch?v=WiCru2zIWWs
—https://speakerdeck.com/mattheath/microservices-and-go-goto-copenhagen-2016

Questions?
Slides will be posted at
medium.com/@mustwin
