Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)

NU.nl
About
• First dutch digital news platform.
• Unique visitors:
• 7 mln. / month
• 2.1 mln. / day
• Page hits: ~12 mln / day
• API: ~150k rpm / 2500rps

NU.nl
Sanoma
• Part of Sanoma
• NL: NU.nl, Viva, Libelle, Scoopy
• FI: Helsingin Sanomat
• Reaching ~9.8 mln dutch people / month

IT organization
Teams
• NU.nl teams
• Web 1 (application / front-end-ish)
• Web 2 (application / back-end-ish / infra)
• Feature 1 & 2 (cross-discipline)
• iOS
• Android
• Sanoma teams
• DevSupport, Mediatool, Content Aggregation

NU.nl
Growing number of teams
• Increased number of parallel workflows
• Testing
• Releasing
• Roadmaps
• Knowing about everything no longer possible
• Aligning ‘procedures by agreement’ increasingly hard

Current infrastructure
AWS accounts & VPCs
VPC
sanoma
RDS Elasticache
ALBs
EC2
Cloudfront
API CMS WWW XYZ
VPC
nu-test
FOO K8S
VPC
nu-prod
BAR K8S

Infrastructure provisioning
Terrible (Terraform + Ansible)
terrible plan
terrible apply
terrible ansible

Development workflow
From code to release
• Code
• Automated tests
• Code review
• Manually initiated deploy to test
• Feature test
• Manually initiated deploy to staging
• Exploratory test
• Manually initiated deploy to production

DevOps practices
Solid foundation
• All infra in code
• Terraform
• Terrible providing mechanisms:
• Authorization
• Managing TF state files

DevOps practices
But…
• Setting up additional test environments slow
• Slow feedback loop
• Terraform plan vs apply (surprise surprise, it didn’t work)
• Ansible (~20 minutes)
• Vagrant? (but not fully representative of EC2)
• Config drift
• Hard to nail down every system package version
• EC2 instances having different lifecycle

DevOps practices
But… (part 2)
• No scaling infra*
• Heavily invested in Ansible
• Config & secrets management problematic
• GUIs time consuming
• No change history
• Or highly detached from code history
• No context
• Not overly secret
*Yes, we know it’s 2019

DevOps practices
But… (part 3)
• Current deployment system assumes fixed set of servers
• Possible alternatives include:
• ASG rolling updates (can get slow)
• Pull current application code on start-up (even slower)
• Bake AMI
• Periodically poll for application version to be deployed
• Works quite well
• …as long as new code combined with config doesn’t break.
• So a certain level of orchestration would be needed.

Where to start?
Everything’s connected

Timing
What direction to move?
• DevOps challenges
• Desire to improve delivery process, having true artifacts
• Early 2018
• Containers are a well-established way of ‘packaging’ an application
• Kubernetes getting out of early-adopters phase
• NU.nl (re-)launching a new product: NUjij

Improvement layers
A journey or a destination?
1: Containers as artifacts
• Versatile
• Forces us to do certain things right
• 12factor
• Centralized logging
• Easily moved through a pipeline
• Lots of tooling

Improvement layers
2: A flexible platform to deploy and run containerized applications on
• Tackling challenges at platform level instead of per-application:
• Scaling
• Security updates
• Observability
• Deployment & configuration process

Improvement layers
2: A flexible platform to deploy and run containerized applications on
• Kubernetes
• Rapidly increasing adoption
• Short feedback loop
• Ability to run locally (unlike, say, ECS)
• Easily stamp out deployments for:
• feature testing/demo-ing
• e2e tests

Narrowing the scope
Lets not get carried away
The goal is not:
• To chop up change all of our applications into nano- micro-services
• They’re not that monolithic anyway
• To put everything in Kubernetes
• Managed AWS services where possible
• Redis, RDS
Focus on agility and efficiency of what we change most frequently: Code

Initial cluster setup
The journey begins

Multiple clusters
By criticality
3 AWS accounts, 3 clusters:
• osc-nu-prod
• production
• osc-nu-test
• test
• staging
• osc-nu-dev
• proofing infra changes

Kops
Why Kops?
• Manages cluster upgrades
• Rolling upgrade
• Draining nodes
• EKS not yet available
• Let alone in eu-west-1

Kops
Glueing together cluster setup and kube-system setup

Kops
Templating Terraform and custom vars

Components
kube-system
• Networking
• Calico
• EFS
• previousnext/k8s-aws-efs
• No AZ-restrictions when re-scheduling pods
• Creates new EFS filesystem for each PersistentVolumeClaim
• Security & reliability (isolated IOPs budgets)
• Slow on initial deploy

Components
kube-system
• AWS IAM Authenticator
• The ‘Zalando suite’
• Skipper
• Skipper Daemonset
• kube-ingress-aws-controller Deployment
• ExternalDNS
• Configures PowerDNS (& others) based on ingress host

Components
Zalando skipper
• Skipper Daemonset
• Feature rich (metrics, shadow traffic,
blue/green)
• kube-ingress-aws-controller Deployment
• https://github.com/zalando-incubator/kube-
ingress-aws-controller
• Sets up & manages ALB
• Finds appropriate ACM certificate
• Supports multiple ACM certificates per ALB

Components
Autoscaling
• Horizontal Pod Autoscaler
• Scales number of pods based on
(CPU) utilization
• Cluster autoscaler
• Running on master nodes
• Scales asg out when pods pending
• Scales asg in when nodes
underutilized

Components
Logging & metrics
• ELK
• Prometheus / Grafana

Jenkins
Build & Deploy pipeline

Jenkins
Temporary deployment for running tests
• Deploy to temp. namespace
• Jenkins-SU
• Run tests in deployment
• Deploy to test/staging/production
• By bumping image version
• Production: Jenkins-SU
• Clean up temp. namespace
• Jenkins-SU

Jenkins
Jenkins-SU
• Sets up namespace
• Adding RBAC for Jenkins
• Only if ns name matches pattern ‘Jenkins-*’
• Deletes namespace
• Only if ns name matches pattern ‘Jenkins-*’
• Avoids need for Jenkins to be able to delete every namespace
curl -X POST --user ${JENKINS_SU_AUTH} --data '{"name": "${K8S_BUILD_NS}"}' http://su.jenkins-su/ns/
curl -X DELETE --user ${JENKINS_SU_AUTH} --data '{"name": "${K8S_BUILD_NS}"}' http://su.jenkins-su/ns/

Kubernetes in action
Questions
• Will it be stable?
• Will we be able to operate?
• Should we wait for EKS?
• Do we actually want EKS? What will EKS be like?

Incident 1
Accidentally trying to load a ElasticSearch index of 90Gb
• Misconfigured elast-alert (trying to read entire index)
• No memory limit configured

Incident 1
Accidentally trying to load a ElasticSearch index of 90Gb
• Required manual intervention: Yes
• Stopping the bleeding:
• Remove elast-alert
• Permanent fixes:
• Don’t load entire index
• Apply limits

Incident 2
Rapid traffic increase affecting core components
• 2019-03-18 Utrecht shooting
• 11:11 First article published
• 11:56 breaking push
• CPU burstable pods causing node 100% CPU
• Core components (kubelet, ingress) suffering

Incident 2

Incident 2
pod
pod
kubelet
skipper
node
Pods:
0.4 CPU req.
0.8 CPU limit
80% CPU utilization
pod
kubelet
skipper
node
pod
Pods:
0.4 CPU req.
0.8 CPU limit
120% CPU utilization
problems

Incident 2
• Required manual intervention: No
• Fixes:
• Reduce CPU burstable amount of pods
• Increase resource requests of skipper
• Mind QoS: Guaranteed, Burstable, Best effort
• Reserve cpu & memory for kubelet
• --kube-reserved
• --system-reserved

Incident 3
Application update increasing memory footprint
• Upgrade including moving from MongoDB 3 to MongoDB 4
• HorizontalPodAutoscaler based on CPU
• Scaling based on CPU not kicking in
• New increased memory footprint causing OOMkilled

Incident 3

Incident 3
• Required manual intervention: Yes
• Stopping the bleeding:
• Increase memory limit of Talk pods
• Permanent fixes:
• Adjust CPU request/limit & HPA thresholds
• Scale on both CPU and memory
• Note: Not all applications ‘give back’ memory
• Set memory limit higher than request to prevent ‘snowball effect’

Incident 3
OOMKilled snowball effect
pod pod pod pod
pod
pod
pod
pod
pod pod
starting
…
1 2
3 4

3
Memory limits
!?
(obligatory this-is-fine meme)

That’s not fine
Is it?
• On the positive side:
• All are result of (lack of) resource limit configuration
• This can be learned
• On the negative side:
• This needs to be learned
• Note: ‘Availability bias’

Automation
Improving the pipeline
• Automating setting the image version is not enough
• Rolling out Kubernetes manifests still manual task
• Updating configuration & secrets still manual task
• Duplication in manifests between stages
• Not easily seen what parts are different
• Differences intentional or accidental?
• This actually slows us down
• Does git represent the current state?
kubectl -n talk get secrets env -o json |jq -r '.data | map_values(@base64d) | to_entries | .[] | .key + "="" + .value +"""'

Helm
The package manager for Kubernetes
• Charts
• Configured via values
• It’s like Terraform modules
• Or Ansible group_vars
• Leveraging community knowledge and efforts
• E.g. prometheus-operator
• No need to copy charts, able to reference.
• Helm v3

SOPS: Secrets OPerationS
Secrets management stinks, use some sops!
• By Mozilla
• Manage AWS API access, not keys
• Versatile
• YAML, JSON, ENV, INI, binary (plain text)
• Not limited to Kubernetes
• Meaningful diffs
• Alternatives considered:
• Kamus
• Bitnami SealedSecrets

Helmfile
Wiring it together
• Charts
• Referenced from online chart sources or local
• Environments
• Test, staging, production
• Referencing values and secrets
• Releases
• Release name
• Reference to chart
• Values (can be a templated file, using vars and secrets from environment)

Helmfile
Wiring it together
environment
values
secrets
(SOPS)
release X
release Y
release Z
ENV
values
values
values
Helmfile

Helmfile
Wiring it together
• Advantages:
• Meaningful git diffs
• Easily manage multiple releases in single pipeline, e.g.:
• Everything related to monitoring and logging
• Kube-system
• Declarative definition
• Of what would otherwise be numerous helm args and steps in CI/CD pipeline

Helmfile
Wiring it together
• Advantages (continued):
• Ability to pass in ENV vars
• E.g. build result image tags
• Ability to reference complex charts created by community
• Charts as a building block allows re-use. Example:
• Instead of plain yaml you write a chart
• If fitting workflow, the chart can be a published artifact
• Chart can be re-used e.g. in e2e tests

Helmfile
Wiring it together
• Disadvantages:
• 2 levels of templating
• Chart itself
• Only if writing own charts
• Environment & release values into Helm values
• Template error message not overly clear
• Or even misleading
• At least it breaks

Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)

Helmfile
Final words
But tiller?
• Helm as a templating engine
• Option: Using Helm 2 ‘Tillerless’
• Tiller outside of cluster, not by-passing RBAC
• Start using Helm as package manager when Helm 3 settles down
• Easy removal of temp. per-feature deploys
• Diffs

Auto-scaling
Breaking news push

Auto-scaling
Types of scaling
• Reactive
• Breaking news
• K8S cluster-autoscaler
• Can’t schedule pod? Add nodes.
• Predictive
• Ticket sale start
• Black Friday

Auto-scaling
Types of scaling
• From within cluster
• K8S cluster-autoscaler
• From outside of cluster
• ASG scaling policies

Auto-scaling
Scaling speed
node spin-up duration
node count 70% utilization

Auto-scaling
Times 5 within 5 minutes?

Cluster auto-scaler
Bag of tricks
• Mix predictive and reactive
• Add asg instances without telling cluster-autoscaler
• Traffic expected to arrive by the time cluster-autoscaler starts to scale in,
leaving plenty of resources as needed.
• Pause pods
• Lower priority pods that can safely be evicted
• Effectively ‘creating headroom’ in cluster

Considerations
When engaging ‘ludicrous mode’™
Can control-plane handle scale?
• KOPS
• Size master nodes for max. cluster size
• Overhead cost
• EKS
• What’s behind the abstraction?
• ELB 503s exist after all
• Plan: Proof of concepts

Consider EKS
Managed control plane
EKS Kops
Managed control plane Total control over setup
Easier: EKS IAM roles for pods
• Launched 2019-09-04 (yesterday)*
Smooth rolling upgrade process
Probably cheaper (2/3 of 3x m4.large) No VPC CNI Pod density limitations
* https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/

EKS IAM roles for pods
Also possible on DIY clusters, officially launched yesterday
• OIDC federation access (OpenID Connect identity provider)
• Assume role via Secure Token Service (STS)
• Projected service account tokens (JWT) in pod
• STS can validate JWT tokens against OIDC provider
• Boils down to:
• Enable/set-up prerequisites in cluster
• Add ServiceAccount having IAM role annotation to pod
• Use recent AWS SDK

Multiple clusters per AWS account
Don’t lock ourselves in a corner.
api.<aws-account-name>.<k8s-sanoma-domain>
api.<cluster-name>.<aws-account-name>.<k8s-sanoma-domain>
Route53 zone 1
Route53 zone 1Route53 zone 2
NS records

CI/CD to separate cluster
Similar flows
• No more taints and tolerations
• Similar authorization mechanism to all deploy targets
• Possibly IAM
• No need for Jenkins-SU
• Clusters should be cattle anyway

Pipelines
GitOps
• Manage namespaces via pipeline:
• kube-system
• monitor
• Creation of application namespaces including RBAC
• Helmfile

System applications
Small improvements
• Prometheus-operator
• PrometheusRule resource type
• Default dashboards
• EFS
• https://github.com/previousnext/k8s-aws-efs
• Current. Works well but not a lot of active development.
• 2 contributors. 46 stars.
• https://github.com/kubernetes-incubator/external-storage
• De facto EFS provisioner. 146 contributors. 1630 stars.
• Bonus: No more time-consuming initial volume set-up

Expand
Increase Return on Investment
• Add more applications
• Facilitate parallel testing & development workflows
• Feature testing
• Mobile app development
• E2e tests

Links
Further reading
Scaling & spot instances:
• https://itnext.io/the-definitive-guide-to-running-ec2-spot-instances-as-kubernetes-worker-nodes-68ef2095e767
EKS:
• https://medium.com/glia-tech/productionproofing-eks-ed52951ffd6c
QoS:
• https://www.replex.io/blog/everything-you-need-to-know-about-kubernetes-quality-of-service-qos-classes
Failure stories:
• https://k8s.af/

Know your limits
Automate all the things
Everything code
Kubernetes is a journey, not a destination
All should be cattle. No pets allowed!

Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)

More Related Content

What's hot (15)

Similar to Kubernetes at NU.nl (Kubernetes meetup 2019-09-05) (20)

Recently uploaded (20)

Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)