Meetup, 16/7/2018
Agenda
● Redis Labs intro and architecture
● Double orchestration
● Our Kubernetes solution
● The way to operators
● Operators intro
● Operators development
● Demo
Redis Labs Intro and Architecture
Introduction to Redis Enterprise
Open source: the leading in-memory database platform, supporting any high-performance operational, analytics or hybrid use case.
Redis Labs: the open source home and commercial provider of Redis Enterprise (Redisᵉ) technology, platform, products & services.
Orchestrating Redis & K8s Operators
We Are Hiring!
Redisᵉ: Open Source & Proprietary Technology
Diagram: a Redisᵉ cluster built from Redisᵉ nodes; each node combines an Enterprise Layer (Cluster Manager, REST API, zero-latency proxy) with the Open Source Layer
• Shared-nothing, symmetrical cluster architecture
• Fully compatible with open source commands & data structures
Redisᵉ in Containers
• Faster time to market, with continuity between dev/test and production environments that use Redisᵉ Pack
• Highly available, easier to scale, simpler to manage Redis technology, integrated with orchestration tools such as PCF, Kubernetes, Mesosphere...
• Node-in-a-container approach: all Redisᵉ services inside each container
• Run Redisᵉ clusters on single or multiple nodes
Node in a Pod Approach
Node 1, Node 2, Node 3: one pod, multiple services per node
vs.
Node 1, Node 2, Node 3: multiple pods, multiple services per node
Double Orchestration
For fun and profit!
What's Double Orchestration?
Diagram: external orchestration (Kubernetes, PKS) manages the cluster nodes (Node 1 through Node N); internal orchestration (the Redis cluster) manages the Redis shards across those nodes
Why like this?
• Resource management: orchestration platforms are designed to be generic.
• Again, performance is king.
• Last but not least, it allows us to maintain a common architecture regardless of the running environment, be it bare metal, VM, K8s or Pivotal Cloud Foundry.
(Surprisingly enough, not everybody in the world uses containers…)
Who Does What
• Node auto-healing
• Node scaling
• Failover & scaling
• Configuration & monitoring
• Service discovery
• Upgrade
And specifically on Kubernetes
• Node in a Pod
• StatefulSet
• Persistent Volumes
• Custom Controller
• Services Manager
Our Kubernetes Solution
Redis Labs on Kubernetes - Building Blocks
StatefulSet
Our cluster nodes are deployed as part of a StatefulSet
Affinity
Allows us to control the Redis Labs cluster node topology
Redis Labs Services Manager
Creates/updates/deletes service entries for each Redis DB hosted on the cluster
RBAC
The Services Manager must have permissions in the namespace to create services
Ingress
Allows access to Redis DBs from outside the K8s cluster
StatefulSet
• Introduced in 1.5, GA in 1.9
• StatefulSet pod consistency
– Pod naming
– Scale-out/scale-in
– Pod upgrade
• Persistent disks
– The same PVC is used when a pod is (re)scheduled
• All pods are uniform
• Recovery from error state
Diagram: pod-0, pod-1 and pod-2, each bound to its own PV
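The pod-naming and persistent-disk guarantees above rest on a simple naming convention. The Python sketch below spells it out; the helper functions are hypothetical, and the names my-sts and redisstorage are borrowed from the examples elsewhere in this deck.

```python
"""Sketch of StatefulSet identity rules (illustrative helpers, not Kubernetes code)."""


def pod_name(sts_name: str, ordinal: int) -> str:
    # StatefulSet pods get stable names built from the set name and an ordinal index.
    return f"{sts_name}-{ordinal}"


def pvc_name(claim_template: str, sts_name: str, ordinal: int) -> str:
    # PVCs created from volumeClaimTemplates are named <template>-<statefulset>-<ordinal>.
    return f"{claim_template}-{pod_name(sts_name, ordinal)}"


def volume_bindings(sts_name: str, claim_template: str, replicas: int) -> dict:
    """Map every pod of the set to the claim it mounts."""
    return {
        pod_name(sts_name, i): pvc_name(claim_template, sts_name, i)
        for i in range(replicas)
    }


if __name__ == "__main__":
    for pod, claim in volume_bindings("my-sts", "redisstorage", 3).items():
        print(pod, "->", claim)
    # my-sts-0 -> redisstorage-my-sts-0
    # my-sts-1 -> redisstorage-my-sts-1
    # my-sts-2 -> redisstorage-my-sts-2
```

Because a rescheduled pod comes back under the same name, it mounts the same claim and finds its data again. Scaling in removes pods, but by default their PVCs are kept.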
Pod features
• Anti-affinity
– Allows us to control where the pods are scheduled
• Readiness probes
– Allow us to control the flow of actions (a pod only counts as ready once its node is bootstrapped, connected and synced) to avoid data loss
• Pre-stop hook
– Drains the node and moves its resources to a different node
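The Helm chart in this deck uses a required pod anti-affinity rule keyed on kubernetes.io/hostname, so no two Redis Enterprise node pods may share a host and losing one host costs at most one cluster node. The Python sketch below shows what that rule enforces; crowded_hosts, can_schedule and the worker hostnames are hypothetical, and the real decision is made by the Kubernetes scheduler.

```python
"""Sketch of the constraint a required pod anti-affinity rule with
topologyKey kubernetes.io/hostname enforces (illustrative, not scheduler code)."""
from collections import Counter


def crowded_hosts(placements: dict) -> list:
    """placements maps pod name -> hostname for pods matching the rule's labelSelector.
    Return the hostnames running more than one of them (i.e. rule violations)."""
    per_host = Counter(placements.values())
    return sorted(host for host, count in per_host.items() if count > 1)


def can_schedule(placements: dict, candidate_host: str) -> bool:
    # A new matching pod may only go to a host with no other matching pod on it.
    return candidate_host not in placements.values()


if __name__ == "__main__":
    nodes = {"pod-0": "worker-a", "pod-1": "worker-b", "pod-2": "worker-c"}
    print(crowded_hosts(nodes))             # []
    print(can_schedule(nodes, "worker-a"))  # False
    print(can_schedule(nodes, "worker-d"))  # True
```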
Redis Enterprise Services Manager
Why?
• Redis Enterprise is a multi-tenant Redis cluster
• A Redis Enterprise database can have one or more network endpoints
Problem
• Expose each database as a service instance
Solution
• A Python-based application that creates, deletes or updates the necessary database service entries
• Based on an idempotent reconciliation loop
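The Services Manager's idempotent reconciliation loop can be illustrated with a minimal Python sketch (not the actual Services Manager code). It assumes each database endpoint and each existing Service is reduced to a hypothetical name → port entry; real Service objects carry far more fields, and apply_plan stands in for the K8s API calls.

```python
"""Sketch of one idempotent service reconciliation pass (illustrative only)."""


def plan_service_changes(endpoints: dict, services: dict) -> dict:
    """Compare desired endpoints with existing services and return the actions needed."""
    create = sorted(name for name in endpoints if name not in services)
    delete = sorted(name for name in services if name not in endpoints)
    update = sorted(name for name in endpoints
                    if name in services and services[name] != endpoints[name])
    return {"create": create, "update": update, "delete": delete}


def apply_plan(plan: dict, endpoints: dict, services: dict) -> dict:
    # Stand-in for the K8s API calls: return the new view of existing services.
    new = dict(services)
    for name in plan["delete"]:
        del new[name]
    for name in plan["create"] + plan["update"]:
        new[name] = endpoints[name]
    return new


if __name__ == "__main__":
    dbs = {"db-orders": 12000, "db-cache": 12001}        # hypothetical endpoints
    existing = {"db-cache": 15000, "db-old": 12002}      # hypothetical services
    plan = plan_service_changes(dbs, existing)
    print(plan)  # {'create': ['db-orders'], 'update': ['db-cache'], 'delete': ['db-old']}
    existing = apply_plan(plan, dbs, existing)
    # A second pass finds nothing to do: running the loop again is harmless.
    print(plan_service_changes(dbs, existing))  # {'create': [], 'update': [], 'delete': []}
```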
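Diagram: inside the Kubernetes cluster, the Redis Labs StatefulSet places pod-0, pod-1 and pod-2 (each with its own PV) on separate worker nodes, together forming the Redis Enterprise Cluster; the Services Manager uses the K8s API to add, edit and delete the database services through which the apps reach their databases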
The way to operators
Redis Labs Challenges
• Provide a solid primary DB solution for end users
• Stateful application
– Some changes cannot be performed
– Some changes need to mutate the state before applying the actual change
– Data loss is unacceptable
• Support multiple K8s deployments
– Cloud: GKE, AWS, etc.
– OpenShift
– PKS
– Vanilla
– On-prem hardware vendor
• Ingress
• Packaging
Our journey
• Started out with 9 static YAML files
– Hard to deploy
– Hard to maintain
– Hard to distribute
– No control over the deployment life cycle
• Helm
– Customized deployment
– Easier to maintain
– Not fully supported everywhere
– No control over the deployment life cycle
• Operator
– Simple deployment (2 YAML files)
– Full control over the life cycle
– K8s compatible
Diagram: .yaml files → Operator
Custom Resource + Custom Controller = Operator
Example: kubectl create -f my-sts.yaml
kubectl sends the StatefulSet my-sts to the API Server; the StatefulSet Controller, which watches StatefulSets (Watch(StatefulSet)), creates pod-0, pod-1 and pod-2, each with its own PV.
Example: kubectl scale statefulset my-sts --replicas=5
The StatefulSet Controller sees the updated my-sts through its Watch(StatefulSet) on the API Server and adds pod-3 and pod-4 next to pod-0, pod-1 and pod-2.
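A toy model of the watch-and-reconcile pattern in these diagrams, in Python. FakeApiServer and StatefulSetController are hypothetical stand-ins, not Kubernetes client APIs; the point is that the controller reads the stored object and converges to it, rather than acting on the event itself.

```python
"""Toy model of a watching controller (not the real StatefulSet controller)."""
from typing import Callable


class FakeApiServer:
    def __init__(self) -> None:
        self.objects: dict = {}
        self.watchers: list = []

    def watch(self, callback: Callable[[str], None]) -> None:
        self.watchers.append(callback)

    def apply(self, name: str, spec: dict) -> None:
        # create, apply and scale all end up as a stored object plus a watch notification
        self.objects[name] = dict(spec)
        for notify in self.watchers:
            notify(name)


class StatefulSetController:
    def __init__(self, api: FakeApiServer) -> None:
        self.api = api
        self.pods: dict = {}
        api.watch(self.reconcile)

    def reconcile(self, name: str) -> None:
        # Read the current object instead of trusting the event (level-triggered).
        replicas = self.api.objects[name]["replicas"]
        pods = self.pods.setdefault(name, [])
        while len(pods) < replicas:
            pods.append(f"{name}-{len(pods)}")  # next ordinal
        while len(pods) > replicas:
            pods.pop()                          # highest ordinal goes first


if __name__ == "__main__":
    api = FakeApiServer()
    controller = StatefulSetController(api)
    api.apply("my-sts", {"replicas": 3})  # like kubectl create -f my-sts.yaml
    print(controller.pods["my-sts"])      # ['my-sts-0', 'my-sts-1', 'my-sts-2']
    api.apply("my-sts", {"replicas": 5})  # like kubectl scale statefulset my-sts --replicas=5
    print(controller.pods["my-sts"])      # adds 'my-sts-3' and 'my-sts-4'
```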
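Example: kubectl create -f my-redis-cluster.yaml
The RedisCluster Controller, which watches RedisCluster objects (Watch(RedisCluster)) on the API Server, sees my-redis-cluster and creates the StatefulSet, the UI service, the Service Account and the other resources the cluster needs.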
Example: kubectl apply -f my-redis-cluster.yaml
On an update to my-redis-cluster, the RedisCluster Controller reconciles the StatefulSet, UI service, Service Account and other resources, calling get-status() on the Redis Enterprise cluster before acting (example: downscale).
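A hedged sketch of why the controller asks the Redis Enterprise cluster for its status before a downscale, instead of simply lowering the StatefulSet replica count. get-status() is modeled as a plain dict with hypothetical fields (node count and shards per node ordinal), and node ordinals are assumed to match pod ordinals. Each call returns one safe step, so a long-running downscale spreads over several reconcile passes.

```python
"""Hypothetical downscale step selection based on a get-status() result."""


def next_downscale_step(desired_nodes: int, status: dict) -> str:
    """Return the next safe step towards desired_nodes.

    status models a get-status() reply (hypothetical shape):
    {"nodes": <nodes currently in the cluster>, "shards": {<node ordinal>: <shard count>}}
    """
    nodes = status["nodes"]
    if nodes <= desired_nodes:
        return "done"
    victim = nodes - 1  # a StatefulSet always removes its highest ordinal
    if status["shards"].get(victim, 0) > 0:
        # Lowering replicas now would kill a pod that still holds data.
        return f"migrate shards off node {victim}"
    return f"remove node {victim} from the cluster, then set replicas to {victim}"


if __name__ == "__main__":
    print(next_downscale_step(3, {"nodes": 5, "shards": {3: 2, 4: 0}}))
    # remove node 4 from the cluster, then set replicas to 4
    print(next_downscale_step(3, {"nodes": 4, "shards": {3: 2}}))
    # migrate shards off node 3
```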
Why are operators useful?
● Life cycle control
○ Scale up → add a new pod, rebalance data
○ Healing → restore backups, auto recovery
○ Backup
○ Validations (e.g. an even # of pods)
● Configuration
○ Automate complex deployments (e.g. Vault cluster and etcd cluster)
○ Reconfiguration
○ Agnostic configuration (e.g. PVC by cloud provider)
● 3rd-party resources (e.g. Prometheus)
● Cross-distribution
● Easy to deploy
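The validations bullet, such as rejecting an even number of pods, can be checked before any resource is touched. A minimal Python sketch, assuming the node count lives in spec.nodes as in the cr.yaml output in this deck; validate_spec and its error messages are illustrative.

```python
"""Sketch of spec validation run before reconciling (hypothetical helper)."""


def validate_spec(spec: dict) -> list:
    """Return human-readable validation errors; an empty list means the spec is valid."""
    nodes = spec.get("nodes")
    if not isinstance(nodes, int) or isinstance(nodes, bool):
        return ["spec.nodes must be an integer"]
    errors = []
    if nodes < 1:
        errors.append(f"spec.nodes must be positive, got {nodes}")
    if nodes % 2 == 0:
        # Presumably so that a majority (quorum) of nodes can survive failures.
        errors.append(f"spec.nodes must be odd, got {nodes}")
    return errors


if __name__ == "__main__":
    print(validate_spec({"nodes": 3}))  # []
    print(validate_spec({"nodes": 4}))  # ['spec.nodes must be odd, got 4']
```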
Our Upgrade Flow
For each pod in a Redis Enterprise cluster we need to:
1. Drain the pod
2. Stop the pod
3. Start a new pod
● Downgrade: not supported (OSS backward compatibility)
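The drain/stop/start flow lines up with how a StatefulSet RollingUpdate works: pods are replaced one at a time, from the highest ordinal down, and each updated pod must be Ready before the next one is touched. A Python sketch of the resulting plan; upgrade_plan and its step names are illustrative.

```python
"""Sketch of the per-pod upgrade plan in RollingUpdate order (illustrative only)."""


def upgrade_plan(sts_name: str, replicas: int) -> list:
    """Return (step, pod) pairs, highest ordinal first, one pod at a time."""
    plan = []
    for ordinal in range(replicas - 1, -1, -1):
        pod = f"{sts_name}-{ordinal}"
        plan.append(("drain", pod))      # move shards and master roles off the node
        plan.append(("stop", pod))
        plan.append(("start-new", pod))  # same name, so it re-mounts the same PVC
    return plan


if __name__ == "__main__":
    for step, pod in upgrade_plan("my-sts", 2):
        print(step, pod)
```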
Our Upgrade Flow
With YAML/Helm, we used a preStop lifecycle hook on the StatefulSet:
1. Encoded inside the YAML: cumbersome
2. Cannot validate the version
3. No error handling
With an Operator:
1. Logic is maintained in code, not in a config file
2. Validations: not a downgrade, cluster is not already in an upgrade process
3. Error handling
4. Manage canary deployments
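The two operator-side validations can be sketched in Python as follows. The major.minor.patch-build version format and the "Upgrading" status value are assumptions for illustration, not the operator's actual representation. Versions are compared as integer tuples so that, for example, 5.10 sorts after 5.2.

```python
"""Sketch of upgrade pre-checks: no downgrade, no concurrent upgrade (hypothetical)."""
import re

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(\d+))?$")


def parse_version(text: str) -> tuple:
    match = _VERSION.match(text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch, build = match.groups()
    return int(major), int(minor), int(patch), int(build or 0)


def check_upgrade(current: str, requested: str, cluster_state: str) -> None:
    """Raise ValueError if the requested upgrade must not start."""
    if cluster_state == "Upgrading":
        raise ValueError("cluster is already in an upgrade process")
    if parse_version(requested) < parse_version(current):
        raise ValueError(f"downgrade from {current} to {requested} is not supported")


if __name__ == "__main__":
    check_upgrade("5.0.2-30", "5.2.0-14", "Running")  # passes silently
    try:
        check_upgrade("5.2.0-14", "5.0.2-30", "Running")
    except ValueError as err:
        print(err)  # downgrade from 5.2.0-14 to 5.0.2-30 is not supported
```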
crd_cluster.yaml
operator.yaml
cr.yaml
Helm chart StatefulSet template (excerpt):
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: {{ template "redisenterprise.statefulsetName" . }}
  labels:
    app: {{ template "redisenterprise.name" . }}
    chart: {{ template "redisenterprise.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
{{ if .Values.persistentVolume.enabled }}
  volumeClaimTemplates:
  - metadata:
      name: redisstorage
      labels:
        app: {{ template "redisenterprise.name" . }}
        chart: {{ template "redisenterprise.chart" . }}
        release: {{ .Release.Name }}
        heritage: {{ .Release.Service }}
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: {{ .Values.persistentVolume.size | quote }}
{{ if .Values.persistentVolume.storageClass }}
{{ if (eq "-" .Values.persistentVolume.storageClass) }}
      storageClassName: ""
{{ else }}
      storageClassName: "{{ .Values.persistentVolume.storageClass }}"
{{ end }}
{{ end }}
{{ end }}
  serviceName: {{ template "redisenterprise.fullname" . }}
  replicas: {{ .Values.replicas }}
  updateStrategy:
    type: "RollingUpdate"
  template:
    metadata:
      labels:
        redis.io/role: node
        app: {{ template "redisenterprise.name" . }}
        chart: {{ template "redisenterprise.chart" . }}
        release: {{ .Release.Name }}
        heritage: {{ .Release.Service }}
    spec:
{{ with .Values.nodeSelector }}
      nodeSelector:
{{ toYaml . | indent 8 }}
{{ end }}
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - {{ template "redisenterprise.name" . }}
              - key: release
                operator: In
                values:
                - {{ .Release.Name }}
              - key: redis.io/role
                operator: In
                values:
                - node
            topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 31536000
      serviceAccountName: {{ template "redisenterprise.serviceAccountName" . }}
{{ with .Values.imagePullSecrets }}
      imagePullSecrets:
{{ toYaml . | indent 8 }}
{{ end }}
      containers:
      - name: redis
        image: {{ .Values.redisImage.repository }}:{{ .Values.redisImage.tag }}
        imagePullPolicy: {{ .Values.redisImage.pullPolicy }}
        readinessProbe:
          exec:
            command:
            # check that the node is bootstrapped and that it is connected and synced.
            - bash
            - -c
            - curl --silent localhost:8080/v1/bootstrap && /opt/redislabs/bin/rladmin status nodes | grep node:$(cat /etc/opt/redislabs/node.id) | grep OK
          initialDelaySeconds: 20
          timeoutSeconds: 5
        lifecycle:
          preStop:
            exec:
              command:
              # enslave the node; if this node is the master, change the master to the first slave node.
              - bash
              - -c
              - /opt/redislabs/bin/rladmin node $(cat /etc/opt/redislabs/node.id) enslave && ((/opt/redislabs/bin/rladmin status nodes | grep node:$(cat /etc/opt/redislabs/node.id) | grep -q master) && /opt/redislabs/bin/rlutil change_master master=$(/opt/redislabs/bin/rladmin status nodes | grep slave | head -1 | cut -d " " -f 1 | cut -d ":" -f2) && sleep 10) || /bin/true
        resources:
{{ toYaml .Values.redisResources | indent 10 }}
        ports:
        - containerPort: 8001
        - containerPort: 8443
        - containerPort: 9443
        securityContext:
          capabilities:
            add:
            - SYS_RESOURCE
{{ if .Values.persistentVolume.enabled }}
        volumeMounts:
        - mountPath: "/opt/persistent"
          name: redisstorage
{{ end }}
        env:
        - name: K8S_ORCHASTRATED_DEPLOYMENT
          value: "yes"
        - name: JOIN_HOSTNAME
          value: {{ template "redisenterprise.fullname" . }}
{{ if .Values.persistentVolume.enabled }}
        - name: PERSISTANCE_PATH
          value: /opt/persistent
{{ end }}
        - name: K8S_SERVICE_NAME
          value: {{ template "redisenterprise.fullname" . }}
        - name: BOOTSTRAP_HANDLE_REDIRECTS
          value: "enabled"
        - name: BOOTSTRAP_CLUSTER_FQDN
          value: {{ template "redisenterprise.clusterDNS" . }}
        - name: BOOTSTRAP_DMC_THREADS
          value: "10"
        - name: BOOTSTRAP_USERNAME
          valueFrom:
            secretKeyRef:
              name: {{ template "redisenterprise.fullname" . }}
              key: username
        - name: BOOTSTRAP_PASSWORD
          valueFrom:
            secretKeyRef:
              name: {{ template "redisenterprise.fullname" . }}
              key: password
        - name: BOOTSTRAP_LICENSE
          valueFrom:
            secretKeyRef:
              name: {{ template "redisenterprise.fullname" . }}
              key: license
kubectl create -f cr.yaml
curl http://127.0.0.1:8001/apis/app.redislabs.com/v1alpha1/redisenterpriseclusters
{
  "apiVersion": "app.redislabs.com/v1alpha1",
  "items": [
    {
      "apiVersion": "app.redislabs.com/v1alpha1",
      "kind": "RedisEnterpriseCluster",
      "metadata": {
        ...
        "creationTimestamp": "2018-07-12T15:47:31Z",
        "generation": 0,
        "name": "my-cluster-test",
        "namespace": "redis"
      },
      "spec": {
        "nodes": 3,
        "serviceAccountName": "my-cluster-test",
        "uiServiceType": "ClusterIP",
        "username": "demo@redislabs.com"
        ...
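The curl path follows the standard REST layout for custom resources, /apis/<group>/<version>/[namespaces/<namespace>/]<plural>. A small Python sketch that builds those paths and summarizes a list response; it makes no HTTP call, cr_list_path and summarize are hypothetical helpers, and the sample response is a trimmed copy of the one above.

```python
"""Sketch: address and read RedisEnterpriseCluster objects via the K8s REST layout."""
from typing import Optional

GROUP = "app.redislabs.com"
VERSION = "v1alpha1"
PLURAL = "redisenterpriseclusters"


def cr_list_path(namespace: Optional[str] = None) -> str:
    # Cluster-wide list, or the list within one namespace.
    if namespace is None:
        return f"/apis/{GROUP}/{VERSION}/{PLURAL}"
    return f"/apis/{GROUP}/{VERSION}/namespaces/{namespace}/{PLURAL}"


def summarize(list_response: dict) -> list:
    """Return (namespace, name, nodes) for every cluster in a list response."""
    return [
        (item["metadata"]["namespace"], item["metadata"]["name"], item["spec"]["nodes"])
        for item in list_response.get("items", [])
    ]


if __name__ == "__main__":
    response = {
        "apiVersion": f"{GROUP}/{VERSION}",
        "items": [{
            "apiVersion": f"{GROUP}/{VERSION}",
            "kind": "RedisEnterpriseCluster",
            "metadata": {"name": "my-cluster-test", "namespace": "redis"},
            "spec": {"nodes": 3, "uiServiceType": "ClusterIP"},
        }],
    }
    print(cr_list_path())          # /apis/app.redislabs.com/v1alpha1/redisenterpriseclusters
    print(cr_list_path("redis"))   # /apis/app.redislabs.com/v1alpha1/namespaces/redis/redisenterpriseclusters
    print(summarize(response))     # [('redis', 'my-cluster-test', 3)]
```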
Operator Development
• Started by CoreOS
– CoreOS pioneered the pattern by creating a few Operators (Prometheus & Vault)
• Operator SDK
– Minimizes boilerplate and helps developers get started writing Operators
• The basic API
– Register watchers on any resource
– Create/Read/Update/Delete/Get on any resource
– Register schemas using the K8s Go API
• Operator Lifecycle Manager
The Reconciliation/Control Loop
• Called for every update, deletion or creation of the watched resources
– There is no way of knowing the event type, except for Delete
• Called every X seconds to "resync" resources
• Our responsibility is to let the user work with our resource like any other K8s resource
– AKA an idempotent API
• On every call to the handler we receive our watched resources and must determine exactly what to do
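A hedged Python sketch of a handler written for this contract: it is told only whether the object was deleted, and derives everything else by comparing the spec with what exists. An ordinary event and a periodic resync therefore run the same code, and a resync repairs drift that produced no event on the custom resource. World and handle are hypothetical names, and pod names are simplified.

```python
"""Sketch of an event-agnostic, idempotent handler (illustrative only)."""
from dataclasses import dataclass, field


@dataclass
class World:
    # Hypothetical view of what is deployed for one cluster: the names of its pods.
    pods: set = field(default_factory=set)


def handle(spec: dict, deleted: bool, world: World) -> list:
    """Converge world towards spec; return the actions taken, in order."""
    desired = set() if deleted else {f"{spec['name']}-{i}" for i in range(spec["nodes"])}
    actions = []
    for pod in sorted(desired - world.pods):
        world.pods.add(pod)
        actions.append(("create", pod))
    for pod in sorted(world.pods - desired):
        world.pods.discard(pod)
        actions.append(("delete", pod))
    return actions


if __name__ == "__main__":
    world = World()
    cr = {"name": "my-cluster-test", "nodes": 3}
    print(handle(cr, deleted=False, world=world))  # three creates
    world.pods.discard("my-cluster-test-1")         # drift with no event on the CR
    print(handle(cr, deleted=False, world=world))  # [('create', 'my-cluster-test-1')]
    print(handle(cr, deleted=False, world=world))  # []
    print(handle(cr, deleted=True, world=world))   # three deletes
```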
Idempotent APIs
Desired State = Current Resource
Current State:
• Aggregation of deployed resources
• Internal application state
The Reconciliation/Control Loop - Challenges
• Determine which changes need to happen
• Determine whether the change is valid
• K8s doesn't provide solid validation before applying changes to a CR
– 1.9 has a beta feature for OpenAPI validation
• Long-running processes as part of a resource change
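One way to approach the first two challenges is to diff the old and new spec and map each changed field to an operation, rejecting changes that cannot be performed. In this Python sketch the field names (nodes, image, storageClass) and the immutability rule are assumptions for illustration; both specs are assumed to carry a nodes field.

```python
"""Sketch: classify a spec change before acting on it (hypothetical fields and rules)."""

IMMUTABLE_FIELDS = {"storageClass"}  # hypothetical example of a field that cannot change


def classify_change(old: dict, new: dict) -> list:
    """Return the operations implied by old -> new, or raise ValueError if invalid."""
    changed = {key for key in old.keys() | new.keys() if old.get(key) != new.get(key)}
    blocked = changed & IMMUTABLE_FIELDS
    if blocked:
        raise ValueError(f"cannot change after creation: {sorted(blocked)}")
    operations = []
    if "nodes" in changed:
        operations.append("scale-out" if new["nodes"] > old["nodes"] else "scale-in")
    if "image" in changed:
        # Long running: performed step by step across later reconcile calls.
        operations.append("upgrade")
    return operations


if __name__ == "__main__":
    print(classify_change({"nodes": 3, "image": "a"}, {"nodes": 5, "image": "a"}))  # ['scale-out']
```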
Redis Cluster Status
Diagram: the states Pending Creation, Running, Invalid and Error, with transitions triggered by create and apply
Pending Creation: initial state, where the cluster is not deployed yet
Running: the cluster is deployed and is either running, or starting to run and not ready yet
Invalid: an invalid configuration was requested, e.g. an even number of nodes. Until a valid configuration is applied, the status remains Invalid
Error: an error occurred when trying to deploy or update the Redis Enterprise Cluster
create = kubectl create -f cr.yaml
apply = kubectl apply -f cr.yaml
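Because the transition arrows did not survive, the following Python sketch derives the status from the state descriptions alone; next_status is a hypothetical helper, and the precedence (Invalid before Error) is an assumption, not a reading of the original diagram.

```python
"""Sketch: derive the Redis cluster status after a create/apply (assumed precedence)."""
from enum import Enum


class Status(Enum):
    PENDING_CREATION = "Pending Creation"
    RUNNING = "Running"
    INVALID = "Invalid"
    ERROR = "Error"


def next_status(spec_valid: bool, deployed: bool, deploy_failed: bool) -> Status:
    """Return the status after processing the latest create/apply."""
    if not spec_valid:
        # e.g. an even node count: stays Invalid until a valid spec is applied.
        return Status.INVALID
    if deploy_failed:
        return Status.ERROR
    # Running covers both "running" and "starting, not ready yet".
    return Status.RUNNING if deployed else Status.PENDING_CREATION


if __name__ == "__main__":
    print(next_status(spec_valid=False, deployed=True, deploy_failed=False).value)  # Invalid
```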
Development Challenges
• Deep understanding of how Kubernetes works (StatefulSets, controllers, APIs)
• Workflows: idempotent APIs are challenging due to state mutation
• Double orchestration adds a level of complexity compared to stateless deployments
• Various SDK issues
https://www.telepresence.io
One Last Thing
Demo