Cassandra Operator with
Yelp PaaSTA
Raghavendra Prabhu & Matthew Mead-Briggs
JetStack Connect Nov 2019
Yelp’s Mission
Connecting people with great
local businesses.
Powered by Cassandra
Your Reviews
Waitlist for diners
Overview
● Why Kubernetes
● Status quo with Cassandra
● Cassandra Operator
● Pain Points
● Future
What is PaaSTA
github.com/yelp/paasta github.com/yelp/clusterman
Why K8s
● Why Kubernetes
○ Why not persist with plain EC2
● Why DIY the operator
Status Quo of Cassandra @ Yelp
● What is Cassandra
● Roughly a hundred clusters on AWS ASG
● New cluster launches with k8s
○ Batteries included: Good defaults, TLS
● Migration strategy in place
○ Backward compatible discovery mechanism
● K8s clusters deployed on spot fleet as well.
● Local k8s development cluster with https://github.com/kubernetes-sigs/kind
Cassandra
Operator
Yelp Cassandra @ 100000 ft
[Diagram: multi-region cluster across us-west-2, us-west-1, us-east-1]
Yelp Cassandra @ 100000 ft + Operator
[Diagram: multi-region cluster across us-west-2, us-west-1, us-east-1, with the operator]
State of Cassandra Operator at Yelp
● Cassandra cluster specification
● What is in a Cassandra Pod
● Storage aka State
● Reconciliation
○ StatefulSet
○ Core event loop
● Deployment
The Recipe aka the Cluster Spec
Smartstack Seed Provider
[Diagram: Client, Synapse, HAProxy, Service, Nerve, ZK]
Cassandra Pod
● Cassandra container
● Sidecars
○ Hacheck for Nerve (Smartstack)
○ Cron Jobs
○ Sensu alerting
● Node: metrics collection
○ Puppet
To sidecar or not to sidecar
● Emit data to host/external service
● Collect data from process in host's namespace
● Sidecar collects data
Storage aka State
● Dynamic Provisioning
○ StorageClass per cluster
○ “Compute follows Data”
■ Immediate Volume Binding Mode
■ Stripe cluster across AZs
● EBS for Cassandra
○ Clear separation of stateful and stateless
○ Makes it easy to delete statefulsets
○ Bouncing the cluster is also quite fast
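The storage bullets above can be sketched as a per-cluster StorageClass manifest built in Python. This is only an illustration under stated assumptions: the `cassandra-<name>` naming scheme, the `gp2` volume type, and the round-robin `zone_for_replica` helper are hypothetical and not taken from the talk. The `storage.k8s.io/v1` fields and the `Immediate` binding mode are standard Kubernetes.

```python
def storage_class_for(cluster_name: str) -> dict:
    """Build a StorageClass manifest dedicated to one Cassandra cluster."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        # Hypothetical naming scheme: one StorageClass per cluster.
        "metadata": {"name": f"cassandra-{cluster_name}"},
        # In-tree EBS provisioner; the volume type is an assumption.
        "provisioner": "kubernetes.io/aws-ebs",
        "parameters": {"type": "gp2"},
        # Immediate binding provisions the volume as soon as the claim
        # exists, so the pod is later scheduled where its data lives
        # ("Compute follows Data").
        "volumeBindingMode": "Immediate",
    }


def zone_for_replica(ordinal: int, zones: list) -> str:
    """Illustrative striping: spread replicas round-robin across AZs."""
    return zones[ordinal % len(zones)]
```

With three zones, replicas 0 to 3 land in the first, second, third, and then the first zone again, so no single AZ holds the whole cluster.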
Hash-based reconciliation
● Compute hash of Pod Template
● Attach as label to the StatefulSet
● Compare label on existing StatefulSet to the newly computed hash
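A minimal sketch of the three steps above, using plain dictionaries in place of Kubernetes objects. The label key `example.com/config-hash`, the choice of SHA-256, and the function names are assumptions for illustration; the talk does not say which hash or label the operator uses.

```python
import copy
import hashlib
import json

HASH_LABEL = "example.com/config-hash"  # hypothetical label key


def pod_template_hash(pod_template: dict) -> str:
    """Return a short, deterministic hash of a pod template."""
    # sort_keys makes the serialisation independent of dict ordering.
    canonical = json.dumps(pod_template, sort_keys=True, separators=(",", ":"))
    # Label values are limited to 63 characters, so keep a short prefix.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def with_hash_label(statefulset: dict, pod_template: dict) -> dict:
    """Return a copy of the StatefulSet with the hash attached as a label."""
    updated = copy.deepcopy(statefulset)
    labels = updated.setdefault("metadata", {}).setdefault("labels", {})
    labels[HASH_LABEL] = pod_template_hash(pod_template)
    return updated


def needs_update(existing_statefulset: dict, desired_pod_template: dict) -> bool:
    """Compare the label on the existing StatefulSet to the new hash."""
    labels = existing_statefulset.get("metadata", {}).get("labels", {})
    return labels.get(HASH_LABEL) != pod_template_hash(desired_pod_template)
```

Comparing one label avoids a field-by-field diff of the live object against the desired one, which tends to be noisy because the API server fills in defaults.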
Cluster Readiness
● Cluster ready = AND(pod readiness) over all pods
○ Service Readiness
● Readiness per pod: UN (Up/Normal) in Cassandra
● Liveness check: U (Up) for Cassandra
● Hooks
○ Draining
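The readiness rules above could be expressed roughly as follows. The sketch assumes the status codes come from `nodetool status`-style output, where each node line starts with a two-letter code such as UN or DN. How the operator actually obtains the state is not stated in the slides, and the function names are hypothetical.

```python
import re

# A node line starts with Up/Down followed by Normal/Leaving/Joining/Moving.
_STATUS_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_status(nodetool_output: str) -> dict:
    """Map each node address to its two-letter status code."""
    states = {}
    for line in nodetool_output.splitlines():
        match = _STATUS_LINE.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def pod_is_ready(state: str) -> bool:
    """Readiness: the node must be Up and Normal."""
    return state == "UN"


def pod_is_live(state: str) -> bool:
    """Liveness: the node only has to be Up (it may still be joining)."""
    return state.startswith("U")


def cluster_is_ready(states: dict, expected_addresses: list) -> bool:
    """Cluster ready = AND(pod readiness) over all expected pods."""
    return bool(expected_addresses) and all(
        pod_is_ready(states.get(address, "")) for address in expected_addresses
    )
```

Keeping liveness looser than readiness means a node that is still joining or streaming is not restarted, but also does not receive traffic until it reports UN.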
Locking
● Clusters are multi-region
● Operators are per-region
● Non-federated setup
● Coordination with etcd leases
● LeaseID stored in Custom Resource Status
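The locking scheme above can be illustrated with an in-memory model. `FakeLeaseStore` is a stand-in written for this sketch and is not the etcd client API; `try_lock`, `unlock`, and the `leaseID` status key are likewise hypothetical names. A real implementation would also need the check-and-write on the Custom Resource status to be atomic, which this single-process model glosses over.

```python
import itertools
from typing import Optional


class FakeLeaseStore:
    """In-memory stand-in for etcd leases (not the real etcd API)."""

    def __init__(self) -> None:
        self._expiry = {}
        self._ids = itertools.count(1)

    def grant(self, ttl: float, now: float) -> int:
        """Create a lease that expires ttl seconds after now."""
        lease_id = next(self._ids)
        self._expiry[lease_id] = now + ttl
        return lease_id

    def is_alive(self, lease_id: int, now: float) -> bool:
        return self._expiry.get(lease_id, float("-inf")) > now

    def revoke(self, lease_id: int) -> None:
        self._expiry.pop(lease_id, None)


def try_lock(status: dict, store: FakeLeaseStore, ttl: float, now: float) -> Optional[int]:
    """Take the cluster lock unless another operator holds a live lease."""
    held = status.get("leaseID")
    if held is not None and store.is_alive(held, now):
        return None  # another regional operator is working on this cluster
    lease_id = store.grant(ttl, now)
    status["leaseID"] = lease_id  # recorded in the Custom Resource status
    return lease_id


def unlock(status: dict, store: FakeLeaseStore, lease_id: int) -> None:
    """Release the lock, but only if this caller still owns it."""
    if status.get("leaseID") == lease_id:
        store.revoke(lease_id)
        status.pop("leaseID", None)
```

Because a lease expires on its own, an operator that crashes while holding the lock does not block the other regions forever.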
IAM roles
● Cassandra needs access to S3 and DynamoDB for backups
● https://github.com/uswitch/kiam
● Proxies the EC2 metadata service for Pods
● Allows us to lift and shift IAM profiles from EC2
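As a rough illustration of how a pod requests a role through kiam, the sketch below adds kiam's `iam.amazonaws.com/role` annotation to a pod template held as a plain dictionary. The role name in the usage note and the `with_iam_role` helper are hypothetical; the talk does not show the actual roles or manifests.

```python
import copy

ROLE_ANNOTATION = "iam.amazonaws.com/role"  # annotation read by kiam


def with_iam_role(pod_template: dict, role_name: str) -> dict:
    """Return a copy of the pod template annotated with an IAM role."""
    annotated = copy.deepcopy(pod_template)
    annotations = annotated.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[ROLE_ANNOTATION] = role_name
    return annotated
```

For example, `with_iam_role(template, "cassandra-backup")` would let the pod obtain credentials for a role of that (made-up) name, provided the namespace permits it via kiam's `iam.amazonaws.com/permitted` annotation.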
PaaSTA Secrets Support
● PaaSTA on Mesos already supports secrets
● User friendly cli to “create” secrets
● Use Vault’s transit endpoint to encrypt
● Sync these secrets into kubernetes Secrets
● Cassandra is using these for TLS secrets
$ echo "SOMETHINGSECRET" | paasta secret add -s cassandra_k8s -n secret-name-here -c norcal-devc
Deployment / PaaSTA integration
Migration
● Launching new clusters in k8s is easy
● Migration of existing clusters without downtime is hard
● Unified discovery with smartstack
● How: Add k8s nodes to existing Cassandra cluster
● We have migrated a few already!
Pain points
● Client-side Validation
● Statefulset inflexibility with changes
○ Manual intervention for stuck statefulset deployments
○ Resizing the Persistent Volume
○ Orphaned EBS volumes
● Unready/Dead Nodes, Spot fleet and garbage collection
Heading towards the future
● Load-based autoscaling for Cassandra pods
● EBS snapshotting automation
● Fleet autoscaling with Clusterman
● Better integration tests for the operator and for the clusters!
● Production clusters on AWS spot fleet
● More workloads on kubernetes (just started our Kafka operator)
Conclusion
We're Hiring!
www.yelp.com/careers/
Questions?
Credits
● Apache Cassandra logo
● https://kubernetes.io/
● https://etcd.io/
● https://aws.amazon.com/architecture/icons/
● https://www.yelp.com/brand
● https://thenounproject.com/
● https://commons.wikimedia.org/wiki/File:Back-to-the-future-logo.svg
● https://www.writeups.org/star-trek-brent-spiner-data/
@YelpEngineering
fb.com/YelpEngineers
engineeringblog.yelp.com
github.com/yelp
