Large-Scale
Automated Storage
on Kubernetes
Matt Schallert @mattschallert
SRE @ Uber
About Me
● SRE @ Uber since 2016
● M3 - Uber’s OSS Metrics Platform
@mattschallert
Agenda
● Background: Metrics @ Uber
● Scaling Challenges
● Automation
● M3DB on Kubernetes
@mattschallert
2015
@mattschallert
Mid 2016
@mattschallert
● Cassandra stack scaled to needs but was expensive
○ Human cost: operational overhead
○ Infra cost: lots of high-IO servers
● Scaling challenges inevitable, needed to reduce
complexity
Mid 2016
@mattschallert
M3DB
● Open source distributed time series database
● Highly compressed (11x), fast, compaction-free storage
● Sharded, replicated (zone & rack-aware)
● Production late 2016
@mattschallert
M3DB
● Reduced operational overhead
● Runs on commodity hardware with SSDs
● Strongly consistent cluster membership backed by etcd
@mattschallert
m3cluster
● Reusable cluster management libraries created for M3DB
○ Topology representation, shard distribution algos
○ Service discovery, runtime config, leader election
○ Etcd (strongly consistent KV store) as source of truth
for all components
● Similar to Apache Helix
○ Helix is Java-heavy, we’re a Go-based stack
m3cluster
● Topologies represented as desired goal states, M3DB
converges
@mattschallert
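That convergence can be sketched as a diff between the desired placement and the shards a node currently owns — a minimal Go illustration with hypothetical names, not M3DB's actual API:

```go
package main

import "fmt"

// diffShards compares the desired placement against the shards a node
// currently owns: anything missing must be acquired (bootstrapped),
// anything extra handed off. Names are illustrative, not M3DB's API.
func diffShards(desired, owned map[uint32]bool) (toAcquire, toRelease []uint32) {
	for id := range desired {
		if !owned[id] {
			toAcquire = append(toAcquire, id)
		}
	}
	for id := range owned {
		if !desired[id] {
			toRelease = append(toRelease, id)
		}
	}
	return toAcquire, toRelease
}

func main() {
	desired := map[uint32]bool{1: true, 2: true, 3: true}
	owned := map[uint32]bool{2: true, 4: true}
	acq, rel := diffShards(desired, owned)
	fmt.Printf("acquire %d shards, release %d\n", len(acq), len(rel))
}
```

Because the node only ever diffs against the goal state, the loop is safe to rerun at any time.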
m3cluster: Placement
@mattschallert
Late 2016
@mattschallert
Post-M3DB Deployment
● M3DB solved most pressing issue (Cassandra)
○ Improved stability, reduced costs
● System as a whole: still high operational overhead
@mattschallert
Fragmented representation of components
@mattschallert
System Representation (2016)
@mattschallert
System Representation (2016)
System Representation (2016)
System Representation (2016)
System Representation (2016)
System Representation
● Systems represented as…
○ Lists fetched from Uber config service
○ Static lists deployed via Puppet
○ Static lists coupled with service deploys
○ m3cluster placements
@mattschallert
Fragmentation
● Had smaller components rather than one unifying view
of the system as a whole
● Difficult to reason about
○ “Where do I need to make this change?”
○ Painful onboarding for new engineers
● Impediment to automation
○ Replace host: update config, build N services, deploy
System Representation (2016)
System Representation: 2018
@mattschallert
m3cluster benefits
● Common libraries for all workloads: stateful (M3DB),
semi-stateful (aggregation tier), stateless (ingestion)
○ Implement “distribute shards across racks according
to disk size” once, reuse everywhere
● Single source of truth: everything stored in etcd
@mattschallert
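The "implement once, reuse everywhere" idea can be shown with a toy weight-based placer. This greedy sketch is an assumption for illustration only, not m3cluster's actual distribution algorithm:

```go
package main

import "fmt"

// assignShards spreads shard IDs across isolation groups, greedily
// picking the group with the lowest load relative to its weight — a
// stand-in for "distribute shards across racks according to disk size".
func assignShards(numShards uint32, weights map[string]uint32) map[string][]uint32 {
	out := make(map[string][]uint32)
	for shard := uint32(0); shard < numShards; shard++ {
		best, bestLoad := "", -1.0
		for group, w := range weights {
			load := float64(len(out[group])) / float64(w)
			// Tie-break lexicographically so the result is deterministic.
			if bestLoad < 0 || load < bestLoad || (load == bestLoad && group < best) {
				best, bestLoad = group, load
			}
		}
		out[best] = append(out[best], shard)
	}
	return out
}

func main() {
	got := assignShards(6, map[string]uint32{"rack-a": 1, "rack-b": 2})
	fmt.Println(len(got["rack-a"]), len(got["rack-b"]))
}
```

A library like this can then serve stateful, semi-stateful, and stateless workloads alike.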
Operational Improvements
● Easier to reason about the entire system
○ Unifying view: Placements
● Operations much easier, possible to automate
○ Just modifying goal state in etcd
● One config per deployment (no instance-specific)
○ Cattle vs. pets: instances aren’t special, treat them as
a whole
M3DB Goal State
● Shard assignments stored in etcd
○ M3DB nodes observe desired state, react, update
shard state
● Imperative actions such as “add a node” are really
changes in declarative state
○ Easy for both people and tooling to interact with
@mattschallert
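The "add a node is really a declarative state change" point, sketched with hypothetical types (not the real m3cluster placement library):

```go
package main

import "fmt"

// Placement maps instance IDs to the shard IDs they hold. AddInstance is
// the imperative "add a node" expressed purely as a change to declarative
// state: one shard moves from each existing instance to the newcomer,
// which would hold them as INITIALIZING until bootstrapped.
type Placement struct {
	Instances map[string][]uint32
}

func AddInstance(p Placement, id string) Placement {
	var moved []uint32
	for inst, shards := range p.Instances {
		if len(shards) > 0 {
			moved = append(moved, shards[len(shards)-1])
			p.Instances[inst] = shards[:len(shards)-1]
		}
	}
	p.Instances[id] = moved
	return p
}

func main() {
	p := Placement{Instances: map[string][]uint32{"a": {1, 2}, "b": {3, 4}}}
	p = AddInstance(p, "c")
	fmt.Println(len(p.Instances["c"])) // one shard taken from each existing instance
}
```

Nodes never receive an "add" command — they just observe the updated placement in etcd and stream data accordingly.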
M3DB: Adding a Node
@mattschallert
M3DB: Adding a Node
M3DB: Adding a Node
@mattschallert
M3DB: Adding a Node
@mattschallert
M3DB Goal State
● Goal-state primitives built with future automation in mind
● m3cluster interacts with single source of truth
● Vs. imperative changes:
○ No instance-level operations
○ No state or steps to keep track of: can always
reconcile state of world (restarts, etc. safe)
@mattschallert
System Representation: 2018
@mattschallert
M3DB Operations
● Retooled entire stack as goal state-driven, dynamic
○ Operations simplified, but still triggered by a person
● M3DB requires more care to manage than stateless
○ Takes time to stream data
@mattschallert
2016
Clusters
2
@mattschallert
2018
Clusters
40+
@mattschallert
Automating M3DB
● Wanted goal state at a higher level
○ Clusters as cattle
● M3DB can only do so much; needed to expand scope
● Chose to build on top of Kubernetes
○ Value for M3DB OSS community
○ Internal Uber use cases
@mattschallert
Kubernetes @ 35,000 ft
● Container orchestration system
● Support for stateful workloads
● Self-healing
● Extensible
○ CRD: define your own API resources
○ Operator: custom controller
@mattschallert
● Pod: base unit of work
○ Group of like containers deployed together
● Pods can come and go
○ Self-healing → pods move in response to failures
● No pods are special
Kubernetes Driving Principles
@mattschallert
Kubernetes Controllers
for {
    desired := getDesiredState()
    current := getCurrentState()
    makeChanges(desired, current)
}
Writing Controllers Guide
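A runnable toy version of that loop, with a replica count standing in for real cluster state (bounded passes instead of `for {}` so the sketch terminates):

```go
package main

import "fmt"

// makeChanges computes the delta between desired and current state; in a
// real controller this diff is computed against the API server.
func makeChanges(desired, current int) int {
	return desired - current // >0: create pods; <0: delete pods
}

// reconcile runs the slide's loop for a fixed number of passes.
func reconcile(desired, current, passes int) int {
	for i := 0; i < passes; i++ {
		current += makeChanges(desired, current) // apply the diff
	}
	return current
}

func main() {
	fmt.Println(reconcile(3, 0, 1)) // converges to the desired replica count
}
```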
M3DB & Kubernetes
M3DB:
● Goal state driven
● Single source of truth
● Nodes work to converge
Kubernetes:
● Goal state driven
● Single source of truth
● Components work to
converge
@mattschallert
M3DB Operator
● Manages M3DB clusters running on Kubernetes
○ Creation, deletion, scaling, maintenance
● Performs actions a person would have to do in full
cluster lifecycle
○ Updating M3DB clusters
○ Adding more instances
@mattschallert
M3DB Operator
● Users define desired M3DB clusters
○ “9 node cluster, spread across 3 zones, with remote
SSDs attached”
● Operator updates desired Kubernetes state, waits for
convergence, updates M3DB desired state
@mattschallert
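What a user declares might look like the following Go struct — the field names here are illustrative, not the m3db-operator's exact CRD schema:

```go
package main

import "fmt"

// IsolationGroup is one failure domain (e.g. a zone) and how many
// instances to run there; the spec below encodes the slide's
// "9 node cluster, spread across 3 zones".
type IsolationGroup struct {
	Name         string
	NumInstances int
}

type M3DBClusterSpec struct {
	ReplicationFactor int
	IsolationGroups   []IsolationGroup
}

func totalInstances(s M3DBClusterSpec) int {
	n := 0
	for _, g := range s.IsolationGroups {
		n += g.NumInstances
	}
	return n
}

func main() {
	spec := M3DBClusterSpec{
		ReplicationFactor: 3,
		IsolationGroups: []IsolationGroup{
			{"zone-a", 3}, {"zone-b", 3}, {"zone-c", 3},
		},
	}
	fmt.Println(totalInstances(spec))
}
```

The operator's job is then to make both Kubernetes and M3DB converge on whatever this spec says.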
M3DB Operator: Create Example
@mattschallert
@mattschallert
@mattschallert
@mattschallert
M3DB Operator: Day 2 Operations
● Scaling a cluster
○ M3DB adds 1-by-1, operation handles many
● Replacing instances, updating config, etc.
● All operations follow same pattern
○ Look at Kubernetes state, look at M3DB state, effect change
@mattschallert
M3DB Operator: Success Factors
● Bridge between goal state-driven APIs (anti-footgun)
○ Can’t accidentally remove more pods than desired
○ Can’t accidentally remove two M3DB instances
○ Operator can be restarted, pick back up
● Share similar concepts, expanded scope
○ Failure domains, resource quantities, etc.
● Finding common abstractions made mental model easier
○ Implementation can be separate, interface the same
● + Goal-state driven approach simplified operations
○ External factors (failures, deploys) don’t disrupt steady
state
Lessons Learned: M3
@mattschallert
● Embracing Kubernetes principles in M3DB made
Kubernetes onboarding easier
○ No pods are special, may move around
○ Self-healing after failures
● Instances as cattle, not pets
○ No instance-level operations or config
Lessons Learned: Kubernetes
@mattschallert
● Any orchestration system implies complexity
○ Your own context: is it worth it?
○ How does your own system respond to failures,
rescheduling, etc.?
● Maybe not a good fit if you have single special instances
○ M3DB can tolerate failures
Considerations
@mattschallert
● Requirements & guarantees of your system will influence
your requirements for automation
○ M3DB strongly consistent: wanted strongly
consistent cluster membership
Considerations
@mattschallert
● Building common abstractions for all our workloads
eased mental model
● Following goal-state driven approach eased automation
● Closely aligned with Kubernetes principles
Recap
@mattschallert
● github.com/m3db/m3
○ M3DB, m3cluster, m3aggregator
○ github.com/m3db/m3db-operator
● m3db.io & docs.m3db.io
● Please leave feedback on O’Reilly site!
Thank you!
@mattschallert
Large-Scale Automated Storage on Kubernetes - Matt Schallert OSCON 2019
Pre-2015
Early 2015
M3DB Operator: Today
● [ THIS SLIDE IS SKIPPED ]
● https://github.com/m3db/m3db-operator
● Managing 1 M3DB cluster ~as easy as 10
Late 2015
Mid 2016
Post-M3DB Deployment
System Interfaces
System Interfaces
● Ingestion path used multiple text-based line protocols
○ Difficult to encode metric metadata
○ Redundant serialization costs
● Storage layers used various RPC formats
System Representation (2016)
M3DB Operator [SLIDE SKIPPED]
ROUGH DRAFT DIAGRAM
● [ Transition to Kubernetes, possibly using edge vs. level
triggered ]
○ Level triggered: reacting to current state. Actions
such as “replace this node” can’t be missed.
○ Edge triggered: reacting to events. Events can be
missed.
● [ Maybe something about idempotency requirement ? ]
Edge vs. Level Triggered Logic
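The distinction can be shown concretely: a level-triggered pass recomputes "replace this node" from current state, so it still works even when the failure event itself was missed:

```go
package main

import "fmt"

// levelTriggered derives actions from observed state on every pass, so a
// dropped event is harmless — the next pass sees the same state. An
// edge-triggered design that only reacted to delivered events would lose
// the action if the event were missed.
func levelTriggered(desired, current map[string]bool) []string {
	var actions []string
	for node := range desired {
		if !current[node] {
			actions = append(actions, "replace "+node)
		}
	}
	return actions
}

func main() {
	desired := map[string]bool{"node-1": true, "node-2": true}
	current := map[string]bool{"node-1": true} // node-2 failed; its event may never have arrived
	fmt.Println(levelTriggered(desired, current))
}
```

This is also why level-triggered actions must be idempotent: the same diff may be computed and applied more than once.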
m3cluster: Placement
message Instance {
  string id = 1;
  string isolation_group = 2;
  string zone = 3;
  uint32 weight = 4;
  string endpoint = 5;
  repeated Shard shards = 6;
  ...
}
message Placement {
  map<string, Instance> instances = 1;
  uint32 replica_factor = 2;
  uint32 num_shards = 3;
  bool is_sharded = 4;
}
m3cluster: Shard
enum ShardState {
  INITIALIZING = 0;
  AVAILABLE = 1;
  LEAVING = 2;
}
message Shard {
  uint32 id = 1;
  ShardState state = 2;
  string source_id = 3;
}
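The shard lifecycle implied by ShardState — INITIALIZING while streaming from the owner named by source_id, then AVAILABLE once bootstrapped — can be sketched as:

```go
package main

import "fmt"

// ShardState mirrors the proto enum: a shard on a new node starts
// INITIALIZING (streaming data from the instance named by source_id) and
// flips to AVAILABLE once bootstrapped, after which the donor's LEAVING
// copy can be dropped.
type ShardState int

const (
	Initializing ShardState = iota
	Available
	Leaving
)

// markBootstrapped sketches the single INITIALIZING -> AVAILABLE
// transition a node performs after it finishes streaming a shard.
func markBootstrapped(states map[uint32]ShardState, shard uint32) {
	if states[shard] == Initializing {
		states[shard] = Available
	}
}

func main() {
	states := map[uint32]ShardState{7: Initializing}
	markBootstrapped(states, 7)
	fmt.Println(states[7] == Available)
}
```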
m3cluster: all
enum ShardState {
  INITIALIZING = 0;
  AVAILABLE = 1;
  LEAVING = 2;
}
message Shard {
  uint32 id = 1;
  ShardState state = 2;
  string source_id = 3;
}
message Instance {
  string id = 1;
  string isolation_group = 2;
  string zone = 3;
  uint32 weight = 4;
  string endpoint = 5;
  repeated Shard shards = 6;
  ...
}
message Placement {
  map<string, Instance> instances = 1;
  uint32 replica_factor = 2;
  uint32 num_shards = 3;
  bool is_sharded = 4;
}
● Consistency of operations
○ Many actors in the system
○ Failures can happen, may re-reconcile states
○ Kubernetes & m3cluster leverage etcd
Prerequisites for Orchestration
● [ Lessons learned orchestrating M3DB generally ]
● [ Learnings specific to Kubernetes ]
Lessons Learned