Apache BookKeeper State Store: A Durable Key-Value Store - Pulsar Summit NA 2021

Pulsar Virtual Summit North America 2021
Apache BookKeeper State Store:
A Durable Key-Value Store

Prashant Kumar
Principal Software Engineer @ Splunk
● Principal Software Developer at Splunk.
● Ex-Yahoo, Verizon Media.
● Prior experience: key member of and contributor to Sherpa, the geographically replicated, multi-tenant key-value store at Yahoo.

Agenda
I. Introduction to Apache BookKeeper State Store.
II. Why yet another KV store?
III. How does it fit into the Pulsar ecosystem?
IV. Intended use case.
V. Brief architecture.
VI. Current state and production worthiness.
VII. Product roadmap and future work.

Apache BookKeeper State Store
● It’s a Key-Value store.
● It’s durable.
● It’s locally replicated.
● It’s eventually consistent.
● It’s fault tolerant.
● It’s cloud native, with a k8s-based deployment.

Argh! Yet another KV store?

Integral part of Apache Pulsar ecosystem
● Uses the same ZooKeeper deployment that Pulsar uses.
● Uses the same BookKeeper deployment that Pulsar uses.
● Uses the same infrastructure for metrics, dashboards, etc. as BookKeeper.
● Part of the bookkeeper/stream code base.
● Existing client-side integration in the Apache Pulsar Functions service.

Primary use cases
● Store and access function state and checkpoints
● A secondary metadata store for Apache Pulsar, away from ZooKeeper
● Various other KV store use cases
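
To make the function-state use case concrete, here is a minimal Python sketch of a word-count function that keeps per-word counters and a small checkpoint in state. The method names (`incr_counter`, `get_counter`, `put_state`, `get_state`) follow the Pulsar Functions Python context as best understood here. `InMemoryContext` and `count_words` are hypothetical stand-ins so the sketch runs without a cluster; they are not part of Pulsar or BookKeeper.

```python
class InMemoryContext:
    """Hypothetical stand-in for a Pulsar Functions context, backed by dicts."""

    def __init__(self):
        self._counters = {}
        self._state = {}

    def incr_counter(self, key, amount):
        # Add `amount` to the counter stored under `key` (missing keys start at 0).
        self._counters[key] = self._counters.get(key, 0) + amount

    def get_counter(self, key):
        return self._counters.get(key, 0)

    def put_state(self, key, value):
        # Overwrite the value stored under `key`.
        self._state[key] = value

    def get_state(self, key):
        return self._state.get(key)


def count_words(sentence, context):
    """Count each word in per-key counters and checkpoint the last input."""
    for word in sentence.split():
        context.incr_counter(word, 1)
    context.put_state("last-input", sentence)
    return context.get_counter("pulsar")


ctx = InMemoryContext()
count_words("pulsar stores state in bookkeeper", ctx)
result = count_words("pulsar functions read state back", ctx)
```

In a real deployment the context would be supplied by the Pulsar Functions runtime, and the counters would live in the state store rather than in a dict.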

Data model

High-level serving architecture (bird's-eye view)

High-level data store architecture

Benchmarking
● Benchmarking with YCSB
● Setup
○ YCSB Thread count = 40
○ # k8s pods = 3
○ cpuRequest = 8
○ cpuLimit = 16
○ memoryRequest = 24Gi
○ memoryLimit = 24Gi
● Read output
○ Throughput = 22557.7 Ops/S
○ Average latency = 1.699 ms
○ 99th percentile latency = 5.323 ms
● Write output
○ Throughput = 15256.16 Ops/S
○ Average latency = 8.820 ms
○ 99th percentile latency = 27.071 ms
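
As a sanity check on these figures, Little's law (operations in flight = throughput × average latency) can be applied to the reported numbers. The sketch below is only arithmetic on the values above; the function name `implied_in_flight` is made up for illustration, and the interpretation is an assumption because the slides do not describe how YCSB issued or timed requests.

```python
def implied_in_flight(throughput_ops_per_s, avg_latency_ms):
    """Little's law: average operations in flight = throughput * latency."""
    return throughput_ops_per_s * (avg_latency_ms / 1000.0)


# Figures copied from the benchmark results above.
read_in_flight = implied_in_flight(22557.7, 1.699)
write_in_flight = implied_in_flight(15256.16, 8.820)

print(f"reads in flight:  {read_in_flight:.1f}")
print(f"writes in flight: {write_in_flight:.1f}")
```

The read figure comes out at roughly 38, which is close to the 40 YCSB threads in the setup. The write figure comes out at roughly 135, more than the thread count, so the write run may have used a different client configuration or a different way of measuring latency. The slides do not say which.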

Production readiness
● It has already been in production for the last few months
● It’s a k8s-based deployment
● Sustained production traffic
○ Read throughput 240 Ops/S
○ Write throughput 90 Ops/S
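
For a rough sense of headroom, the production rates can be compared against the YCSB benchmark throughput. The sketch below assumes the production cluster is comparable to the three-pod benchmark setup, which the slides do not state; `headroom` is a hypothetical helper name.

```python
def headroom(benchmark_ops_per_s, production_ops_per_s):
    """Ratio of benchmarked throughput to observed production throughput."""
    return benchmark_ops_per_s / production_ops_per_s


# Benchmark and production figures copied from the slides.
read_headroom = headroom(22557.7, 240)
write_headroom = headroom(15256.16, 90)

print(f"read headroom:  {read_headroom:.0f}x")
print(f"write headroom: {write_headroom:.0f}x")
```

Under that assumption, production reads run at about 1/94 of the benchmarked rate and writes at about 1/170, so the reported production traffic is far below the benchmarked capacity.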

Product roadmap
● Contribute internal changes back to open source
● Storage hardening
○ Clean up leftover data (litter), both actively and reactively
○ Clear obsolete transaction logs
● Operability
○ Improved and granular monitoring and alerting.
● Availability
○ Implementation of replicas.
○ Serving read traffic from a replica.
○ Promotion of a replica to primary when the primary fails.
● Scaling and load balancing
○ Splitting a shard
○ Moving shards across cluster

References
● Statestore Repo:
https://github.com/apache/bookkeeper/tree/master/stream
● Distributedlog Repo:
https://github.com/apache/bookkeeper/tree/master/stream/distributedlog
● Pulsar Function - Statestore integration:
https://github.com/apache/pulsar/blob/master/pulsar-functions/worker/src/main/java/org/apache/pulsar/functions/worker/PulsarWorkerService.java#L420
