Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency

Jeffery Utter
Staff Developer at theScore

Jeffery Utter
Staff Developer, theScore
■ Built half of a distributed database
■ P99.*s matter to 100% of users
■ Wrote my first line of Java at age 30-something
■ I lead a double-life as a double-bassist

Table of Contents
■ What is Aggregator, Leaf, Tailer?
■ Goals and Constraints
■ Why not < insert your favorite (distributed) database >
■ Datadex
■ Performance Tips
● Java
● RocksDB
■ Overview
● Architecture
● Future
■ Conclusion

Aggregator, Leaf, Tailer
■ Term came into use in 2019
■ Largely promoted by Rockset
● Rockset Concepts, Design & Architecture [1]
● Aggregator Leaf Tailer: An Alternative to Lambda Architecture for Real-Time Analytics [2]
● Rockset’s Aggregator-Leaf-Tailer Architecture for SQL on semi structured data [3]
■ Prior Art
● Facebook: Science and the Social Graph (2008) [4]
● Serving Facebook Multifeed: Efficiency, performance gains through redesign (2015) [5]
● FollowFeed: LinkedIn's Feed Made Faster and Smarter (2016) [6]

■ Aggregator: low-latency aggregation of data stored in one or more leaves
■ Leaf: all data is stored and indexed in one or more leaves
■ Tailer: pulls new data from various sources and inserts it into the leaves
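The three roles can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class names, the hash-based routing, and the prefix query are hypothetical stand-ins, not the design of Rockset or of the system described in this talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical in-memory model of the three roles; every name here is illustrative.
final class AltSketch {
    // Leaf: stores and indexes data (here, a sorted in-memory map).
    static final class Leaf {
        private final ConcurrentSkipListMap<String, Integer> index = new ConcurrentSkipListMap<>();

        void put(String key, int value) {
            index.put(key, value);
        }

        // Returns the values of all keys that start with the given prefix.
        List<Integer> scanPrefix(String prefix) {
            List<Integer> out = new ArrayList<>();
            for (Map.Entry<String, Integer> e : index.tailMap(prefix, true).entrySet()) {
                if (!e.getKey().startsWith(prefix)) {
                    break; // keys are sorted, so the first mismatch ends the prefix
                }
                out.add(e.getValue());
            }
            return out;
        }
    }

    // Tailer: takes new records from a source and inserts them into the leaves.
    static final class Tailer {
        private final List<Leaf> leaves;

        Tailer(List<Leaf> leaves) {
            this.leaves = leaves;
        }

        void ingest(String key, int value) {
            // Assumption: records are routed to a leaf by key hash.
            int i = Math.floorMod(key.hashCode(), leaves.size());
            leaves.get(i).put(key, value);
        }
    }

    // Aggregator: fans a query out to every leaf and combines the results.
    static final class Aggregator {
        private final List<Leaf> leaves;

        Aggregator(List<Leaf> leaves) {
            this.leaves = leaves;
        }

        int sumByPrefix(String prefix) {
            int total = 0;
            for (Leaf leaf : leaves) {
                for (int v : leaf.scanPrefix(prefix)) {
                    total += v;
                }
            }
            return total;
        }
    }

    // Ingests three records and sums the two that belong to game 1.
    static int demo() {
        List<Leaf> leaves = List.of(new Leaf(), new Leaf());
        Tailer tailer = new Tailer(leaves);
        tailer.ingest("game:1:home", 3);
        tailer.ingest("game:1:away", 2);
        tailer.ingest("game:2:home", 7);
        return new Aggregator(leaves).sumByPrefix("game:1:");
    }
}
```

Because the aggregator queries every leaf, the result does not depend on which leaf the tailer picked for a given record.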

Goals & Constraints
■ Low latency
■ Low operational complexity
● Ease of maintenance
● Ease of deployment (in all geolocations)
■ Developer ergonomics
■ Scalability

Traditional RDBMS (Postgres)
■ Duplication of effort to populate databases
■ Operational overhead - database setup, maintenance, scaling

Cloud NoSQL (MongoDB)
■ Implicit shared “schema”
■ Good scalability via hosted offerings
■ Operational overhead for on-prem

Kafka-native (Rockset/kSQL)
■ Not an exact match with our querying needs (kSQL has no secondary indexes)
■ Both seem geared towards analytic workflows
■ Operational overhead for on-prem

Datadex
■ Aggregator: gRPC Java
■ Leaf: RocksDB
■ Tailer: Kafka

Low Latency
■ ZGC
■ Careful memory allocations
■ gRPC streaming
■ Double-edged sword
■ “Bypasses” service mesh

Low Operational Complexity
■ Single codebase
■ Single deployable unit (for now)
■ Instances managed by Kubernetes Operator
■ Deploy / Release / Upgrade cycle similar to other backend applications

Developer Ergonomics
■ Simple configuration through CRD
■ Elixir Client library
■ Simple query “language”
■ “Watch” feature to stream updates to downstream services
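A "watch" style API can be pictured as a subscription on a key prefix that receives every later write. The sketch below is a minimal in-memory version, assuming prefix-based subscriptions and synchronous callbacks; `WatchableStore`, `watch`, and `put` are hypothetical names, and the slides do not describe how the real feature is implemented or exposed over gRPC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical sketch of a watch feature; not the API of the system in the talk.
final class WatchSketch {
    static final class WatchableStore {
        private static final class Watcher {
            final String prefix;
            final BiConsumer<String, String> callback;

            Watcher(String prefix, BiConsumer<String, String> callback) {
                this.prefix = prefix;
                this.callback = callback;
            }
        }

        private final Map<String, String> data = new ConcurrentHashMap<>();
        private final List<Watcher> watchers = new CopyOnWriteArrayList<>();

        // Registers a callback for future writes under the prefix.
        // The returned handle cancels the watch when run.
        Runnable watch(String prefix, BiConsumer<String, String> callback) {
            Watcher w = new Watcher(prefix, callback);
            watchers.add(w);
            return () -> watchers.remove(w);
        }

        // Stores the value, then notifies every watcher whose prefix matches the key.
        void put(String key, String value) {
            data.put(key, value);
            for (Watcher w : watchers) {
                if (key.startsWith(w.prefix)) {
                    w.callback.accept(key, value);
                }
            }
        }
    }

    // Collects the updates seen by one watcher on the "score:" prefix.
    static List<String> demo() {
        WatchableStore store = new WatchableStore();
        List<String> seen = new ArrayList<>();
        Runnable cancel = store.watch("score:", (k, v) -> seen.add(k + "=" + v));
        store.put("score:1", "3-2"); // delivered
        store.put("odds:1", "1.5");  // different prefix, not delivered
        cancel.run();
        store.put("score:1", "4-2"); // watch cancelled, not delivered
        return seen;
    }
}
```

A downstream service would hold the watch open for as long as it needs updates; in a networked setting the callback would typically be replaced by a server-side stream.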

Scalability
■ Fast scale-out through snapshot/backup/restore mechanism
■ Future improvements to independently scale Aggregator/Leaf/Tailer

Minimize Allocations
          Ops/sec      Error
Before    3,053.990    ± 742.316
After     3,964.574    ± 240.020
~30% increase in throughput
■ Re-use buffers for key serialization/deserialization
■ Re-use buffer for reading values, up to a certain size (fastGet)
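Buffer re-use for key serialization might look like the following sketch. It assumes a fixed-width key made of a `long` and an `int` and one buffer per thread; the key layout and all names are hypothetical, since the slides do not show the actual key format.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of re-using one key buffer per thread instead of
// allocating a new byte[] for every lookup.
final class KeyBufferSketch {
    // Assumed key layout: 8-byte entity id followed by a 4-byte sequence number.
    static final int KEY_SIZE = Long.BYTES + Integer.BYTES;

    private static final ThreadLocal<ByteBuffer> KEY_BUF =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(KEY_SIZE));

    // Serializes the key into the calling thread's buffer and returns that buffer.
    // The contents are only valid until the same thread calls encodeKey again.
    static ByteBuffer encodeKey(long entityId, int sequence) {
        ByteBuffer buf = KEY_BUF.get();
        buf.clear();
        buf.putLong(entityId).putInt(sequence);
        buf.flip();
        return buf;
    }

    // Absolute reads: decoding does not move the buffer position.
    static long decodeEntityId(ByteBuffer key) {
        return key.getLong(key.position());
    }

    static int decodeSequence(ByteBuffer key) {
        return key.getInt(key.position() + Long.BYTES);
    }
}
```

The trade-off is aliasing: a caller that needs to keep a key past the next `encodeKey` call on the same thread has to copy it first.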

Ruthlessly Narrow Search Range
[Figure: before and after comparison]
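The idea of narrowing a search range can be shown on any sorted key space. The sketch below uses a `TreeMap` as a stand-in for a sorted store and compares how many entries a filter-everything scan touches with how many a bounded scan touches. The key format and method names are assumptions made for illustration; with RocksDB the same effect is usually sought by seeking to a lower bound and setting an upper bound on the iterator, but the slides do not say how the talk's system does it.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical comparison of a full scan and a bounded scan over sorted keys.
final class RangeScanSketch {
    // Wide scan: walks every entry and checks the prefix on each one.
    // Returns the number of entries visited.
    static int visitedByFullScan(NavigableMap<String, String> index, String prefix) {
        int visited = 0;
        int matches = 0;
        for (String key : index.keySet()) {
            visited++;
            if (key.startsWith(prefix)) {
                matches++;
            }
        }
        return visited;
    }

    // Narrow scan: only touches keys in [prefix, successor(prefix)).
    // Assumes a non-empty prefix whose last char is not Character.MAX_VALUE.
    static int visitedByBoundedScan(NavigableMap<String, String> index, String prefix) {
        int last = prefix.length() - 1;
        String upper = prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
        return index.subMap(prefix, true, upper, false).size();
    }

    // Builds 100 games with two fields each (200 entries in total).
    static NavigableMap<String, String> sampleIndex() {
        NavigableMap<String, String> index = new TreeMap<>();
        for (int game = 0; game < 100; game++) {
            index.put("game:" + game + ":home", "0");
            index.put("game:" + game + ":away", "0");
        }
        return index;
    }
}
```

On the sample index, a lookup for the prefix `game:42:` visits all 200 entries with the full scan and only the 2 matching entries with the bounded scan.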

Resources
1. Rockset Concepts, Design & Architecture:
https://rockset.com/Rockset_Concepts_Design_Architecture.pdf
2. Aggregator Leaf Tailer: An Alternative to Lambda Architecture for Real-Time Analytics:
https://rockset.com/blog/aggregator-leaf-tailer-an-architecture-for-live-analytics-on-event-streams/
3. Rockset’s Aggregator-Leaf-Tailer Architecture for SQL on semi structured data:
http://www.hpts.ws/papers/2019/RocksetHPTS19.pdf
4. Facebook: Science and the Social Graph:
https://www.infoq.com/presentations/Facebook-Software-Stack/ (about 53 minutes in)
5. Serving Facebook Multifeed: Efficiency, performance gains through redesign:
https://engineering.fb.com/2015/03/10/production-engineering/serving-facebook-multifeed-efficiency-performance-gains-through-redesign/
6. FollowFeed: LinkedIn's Feed Made Faster and Smarter:
https://engineering.linkedin.com/blog/2016/03/followfeed--linkedin-s-feed-made-faster-and-smarter
Realtime Indexing for Fast Queries on Massive Semi-Structured Data:
https://www.p99conf.io/session/realtime-indexing-for-fast-queries-on-massive-semi-structured-data/

Jeffery Utter
jeff@jeffutter.com
@jeffutter
