Writing a TSDB
like playing Lego
@linxgnu
@huydx
gocon.autumn 2018
Observability team
Building and maintaining large-scale:
- Metrics system
- Log system
- Alert system
- Distributed tracing system
Today's talk is the story
of how we wrote our own database
for our metrics system
Time series database
A series of float values along a time axis:
(t1, x1), (t2, x2), …, (tn, xn), where t is a timestamp and x is the value at that moment
[Figure: values x plotted against time t]
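A minimal Go sketch of the definition above, assuming millisecond timestamps: a series is a slice of (t, x) samples kept in timestamp order, so reading a time range becomes two binary searches. The names `Sample`, `TimeSeries`, `Append` and `Range` are hypothetical, not taken from the talk's code.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is one (t, x) point: a timestamp (assumed milliseconds) and a value.
type Sample struct {
	T int64
	X float64
}

// TimeSeries keeps samples ordered by timestamp: (t1, x1), (t2, x2), …, (tn, xn).
type TimeSeries struct {
	samples []Sample
}

// Append adds a sample. Older timestamps are rejected because data is
// append-only; an equal timestamp is accepted, since duplicates are tolerated.
func (s *TimeSeries) Append(t int64, x float64) error {
	if n := len(s.samples); n > 0 && t < s.samples[n-1].T {
		return fmt.Errorf("out-of-order sample: %d < %d", t, s.samples[n-1].T)
	}
	s.samples = append(s.samples, Sample{T: t, X: x})
	return nil
}

// Range returns the samples with from <= T <= to. Binary search works
// because samples are sorted by timestamp.
func (s *TimeSeries) Range(from, to int64) []Sample {
	if from > to {
		return nil
	}
	lo := sort.Search(len(s.samples), func(i int) bool { return s.samples[i].T >= from })
	hi := sort.Search(len(s.samples), func(i int) bool { return s.samples[i].T > to })
	return s.samples[lo:hi]
}

func main() {
	var ts TimeSeries
	for i := int64(0); i < 5; i++ {
		_ = ts.Append(1000*i, float64(i)*1.5)
	}
	fmt.Println(ts.Range(1000, 3000)) // [{1000 1.5} {2000 3} {3000 4.5}]
}
```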
What we need
- Storage for time series data that is
  - Extremely fast to write (hundreds of millions of data points per minute)
  - Very fast to read (a few thousand queries per second)
  - Space-efficient (memory/disk)
What we can trade off
- Data consistency (data may be duplicated)
- Mutability (once written, data cannot be changed)
We need storage for
data that looks like this:
// Sample is a single value at a single point in time.
message Sample {
  double value = 1;
  sint64 timestamp = 2;
}
// Serie is a collection of samples with the same serie_id.
message Serie {
  uint64 id = 1;
  repeated Sample samples = 2;
  repeated Label labels = 3;
}
// Series is a collection of series.
message Series {
  repeated Serie series = 1;
  uint32 total_samples = 2;
}
With an interface like:
func (c *Storage) Save(series *proto.Series) (failed []*proto.Series, err error)
func (c *Storage) Load(serieIDs []uint64, fromTimestampMillis, toTimestampMillis int64)
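One way to read this interface, sketched as a hypothetical in-memory `Storage` keyed by serie ID. Assumptions: plain structs stand in for the generated `proto` types (labels omitted), the rule for what ends up in `failed` is made up, and `Load`'s return type `(*Series, error)` is a guess, since the slide does not show it.

```go
package main

import (
	"fmt"
	"sync"
)

// Local stand-ins for the generated proto types (labels omitted).
type Sample struct {
	Value     float64
	Timestamp int64 // milliseconds
}

type Serie struct {
	Id      uint64
	Samples []*Sample
}

type Series struct {
	Series       []*Serie
	TotalSamples uint32
}

// Storage is a hypothetical in-memory store: serie ID -> samples.
type Storage struct {
	mu   sync.RWMutex
	data map[uint64][]*Sample
}

func NewStorage() *Storage { return &Storage{data: make(map[uint64][]*Sample)} }

// Save appends every serie. A serie without samples is handed back in
// failed; this rejection rule is invented just to exercise the signature.
func (c *Storage) Save(series *Series) (failed []*Series, err error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	var rejected Series
	for _, s := range series.Series {
		if len(s.Samples) == 0 {
			rejected.Series = append(rejected.Series, s)
			continue
		}
		c.data[s.Id] = append(c.data[s.Id], s.Samples...)
	}
	if len(rejected.Series) > 0 {
		failed = append(failed, &rejected)
	}
	return failed, nil
}

// Load returns, per requested ID, the samples with from <= ts <= to.
// The return type is an assumption.
func (c *Storage) Load(serieIDs []uint64, fromTimestampMillis, toTimestampMillis int64) (*Series, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := &Series{}
	for _, id := range serieIDs {
		serie := &Serie{Id: id}
		for _, smp := range c.data[id] {
			if smp.Timestamp >= fromTimestampMillis && smp.Timestamp <= toTimestampMillis {
				serie.Samples = append(serie.Samples, smp)
			}
		}
		out.Series = append(out.Series, serie)
		out.TotalSamples += uint32(len(serie.Samples))
	}
	return out, nil
}

func main() {
	st := NewStorage()
	failed, _ := st.Save(&Series{Series: []*Serie{
		{Id: 1, Samples: []*Sample{{Value: 1, Timestamp: 100}, {Value: 2, Timestamp: 200}}},
		{Id: 2}, // no samples: reported back in failed
	}})
	res, _ := st.Load([]uint64{1}, 150, 300)
	fmt.Println(len(failed), res.TotalSamples) // 1 1
}
```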
We tried many options, but
- None of the available solutions fit (performance problems with ClickHouse, or too expensive with InfluxDB)
- Some fit, but were poorly maintained (facebook/beringei, netflix/atlas)
- Some looked promising, but were poorly documented and very unstable (uber/m3)
We decided to
- Build our own in-memory TSDB
  - For fast reads/writes
  - Trading off retention (1 day instead of months or years)
- But …
  - We're not database experts
Write on memory TSDB database (gocon tokyo autumn 2018)
Solutions
- Reuse as much as possible of what others have already done well
- TANSTAAFL: "there ain't no such thing as a free lunch"
TSDB anatomy
// Series is a collection of series.
message Series {
  repeated Serie series = 1;
  uint32 total_samples = 2;
}

- How do we store Series efficiently?
  - Especially in terms of space (because everything lives in RAM)
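A rough sketch of storing a series as chunks: appends go to an open head chunk, which is cut and becomes immutable once full, and each chunk records its time bounds so a range query can skip chunks without decoding them. The chunk size of 4 and all names here are illustrative assumptions; this is not prometheus/tsdb's actual code.

```go
package main

import "fmt"

// maxSamplesPerChunk is an illustrative value, not one from the talk.
const maxSamplesPerChunk = 4

type sample struct {
	t int64
	v float64
}

// chunk is a contiguous run of samples; once cut it is never modified,
// which makes it safe to compress or share between readers.
type chunk struct {
	minT, maxT int64
	samples    []sample
}

// memSeries keeps closed (immutable) chunks plus one open head chunk.
type memSeries struct {
	closed []*chunk
	head   *chunk
}

func (s *memSeries) append(t int64, v float64) {
	if s.head == nil || len(s.head.samples) == maxSamplesPerChunk {
		if s.head != nil {
			s.closed = append(s.closed, s.head) // cut: the old head is now immutable
		}
		s.head = &chunk{minT: t}
	}
	s.head.samples = append(s.head.samples, sample{t, v})
	s.head.maxT = t
}

// overlapping returns the chunks whose [minT, maxT] intersects [from, to].
func (s *memSeries) overlapping(from, to int64) []*chunk {
	var out []*chunk
	for _, c := range s.closed {
		if c.maxT >= from && c.minT <= to {
			out = append(out, c)
		}
	}
	if h := s.head; h != nil && h.maxT >= from && h.minT <= to {
		out = append(out, h)
	}
	return out
}

func main() {
	var s memSeries
	for i := int64(0); i < 10; i++ {
		s.append(i*15000, float64(i))
	}
	fmt.Println(len(s.closed), len(s.head.samples), len(s.overlapping(50000, 110000))) // 2 2 1
}
```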
Prometheus/tsdb package
- Provides
  - An implementation that stores series as chunks
  - Very efficient compression of those chunks with the lossless delta-of-delta encoding algorithm (bstream.go); the original idea comes from Beringei
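The delta-of-delta idea on timestamps, as a minimal sketch: keep the first timestamp and the first delta, then store only how much each delta changes. Regularly spaced samples turn into runs of zeros, which a bit-level writer such as bstream.go can pack into very few bits. This sketch shows only the integer transform and its exact inverse, not the bit packing; the function names are hypothetical.

```go
package main

import "fmt"

// deltaOfDelta encodes timestamps as: first timestamp, first delta, then
// the difference between each delta and the previous one.
func deltaOfDelta(ts []int64) []int64 {
	out := make([]int64, 0, len(ts))
	var prevDelta int64
	for i, t := range ts {
		switch i {
		case 0:
			out = append(out, t)
		case 1:
			prevDelta = t - ts[0]
			out = append(out, prevDelta)
		default:
			d := t - ts[i-1]
			out = append(out, d-prevDelta)
			prevDelta = d
		}
	}
	return out
}

// decode reverses deltaOfDelta exactly, so no information is lost.
func decode(enc []int64) []int64 {
	out := make([]int64, 0, len(enc))
	var delta int64
	for i, v := range enc {
		switch i {
		case 0:
			out = append(out, v)
		case 1:
			delta = v
			out = append(out, out[0]+delta)
		default:
			delta += v
			out = append(out, out[i-1]+delta)
		}
	}
	return out
}

func main() {
	// Roughly every 15s, with a little jitter on one sample.
	ts := []int64{1000, 16000, 31000, 46000, 61002, 76000}
	enc := deltaOfDelta(ts)
	fmt.Println(enc)         // [1000 15000 0 0 2 -4]
	fmt.Println(decode(enc)) // [1000 16000 31000 46000 61002 76000]
}
```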
We need better
compression
- Saving a single byte per data point == saving dozens of GB of RAM
- Further compress (freeze) old data that is rarely read
- Lossless compression
  - Brotli
  - Zstd
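A back-of-the-envelope check of the first bullet, combining two figures the talk gives elsewhere (an average of 1M samples written per second and 1 day of retention), and assuming a full day of samples stays resident in RAM:

```latex
\begin{aligned}
10^{6}\ \tfrac{\text{samples}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}}
  &= 8.64 \times 10^{10}\ \text{samples held in RAM} \\
8.64 \times 10^{10}\ \text{samples} \times 1\ \tfrac{\text{byte}}{\text{sample}}
  &= 8.64 \times 10^{10}\ \text{bytes} \approx 86\ \text{GB} \approx 80\ \text{GiB}
\end{aligned}
```

Replicas would multiply this figure by the replication factor.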
valyala/gozstd package
- DataDog/zstd has some memory allocation problems
- Supports stream compression through a reader/writer interface
We need data replication
- We cannot simply lose data on restart; replication solves that problem
- Replication in a distributed environment is hard
What we need for data
replication?
- Leader election
We know where to find
the best-quality distributed
systems packages
- github.com/hashicorp
hashicorp/raft package
- Go implementation of the Raft consensus protocol (https://raft.github.io/)
- Provides
  - Leader election
  - Log replication
- EVERY communication between nodes is stored in the replicated log (like event sourcing)
- You need to provide your own storage implementation for the replicated log
bsm/raft-badger package
- A replicated log store implementation backed by the Badger key-value database (https://github.com/dgraph-io/badger)
- Badger is fast on SSDs
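What providing your own log store means, sketched as an in-memory store whose method set mirrors hashicorp/raft's `LogStore` interface (`FirstIndex`, `LastIndex`, `GetLog`, `StoreLog`, `StoreLogs`, `DeleteRange`). The `Log` type here is a simplified local stand-in for `raft.Log`, so the sketch compiles without the library. A badger-backed store such as bsm/raft-badger persists the same entries to disk instead of a map, and hashicorp/raft itself ships an in-memory store (`raft.NewInmemStore`) intended for tests.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Log is a simplified stand-in for raft.Log (the real type has more fields).
type Log struct {
	Index uint64
	Term  uint64
	Data  []byte
}

var errLogNotFound = errors.New("log not found")

// memLogStore keeps entries in a map; it does not survive restarts.
type memLogStore struct {
	mu          sync.RWMutex
	first, last uint64 // 0 means "empty"
	logs        map[uint64]*Log
}

func newMemLogStore() *memLogStore { return &memLogStore{logs: make(map[uint64]*Log)} }

func (s *memLogStore) FirstIndex() (uint64, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.first, nil
}

func (s *memLogStore) LastIndex() (uint64, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.last, nil
}

// GetLog copies the entry at index into log.
func (s *memLogStore) GetLog(index uint64, log *Log) error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	l, ok := s.logs[index]
	if !ok {
		return errLogNotFound
	}
	*log = *l
	return nil
}

func (s *memLogStore) StoreLog(log *Log) error { return s.StoreLogs([]*Log{log}) }

func (s *memLogStore) StoreLogs(logs []*Log) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, l := range logs {
		s.logs[l.Index] = l
	}
	s.recomputeBounds()
	return nil
}

// DeleteRange removes entries in [lo, hi], e.g. once they are covered by a snapshot.
func (s *memLogStore) DeleteRange(lo, hi uint64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for i := range s.logs {
		if i >= lo && i <= hi {
			delete(s.logs, i)
		}
	}
	s.recomputeBounds()
	return nil
}

// recomputeBounds scans all keys (O(n), fine for a toy); call with mu held.
func (s *memLogStore) recomputeBounds() {
	s.first, s.last = 0, 0
	for i := range s.logs {
		if s.first == 0 || i < s.first {
			s.first = i
		}
		if i > s.last {
			s.last = i
		}
	}
}

func main() {
	s := newMemLogStore()
	_ = s.StoreLogs([]*Log{{Index: 1, Term: 1}, {Index: 2, Term: 1}, {Index: 3, Term: 2, Data: []byte("save")}})
	_ = s.DeleteRange(1, 2)
	first, _ := s.FirstIndex()
	last, _ := s.LastIndex()
	var l Log
	fmt.Println(first, last, s.GetLog(1, &l)) // 3 3 log not found
}
```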
Topology management
- We need to store some cluster information, such as
  - Seed nodes
  - Shard information
  - …
- Candidates:
  - etcd / Centraldogma
LINE/centraldogma
- Configuration store
- Stores data as arbitrary text (JSON, YAML, ...)
- Interesting features
  - Watching for changes
  - Version control
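A toy, in-process model of the two features highlighted above: every commit creates a new revision while older ones stay readable (version control), and watchers are notified of new revisions (watch). It is not the Central Dogma API or the centraldogma-go client; all names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// revision is one committed version of a config file.
type revision struct {
	Rev     int
	Content string
}

// versionedStore is a toy version-controlled config store with watch support.
type versionedStore struct {
	mu       sync.Mutex
	history  []revision
	watchers []chan revision
}

// Commit records content as a new revision, keeping all older revisions,
// and notifies watchers without blocking.
func (s *versionedStore) Commit(content string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	r := revision{Rev: len(s.history) + 1, Content: content}
	s.history = append(s.history, r)
	for _, w := range s.watchers {
		select {
		case w <- r:
		default: // watcher is behind; it can catch up later via At
		}
	}
	return r.Rev
}

// Watch returns a channel that receives revisions committed from now on.
func (s *versionedStore) Watch() <-chan revision {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch := make(chan revision, 16)
	s.watchers = append(s.watchers, ch)
	return ch
}

// At returns a past revision, e.g. to inspect or roll back a bad change.
func (s *versionedStore) At(rev int) (revision, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if rev < 1 || rev > len(s.history) {
		return revision{}, false
	}
	return s.history[rev-1], true
}

func main() {
	var store versionedStore
	updates := store.Watch()
	store.Commit(`{"shards": 2}`)
	store.Commit(`{"shards": 4}`)

	<-updates // revision 1
	latest := <-updates
	first, _ := store.At(1)
	fmt.Println(latest.Rev, latest.Content, first.Content) // 2 {"shards": 4} {"shards": 2}
}
```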
LINE/centraldogma-go
package
- Centraldogma Go client
- Full-featured (JSON parsing, watching for changes, …)
Thanks to tons of
awesome golang OSS
- Our storage now handles an average of 1M samples written per second without any problems
- And can store a few billion samples on a single machine
Building your own
database is hard, but not
impossible
- It felt like playing Lego, with awesome Go OSS packages as the building blocks
- Maybe that's why so many awesome databases are written in Go
- https://github.com/gostor/awesome-go-storage
We still have tons of things
to share, so stay tuned!
