SlideShare a Scribd company logo
OpenTSDB 2.0
Distributed, Scalable Time Series Database
Benoit Sigoure tsunanet@gmail.com
Chris Larsen clarsen@llnw.com
Who We Are
Benoit Sigoure
● Created OpenTSDB at StumbleUpon
● Software Engineer @ Arista Networks
Chris Larsen
● Release manager for OpenTSDB 2.0
● Operations Engineer @ Limelight Networks
What Is OpenTSDB?
● Open Source Time Series Database
● Store trillions of data points
● Sucks up all data and keeps going
● Never lose precision
● Scales using HBase
What good is it?
● Systems Monitoring & Measurement
○ Servers
○ Networks
● Sensor Data
○ The Internet of Things
○ SCADA
● Financial Data
● Scientific Experiment Results
Use Cases
OVH: #3 largest cloud/hosting provider
Monitor everything: networking, temperature, voltage,
application performance, resource utilization,
customer-facing metrics, etc.
● 35 servers, 100k writes/s, 25TB raw data
● 5-day moving window of HBase snapshots
● Redis cache on top for customer-facing data
Use Cases
Yahoo
Monitoring application performance and statistics
● 15 servers, 280k writes/s
● Increased UID size to 4 bytes instead of 3, allowing
for over 4 billion values
● Looking at using HBase Append requests to avoid
TSD compactions
Use Cases
Arista Networks: High performance networking
● Single-node HBase (no HDFS) + 2 TSDs (one for
writing, one for reading)
● 5K writes per second, 500G of data, piece of cake
to deploy/maintain
● Varnish for caching
Some Other Users
● Box: 23 servers, 90K wps, System, app network, business metrics
● Limelight Networks: 8 servers, 30k wps, 24TB of data
● Ticketmaster: 13 servers, 90K wps, ~40GB a day
What Are Time Series?
● Time Series: data points for an identity
over time
● Typical Identity:
○ Dotted string: web01.sys.cpu.user.0
● OpenTSDB Identity:
○ Metric: sys.cpu.user
○ Tags (name/value pairs):
host=web01 cpu=0
What Are Time Series?
Data Point:
● Metric + Tags
● + Value: 42
● + Timestamp: 1234567890
sys.cpu.user 1234567890 42 host=web01 cpu=0
^ a data point ^
How it Works
Writing Data
1) Open Telnet style socket, write:
put sys.cpu.user 1234567890 42 host=web01 cpu=0
2) ..or with 2.0, post JSON to:
http://<host>:<port>/api/put
3) .. or import big files with CLI
● No schema definition
● No RRD file creation
● Just write!
Querying Data
● Graph with the GUI
● CLI tools
● HTTP API
● Aggregate multiple series
● Simple query language
To average all CPUs on host:
start=1h-ago
avg sys.cpu.user host=web01
HBase Data Tables
● tsdb - Data point table. Massive
● tsdb-uid - Name to UID and UID to
name mappings
● tsdb-meta - Time series index and
meta-data (new in 2.0)
● tsdb-tree - Config and index for
heirarchical naming schema (new
in 2.0)
Lets see how OpenTSDB uses
HBase...
UID Table Schema
● Integer UIDs assigned to each value per type (metric,
tagk, tagv) in tsdb-uid table
● 64 bit integers in row x00 reflect last used UID
CF:Qualifier Row Key UID
id:metric sys.cpu.user 1
id:tagk host 1
id:tagv web01 1
id:tagk cpu 2
id:tagv 0 2
Improved UID Assignment
● Pre 1.2 Assignment:
○ Client acquires lock on row x00
○ uid = getRequest(UID type)
○ Increment uid
○ getRequest(name) confirm name hasn’t been assigned
○ putRequest(type, uid)
○ putRequest(reverse map)
○ putRequest(forward map)
○ Release lock
● Lock held for a long time
● Puts overwrite data
Improved UID Assignment
● 1.2 & 2.0 Assignment:
○ atomicIncrementRequest(UID type)
○ getRequest(name) confirm name hasn’t been assigned
○ compareAndSet(reverse map)
○ compareAndSet(forward map)
● 3 atomic operations on different rows
● Much better concurrency with multiple TSDs assigning
UIDs
● CAS calls fail on existing data, log the error
Data Table Schema
● Row key is a concatenation of UIDs and time:
○ metric + timestamp + tagk1 + tagv1… + tagkN + tagvN
● sys.cpu.user 1234567890 42 host=web01 cpu=0
x00x00x01x49x95xFBx70x00x00x01x00x00x01x00x00x02x00x00x02
● Timestamp normalized on 1 hour boundaries
● All data points for an hour are stored in one row
● Enables fast scans of all time series for a metric
● …or pass a row key regex filter for specific time series
with particular tags
Data Table Schema
1.x Column Qualifiers:
● Type: floating point or integer value
● Value Length: number of bytes the value is encoded on
● Delta: offset from row timestamp in seconds
● Compacted columns concatenate an hour of qualifiers
and values into a single column
[ 0b11111111, 0b1111 0111 ]
<----------------> ^<->
delta (seconds) type value length
Data Table Schema
OpenTSDB 2.x Schema Design Goals
● Must maintain backwards compatibility
● Support millisecond precision timestamps
● Support other objects, e.g. annotations, blobs
○ E.g. could store quality measurements for each data
point
● Store millisecond and other objects in the same row as
data points
○ Scoops up all relevant time series data in one scan
Data Table Schema
Millisecond Support:
● Need 3,599,999 possible time offsets instead of 3,600
● Solution: 4 byte qualifier instead of 2
● Prefixed with xF0 to differentiate from a 2 value
compacted column
● Can mix second and millisecond precision in one row
● Still concatenates into one compacted column
[ 0b11111101, 0b10111011, 0b10011111, 0b11 000111 ]
<--><--------------------------------> ^^^<->
msec precision delta (milliseconds) type value length
2 unused bits
Data Table Schema
Annotations and Other Objects
● Odd number of bytes in qualifier (3 or 5)
● Prefix IDs: x01 = annotation, x02 = blob
● Remainder is offset in seconds or milliseconds
● Unbounded length, not meant for compaction
{
"tsuid": "000001000001000001",
"description": "Server Maintenance",
"notes": "Upgrading the server, ignore these values, winter is coming",
"endTime": 1369159800,
"startTime": 1369159200
}
TSUID Index and Meta Table
● Time Series UID = data table row key without timestamp
E.g. sys.cpu.user host=web01 cpu=0
x00x00x01x00x00x01x00x00x01x00x00x02x00x00x02
● Use TSUID as the row key
○ Can use a row key regex filter to scan for metrics with a tag pair
or the tags associated with a metric
● Atomic Increment for each data point
○ ts_counter column: Track number of data points written
○ ts_meta column: Async callback chain to create meta object if
increment returns a 1
○ Potentially doubles RPC count
OpenTSDB Trees
● Provide a hierarchical
representation of time series
○ Useful for Graphite or
browsing the data store
● Flexible rule system processes
metrics and tags
● Out of band creation or
process in real-time with
TSUID increment callback
chaining
OpenTSDB Trees
Example Time Series:
myapp.bytes_sent dc=dal host=web01
myapp.bytes_received dc=dal host=web01
Example Ruleset:
Level Order Rule
Type
Field Regex
0 0 tagk dc
1 0 tagk host
2 0 metric (.*)..*
3 0 metric .*.(*.)
OpenTSDB Trees
Example Results:
Flattened Names:
dal.web01.myapp.bytes_sent
dal.web01.myapp.bytes_received
Tree View :
● dal
○ web01
■ myapp
● bytes_sent
● bytes_received
^ leaves ^
Tree Table Schema
Column Qualifiers:
● tree: JSON object with tree description
● rule:<level>:<order>: JSON object definition of a rule
belonging to a tree
● branch: JSON object linking to child branches and/or leaves
● leaf:<tsuid>: JSON object linking to a specific TSUID
● tree_collision:<tsuid>: Time series was already included in
tree
● tree_not_matched:<tsuid>: Time series did not appear in tree
Tree Table Schema
● Row Keys = Tree ID + Branch ID
● Branch ID = 4 byte hashes of branch hierarchy
E.g. dal.web01.myapp.bytes_sent
dal = 0001838F(hex)
web01 = 06BC4C55
myapp = 06387CF5
bytes_sent = leaf pointing to a TSUID
Key for branch on Tree #1 =
00010001838F06BC4C5506387CF5
Tree Table Schema
Row Key CF: “t” Keys truncated, 1B tree ID, 2 byte hashes
x01 “tree”: {“name”:”Test Tree”} “rule0:0”: {<rule>} “rule1:0”: {<rule>}
x01x83x8F “branch”: {“name”:”dal”, “branches”:[{ “name”:”web01”},{“name”:”web01”}]}
x01x83x8Fx4Cx55 “branch”: {“name”:”web01”, “branches”:[{ “name”:”myapp”}]}
x01x83x8Fx4Cx55x7Cx5
F
“branch”: {“name”:”myapp”,
“branches”:null}
“leaf:<tsuid1>”: {“name”:”
bytes_sent”}
“leaf:<tsuid2>”: {“name”:”
bytes_received”}
x01x83x8Fx4Dx00 “branch”: {“name”:”web02”, “branches”:[{ “name”:”myapp”}]}
x01x83x8Fx4Dx00x7Cx5
F
“branch”: {“name”:”myapp”,
“branches”:null}
“leaf:<tsuid3>”: {“name”:”
bytes_sent”}
“leaf:<tsuid4>”: {“name”:”
bytes_sent”}
Row Key Regex for branch “dal”: ^Qx01x83x8FE(?:.{2})$
Matches branch “web01” and “web02” only.
Time Series Naming Optimization
● Designed for fast aggregation:
○ Average CPU usage across hosts in web pool
○ Total bytes sent from all hosts running application
MySQL
● High cardinality increases query latency
● Aim to minimize rows scanned
● Determine where aggregation is useful
○ May not care about average CPU usage across
entire network.
Therefore shift web tag to the metric name
Time Series Naming Optimization
Example Time Series
● sys.cpu.user host=web01 pool=web
● sys.cpu.user host=db01 pool=db
● host = 1m total unique values, 1,000 in web and 50 in db
● pool = 20 unique values
Very high cardinality for “sys.cpu.user”
● Query 1) 30 day query for “sys.cpu.user pool=db” scans 720m
rows (24h * 30d * 1m hosts) but only returns 36k (24h * 30d * 50)
● Query 2) 30 day query for “sys.cpu.user host=db01 pool=db” still
scans 720m rows but returns 36
Time Series Naming Optimization
Solution: Move pool tag values into metric name
● web.sys.cpu.user host=web01
● db.sys.cpu.user host=db01
Much lower cardinality for “sys.cpu.user”
● Query 1) 30 day query for “db.sys.cpu.user” scans 36k rows (24h
* 30d * 50 hosts) and returns all 36k (24h * 30d * 50)
● Query 2) 30 day query for “db.sys.cpu.user host=db01” still scans
36k rows and returns 36
New for OpenTSDB 2.0
● RESTful HTTP API
● Plugin support for de/serialization
● Emitter plugin support for publishing
○ Use for WebSockets/display
updates
○ Stream Processing
● Non-interpolating aggregation
functions
New for OpenTSDB 2.0
Special rate and counter functions
● Suppress spikes
Raw Rate: Counter Reset = 1000
OpenTSDB Front-Ends
Metrilyx from TicketMaster https://github.com/Ticketmaster/metrilyx-2.0
OpenTSDB Front-Ends
Status Wolf from Box https://github.com/box/StatusWolf
New in AsyncHBase 1.5
● AsyncHBase is a fully asynchronous, multi-
threaded HBase client
● Now supports HBase 0.96 / 0.98
● Remains 2x faster than HTable in
PerformanceEvaluation
● Support for scanner filters, META prefetch,
“fail-fast” RPCs
The Future of OpenTSDB
The Future
● Parallel scanners to improve
queries
● Coprocessor support for query
improvements similar to Salesforce’
s Phoenix with SkipScans
● Time series searching, lookups
(which tags belong to which
series?)
The Future
● Duplicate timestamp handling
● Counter increment and blob storage support
● Rollups/pre-aggregations
● Greater query flexibility:
○ Aggregate metrics
○ Regex on tags
○ Scalar calculations
More Information
● Thank you to everyone who has helped test, debug and add to OpenTSDB
2.0!
● Contribute at github.com/OpenTSDB/opentsdb
● Website: opentsdb.net
● Documentation: opentsdb.net/docs/build/html
● Mailing List: groups.google.com/group/opentsdb
Images
● http://photos.jdhancock.com/photo/2013-06-04-212438-the-lonely-vacuum-of-space.html
● http://en.wikipedia.org/wiki/File:Semi-automated-external-monitor-defibrillator.jpg
● http://upload.wikimedia.org/wikipedia/commons/1/17/Dining_table_for_two.jpg
● http://upload.wikimedia.org/wikipedia/commons/9/92/Easy_button.JPG
● http://upload.wikimedia.org/wikipedia/commons/4/42/PostItNotePad.JPG
● http://openclipart.org/image/300px/svg_to_png/68557/green_leaf_icon.png
● http://nickapedia.com/2012/06/05/api-all-the-things-razor-api-wiki/
● http://lego.cuusoo.com/ideas/view/96
● http://upload.wikimedia.org/wikipedia/commons/3/35/KL_Cyrix_FasMath_CX83D87.jpg
● http://www.flickr.com/photos/smkybear/4624182536/

More Related Content

PDF
Ansible
PDF
아파치 카프카 입문과 활용 강의자료
PDF
MySQL 상태 메시지 분석 및 활용
PDF
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
PDF
Automated master failover
PDF
MySQL/MariaDB Proxy Software Test
PDF
ProxySQL High Avalability and Configuration Management Overview
PPTX
Ansible
아파치 카프카 입문과 활용 강의자료
MySQL 상태 메시지 분석 및 활용
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
Automated master failover
MySQL/MariaDB Proxy Software Test
ProxySQL High Avalability and Configuration Management Overview

What's hot (20)

PPTX
Survey of High Performance NoSQL Systems
PDF
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
PDF
Seastore: Next Generation Backing Store for Ceph
PDF
Planning for Disaster Recovery (DR) with Galera Cluster
PPTX
Enhance your multi-cloud application performance using Redis Enterprise P2
PDF
How to Analyze and Tune MySQL Queries for Better Performance
PDF
Inside MongoDB: the Internals of an Open-Source Database
PPTX
Introduction to Redis
PDF
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
PPTX
Tuning PostgreSQL for High Write Throughput
PDF
Network Automation: Ansible 101
PDF
MongoDB Performance Tuning
PDF
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
PPTX
Automated Deployments with Ansible
ODP
Elasticsearch V/s Relational Database
PDF
MySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdf
PDF
Deploying IPv6 in OpenStack Environments
PDF
High Availability for OpenStack
PDF
How to Manage Scale-Out Environments with MariaDB MaxScale
PPTX
Introduction to Redis
Survey of High Performance NoSQL Systems
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
Seastore: Next Generation Backing Store for Ceph
Planning for Disaster Recovery (DR) with Galera Cluster
Enhance your multi-cloud application performance using Redis Enterprise P2
How to Analyze and Tune MySQL Queries for Better Performance
Inside MongoDB: the Internals of an Open-Source Database
Introduction to Redis
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
Tuning PostgreSQL for High Write Throughput
Network Automation: Ansible 101
MongoDB Performance Tuning
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Automated Deployments with Ansible
Elasticsearch V/s Relational Database
MySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdf
Deploying IPv6 in OpenStack Environments
High Availability for OpenStack
How to Manage Scale-Out Environments with MariaDB MaxScale
Introduction to Redis
Ad

Similar to OpenTSDB 2.0 (20)

PPTX
Update on OpenTSDB and AsyncHBase
PPTX
HBaseCon 2015: OpenTSDB and AsyncHBase Update
PPTX
Update on OpenTSDB and AsyncHBase
PPTX
Need for Time series Database
PDF
OpenTSDB: HBaseCon2017
PDF
OSMC 2013 | openTSDB - metrics for a distributed world
PPTX
HBaseCon 2013: OpenTSDB at Box
PDF
Survey real time databases
PPTX
Apache IOTDB: a Time Series Database for Industrial IoT
PDF
Open TSDB Lightning Talk
PPTX
Monitoring MySQL with OpenTSDB
PDF
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
PDF
PERFORMANCE STUDY OF TIME SERIES DATABASES
PDF
Performance Comparison between Pytorch and Mindspore
PDF
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
PDF
Scaling Pinterest's Monitoring
PDF
openTSDB - Metrics for a distributed world
PPT
On time-series databases
PDF
Efficient and Fast Time Series Storage - The missing link in dynamic software...
PPTX
temporal and spatial database.pptx
Update on OpenTSDB and AsyncHBase
HBaseCon 2015: OpenTSDB and AsyncHBase Update
Update on OpenTSDB and AsyncHBase
Need for Time series Database
OpenTSDB: HBaseCon2017
OSMC 2013 | openTSDB - metrics for a distributed world
HBaseCon 2013: OpenTSDB at Box
Survey real time databases
Apache IOTDB: a Time Series Database for Industrial IoT
Open TSDB Lightning Talk
Monitoring MySQL with OpenTSDB
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon
PERFORMANCE STUDY OF TIME SERIES DATABASES
Performance Comparison between Pytorch and Mindspore
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Scaling Pinterest's Monitoring
openTSDB - Metrics for a distributed world
On time-series databases
Efficient and Fast Time Series Storage - The missing link in dynamic software...
temporal and spatial database.pptx
Ad

More from HBaseCon (20)

PDF
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
PDF
hbaseconasia2017: HBase on Beam
PDF
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
PDF
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
PDF
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
PDF
hbaseconasia2017: Apache HBase at Netease
PDF
hbaseconasia2017: HBase在Hulu的使用和实践
PDF
hbaseconasia2017: 基于HBase的企业级大数据平台
PDF
hbaseconasia2017: HBase at JD.com
PDF
hbaseconasia2017: Large scale data near-line loading method and architecture
PDF
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
PDF
hbaseconasia2017: HBase Practice At XiaoMi
PDF
hbaseconasia2017: hbase-2.0.0
PDF
HBaseCon2017 Democratizing HBase
PDF
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
PDF
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
PDF
HBaseCon2017 Transactions in HBase
PDF
HBaseCon2017 Highly-Available HBase
PDF
HBaseCon2017 Apache HBase at Didi
PDF
HBaseCon2017 gohbase: Pure Go HBase Client
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: hbase-2.0.0
HBaseCon2017 Democratizing HBase
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Transactions in HBase
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 gohbase: Pure Go HBase Client

Recently uploaded (20)

PDF
EN-Survey-Report-SAP-LeanIX-EA-Insights-2025.pdf
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 41
PDF
How to Migrate SBCGlobal Email to Yahoo Easily
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PPTX
Transform Your Business with a Software ERP System
PDF
System and Network Administraation Chapter 3
PPTX
Odoo POS Development Services by CandidRoot Solutions
PDF
medical staffing services at VALiNTRY
PDF
PTS Company Brochure 2025 (1).pdf.......
PPTX
L1 - Introduction to python Backend.pptx
PDF
Wondershare Filmora 15 Crack With Activation Key [2025
PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
PDF
wealthsignaloriginal-com-DS-text-... (1).pdf
PDF
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
PDF
AI in Product Development-omnex systems
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PDF
Odoo Companies in India – Driving Business Transformation.pdf
PPTX
Essential Infomation Tech presentation.pptx
PDF
Upgrade and Innovation Strategies for SAP ERP Customers
EN-Survey-Report-SAP-LeanIX-EA-Insights-2025.pdf
2025 Textile ERP Trends: SAP, Odoo & Oracle
Internet Downloader Manager (IDM) Crack 6.42 Build 41
How to Migrate SBCGlobal Email to Yahoo Easily
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
Transform Your Business with a Software ERP System
System and Network Administraation Chapter 3
Odoo POS Development Services by CandidRoot Solutions
medical staffing services at VALiNTRY
PTS Company Brochure 2025 (1).pdf.......
L1 - Introduction to python Backend.pptx
Wondershare Filmora 15 Crack With Activation Key [2025
VVF-Customer-Presentation2025-Ver1.9.pptx
wealthsignaloriginal-com-DS-text-... (1).pdf
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
AI in Product Development-omnex systems
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
Odoo Companies in India – Driving Business Transformation.pdf
Essential Infomation Tech presentation.pptx
Upgrade and Innovation Strategies for SAP ERP Customers

OpenTSDB 2.0

  • 1. OpenTSDB 2.0 Distributed, Scalable Time Series Database Benoit Sigoure tsunanet@gmail.com Chris Larsen clarsen@llnw.com
  • 2. Who We Are Benoit Sigoure ● Created OpenTSDB at StumbleUpon ● Software Engineer @ Arista Networks Chris Larsen ● Release manager for OpenTSDB 2.0 ● Operations Engineer @ Limelight Networks
  • 3. What Is OpenTSDB? ● Open Source Time Series Database ● Store trillions of data points ● Sucks up all data and keeps going ● Never lose precision ● Scales using HBase
  • 4. What good is it? ● Systems Monitoring & Measurement ○ Servers ○ Networks ● Sensor Data ○ The Internet of Things ○ SCADA ● Financial Data ● Scientific Experiment Results
  • 5. Use Cases OVH: #3 largest cloud/hosting provider Monitor everything: networking, temperature, voltage, application performance, resource utilization, customer-facing metrics, etc. ● 35 servers, 100k writes/s, 25TB raw data ● 5-day moving window of HBase snapshots ● Redis cache on top for customer-facing data
  • 6. Use Cases Yahoo Monitoring application performance and statistics ● 15 servers, 280k writes/s ● Increased UID size to 4 bytes instead of 3, allowing for over 4 billion values ● Looking at using HBase Append requests to avoid TSD compactions
  • 7. Use Cases Arista Networks: High performance networking ● Single-node HBase (no HDFS) + 2 TSDs (one for writing, one for reading) ● 5K writes per second, 500G of data, piece of cake to deploy/maintain ● Varnish for caching
  • 8. Some Other Users ● Box: 23 servers, 90K wps, System, app network, business metrics ● Limelight Networks: 8 servers, 30k wps, 24TB of data ● Ticketmaster: 13 servers, 90K wps, ~40GB a day
  • 9. What Are Time Series? ● Time Series: data points for an identity over time ● Typical Identity: ○ Dotted string: web01.sys.cpu.user.0 ● OpenTSDB Identity: ○ Metric: sys.cpu.user ○ Tags (name/value pairs): host=web01 cpu=0
  • 10. What Are Time Series? Data Point: ● Metric + Tags ● + Value: 42 ● + Timestamp: 1234567890 sys.cpu.user 1234567890 42 host=web01 cpu=0 ^ a data point ^
  • 12. Writing Data 1) Open Telnet style socket, write: put sys.cpu.user 1234567890 42 host=web01 cpu=0 2) ..or with 2.0, post JSON to: http://<host>:<port>/api/put 3) .. or import big files with CLI ● No schema definition ● No RRD file creation ● Just write!
  • 13. Querying Data ● Graph with the GUI ● CLI tools ● HTTP API ● Aggregate multiple series ● Simple query language To average all CPUs on host: start=1h-ago avg sys.cpu.user host=web01
  • 14. HBase Data Tables ● tsdb - Data point table. Massive ● tsdb-uid - Name to UID and UID to name mappings ● tsdb-meta - Time series index and meta-data (new in 2.0) ● tsdb-tree - Config and index for heirarchical naming schema (new in 2.0) Lets see how OpenTSDB uses HBase...
  • 15. UID Table Schema ● Integer UIDs assigned to each value per type (metric, tagk, tagv) in tsdb-uid table ● 64 bit integers in row x00 reflect last used UID CF:Qualifier Row Key UID id:metric sys.cpu.user 1 id:tagk host 1 id:tagv web01 1 id:tagk cpu 2 id:tagv 0 2
  • 16. Improved UID Assignment ● Pre 1.2 Assignment: ○ Client acquires lock on row x00 ○ uid = getRequest(UID type) ○ Increment uid ○ getRequest(name) confirm name hasn’t been assigned ○ putRequest(type, uid) ○ putRequest(reverse map) ○ putRequest(forward map) ○ Release lock ● Lock held for a long time ● Puts overwrite data
  • 17. Improved UID Assignment ● 1.2 & 2.0 Assignment: ○ atomicIncrementRequest(UID type) ○ getRequest(name) confirm name hasn’t been assigned ○ compareAndSet(reverse map) ○ compareAndSet(forward map) ● 3 atomic operations on different rows ● Much better concurrency with multiple TSDs assigning UIDs ● CAS calls fail on existing data, log the error
  • 18. Data Table Schema ● Row key is a concatenation of UIDs and time: ○ metric + timestamp + tagk1 + tagv1… + tagkN + tagvN ● sys.cpu.user 1234567890 42 host=web01 cpu=0 x00x00x01x49x95xFBx70x00x00x01x00x00x01x00x00x02x00x00x02 ● Timestamp normalized on 1 hour boundaries ● All data points for an hour are stored in one row ● Enables fast scans of all time series for a metric ● …or pass a row key regex filter for specific time series with particular tags
  • 19. Data Table Schema 1.x Column Qualifiers: ● Type: floating point or integer value ● Value Length: number of bytes the value is encoded on ● Delta: offset from row timestamp in seconds ● Compacted columns concatenate an hour of qualifiers and values into a single column [ 0b11111111, 0b1111 0111 ] <----------------> ^<-> delta (seconds) type value length
  • 20. Data Table Schema OpenTSDB 2.x Schema Design Goals ● Must maintain backwards compatibility ● Support millisecond precision timestamps ● Support other objects, e.g. annotations, blobs ○ E.g. could store quality measurements for each data point ● Store millisecond and other objects in the same row as data points ○ Scoops up all relevant time series data in one scan
  • 21. Data Table Schema Millisecond Support: ● Need 3,599,999 possible time offsets instead of 3,600 ● Solution: 4 byte qualifier instead of 2 ● Prefixed with xF0 to differentiate from a 2 value compacted column ● Can mix second and millisecond precision in one row ● Still concatenates into one compacted column [ 0b11111101, 0b10111011, 0b10011111, 0b11 000111 ] <--><--------------------------------> ^^^<-> msec precision delta (milliseconds) type value length 2 unused bits
  • 22. Data Table Schema Annotations and Other Objects ● Odd number of bytes in qualifier (3 or 5) ● Prefix IDs: x01 = annotation, x02 = blob ● Remainder is offset in seconds or milliseconds ● Unbounded length, not meant for compaction { "tsuid": "000001000001000001", "description": "Server Maintenance", "notes": "Upgrading the server, ignore these values, winter is coming", "endTime": 1369159800, "startTime": 1369159200 }
  • 23. TSUID Index and Meta Table ● Time Series UID = data table row key without timestamp E.g. sys.cpu.user host=web01 cpu=0 x00x00x01x00x00x01x00x00x01x00x00x02x00x00x02 ● Use TSUID as the row key ○ Can use a row key regex filter to scan for metrics with a tag pair or the tags associated with a metric ● Atomic Increment for each data point ○ ts_counter column: Track number of data points written ○ ts_meta column: Async callback chain to create meta object if increment returns a 1 ○ Potentially doubles RPC count
  • 24. OpenTSDB Trees ● Provide a hierarchical representation of time series ○ Useful for Graphite or browsing the data store ● Flexible rule system processes metrics and tags ● Out of band creation or process in real-time with TSUID increment callback chaining
  • 25. OpenTSDB Trees Example Time Series: myapp.bytes_sent dc=dal host=web01 myapp.bytes_received dc=dal host=web01 Example Ruleset: Level Order Rule Type Field Regex 0 0 tagk dc 1 0 tagk host 2 0 metric (.*)..* 3 0 metric .*.(*.)
  • 26. OpenTSDB Trees Example Results: Flattened Names: dal.web01.myapp.bytes_sent dal.web01.myapp.bytes_received Tree View : ● dal ○ web01 ■ myapp ● bytes_sent ● bytes_received ^ leaves ^
  • 27. Tree Table Schema Column Qualifiers: ● tree: JSON object with tree description ● rule:<level>:<order>: JSON object definition of a rule belonging to a tree ● branch: JSON object linking to child branches and/or leaves ● leaf:<tsuid>: JSON object linking to a specific TSUID ● tree_collision:<tsuid>: Time series was already included in tree ● tree_not_matched:<tsuid>: Time series did not appear in tree
  • 28. Tree Table Schema ● Row Keys = Tree ID + Branch ID ● Branch ID = 4 byte hashes of branch hierarchy E.g. dal.web01.myapp.bytes_sent dal = 0001838F(hex) web01 = 06BC4C55 myapp = 06387CF5 bytes_sent = leaf pointing to a TSUID Key for branch on Tree #1 = 00010001838F06BC4C5506387CF5
  • 29. Tree Table Schema Row Key CF: “t” Keys truncated, 1B tree ID, 2 byte hashes x01 “tree”: {“name”:”Test Tree”} “rule0:0”: {<rule>} “rule1:0”: {<rule>} x01x83x8F “branch”: {“name”:”dal”, “branches”:[{ “name”:”web01”},{“name”:”web01”}]} x01x83x8Fx4Cx55 “branch”: {“name”:”web01”, “branches”:[{ “name”:”myapp”}]} x01x83x8Fx4Cx55x7Cx5 F “branch”: {“name”:”myapp”, “branches”:null} “leaf:<tsuid1>”: {“name”:” bytes_sent”} “leaf:<tsuid2>”: {“name”:” bytes_received”} x01x83x8Fx4Dx00 “branch”: {“name”:”web02”, “branches”:[{ “name”:”myapp”}]} x01x83x8Fx4Dx00x7Cx5 F “branch”: {“name”:”myapp”, “branches”:null} “leaf:<tsuid3>”: {“name”:” bytes_sent”} “leaf:<tsuid4>”: {“name”:” bytes_sent”} Row Key Regex for branch “dal”: ^Qx01x83x8FE(?:.{2})$ Matches branch “web01” and “web02” only.
  • 30. Time Series Naming Optimization ● Designed for fast aggregation: ○ Average CPU usage across hosts in web pool ○ Total bytes sent from all hosts running application MySQL ● High cardinality increases query latency ● Aim to minimize rows scanned ● Determine where aggregation is useful ○ May not care about average CPU usage across entire network. Therefore shift web tag to the metric name
  • 31. Time Series Naming Optimization Example Time Series ● sys.cpu.user host=web01 pool=web ● sys.cpu.user host=db01 pool=db ● host = 1m total unique values, 1,000 in web and 50 in db ● pool = 20 unique values Very high cardinality for “sys.cpu.user” ● Query 1) 30 day query for “sys.cpu.user pool=db” scans 720m rows (24h * 30d * 1m hosts) but only returns 36k (24h * 30d * 50) ● Query 2) 30 day query for “sys.cpu.user host=db01 pool=db” still scans 720m rows but returns 36
  • 32. Time Series Naming Optimization Solution: Move pool tag values into metric name ● web.sys.cpu.user host=web01 ● db.sys.cpu.user host=db01 Much lower cardinality for “sys.cpu.user” ● Query 1) 30 day query for “db.sys.cpu.user” scans 36k rows (24h * 30d * 50 hosts) and returns all 36k (24h * 30d * 50) ● Query 2) 30 day query for “db.sys.cpu.user host=db01” still scans 36k rows and returns 36
  • 33. New for OpenTSDB 2.0 ● RESTful HTTP API ● Plugin support for de/serialization ● Emitter plugin support for publishing ○ Use for WebSockets/display updates ○ Stream Processing ● Non-interpolating aggregation functions
  • 34. New for OpenTSDB 2.0 Special rate and counter functions ● Suppress spikes Raw Rate: Counter Reset = 1000
  • 35. OpenTSDB Front-Ends Metrilyx from TicketMaster https://github.com/Ticketmaster/metrilyx-2.0
  • 36. OpenTSDB Front-Ends Status Wolf from Box https://github.com/box/StatusWolf
  • 37. New in AsyncHBase 1.5 ● AsyncHBase is a fully asynchronous, multi- threaded HBase client ● Now supports HBase 0.96 / 0.98 ● Remains 2x faster than HTable in PerformanceEvaluation ● Support for scanner filters, META prefetch, “fail-fast” RPCs
  • 38. The Future of OpenTSDB
  • 39. The Future ● Parallel scanners to improve queries ● Coprocessor support for query improvements similar to Salesforce’ s Phoenix with SkipScans ● Time series searching, lookups (which tags belong to which series?)
  • 40. The Future ● Duplicate timestamp handling ● Counter increment and blob storage support ● Rollups/pre-aggregations ● Greater query flexibility: ○ Aggregate metrics ○ Regex on tags ○ Scalar calculations
  • 41. More Information ● Thank you to everyone who has helped test, debug and add to OpenTSDB 2.0! ● Contribute at github.com/OpenTSDB/opentsdb ● Website: opentsdb.net ● Documentation: opentsdb.net/docs/build/html ● Mailing List: groups.google.com/group/opentsdb Images ● http://photos.jdhancock.com/photo/2013-06-04-212438-the-lonely-vacuum-of-space.html ● http://en.wikipedia.org/wiki/File:Semi-automated-external-monitor-defibrillator.jpg ● http://upload.wikimedia.org/wikipedia/commons/1/17/Dining_table_for_two.jpg ● http://upload.wikimedia.org/wikipedia/commons/9/92/Easy_button.JPG ● http://upload.wikimedia.org/wikipedia/commons/4/42/PostItNotePad.JPG ● http://openclipart.org/image/300px/svg_to_png/68557/green_leaf_icon.png ● http://nickapedia.com/2012/06/05/api-all-the-things-razor-api-wiki/ ● http://lego.cuusoo.com/ideas/view/96 ● http://upload.wikimedia.org/wikipedia/commons/3/35/KL_Cyrix_FasMath_CX83D87.jpg ● http://www.flickr.com/photos/smkybear/4624182536/