Scalable Secure Time Series Database
https://NationalSecurityAgency.github.io/timely
Overview
• Built on Apache Accumulo
  – Proven Security, Scale & Reliability
• Uses Netty for communication protocols
  – Widely adopted, easy to integrate
• Provides secure access to labeled data
  – Easily customized to meet unique architectures
History
• Integrated OpenTSDB with Apache Accumulo
  – Using Eric Newton's shim code
  – Seemed to have issues with scale
  – FAIL - Could not get past StackOverflowError
    • (OpenTSDB issue #334)
• Decided to write it from scratch
  – Keep Grafana
  – Use Grafana OpenTSDB datasource plugin
• Had something working in 2 weeks
Simple Architecture
• Insert data points
• Subscribe to data points
• Query for aggregated data points
[Diagram labels: Timely; Ingest; Subscribe; Time Series]
Application Interfaces
• Supports multiple protocols
  – UDP, TCP, HTTPS, WebSocket
• Operations for storing data
  – All protocols, security tag optional
• Operations for working with time series data
  – HTTPS and WebSocket
• Operations for subscribing to data
  – WebSocket only
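
The protocol-to-operation mapping on this slide can be written down as a small lookup table. This is only a sketch: the operation names ("store", "query", "subscribe"), the dictionary and the helper functions are illustrative labels chosen here, not Timely identifiers.

```python
# Capability matrix transcribed from the slide above.
# The keys and function names are illustrative, not part of Timely.
SUPPORTED = {
    "store": {"udp", "tcp", "https", "websocket"},   # all protocols
    "query": {"https", "websocket"},                 # working with time series data
    "subscribe": {"websocket"},                      # websocket only
}


def supports(protocol, operation):
    """Return True if the slide lists `protocol` for `operation`."""
    return protocol.lower() in SUPPORTED.get(operation, set())


def protocols_for(operation):
    """Return the protocols listed for an operation, sorted by name."""
    return sorted(SUPPORTED[operation])
```

For example, `supports("tcp", "subscribe")` is false, because subscriptions are offered over WebSocket only.
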
Timely Input Format (Text)
• Simple text based on OpenTSDB put format:
  put <metric> <timestamp> <value> <tag> [<tag> ...]
• Example:
  put sys.cpu.idle 1469735914000 25.0 host=s01n04 rack=s01 instance=0
• Supported in all protocols
• viz tag used to label data
  – viz=private
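
A minimal sketch of building such a line in client code, following the space-separated layout of the example above. The function name and the rule that at least one tag must be present are assumptions made for illustration; the slide does not state how Timely validates input.

```python
def format_put(metric, timestamp_ms, value, tags, viz=None):
    """Build one text line: put <metric> <timestamp> <value> <tag> [<tag> ...].

    `tags` is a dict of tag name to tag value. `viz` is the optional
    security label, appended as a viz=<label> tag.
    """
    if not tags:
        # Assumption: the template shows at least one <tag>.
        raise ValueError("at least one tag is required")
    parts = ["put", metric, str(int(timestamp_ms)), str(float(value))]
    parts += [f"{name}={val}" for name, val in tags.items()]
    if viz is not None:
        parts.append(f"viz={viz}")
    return " ".join(parts)


line = format_put(
    "sys.cpu.idle",
    1469735914000,
    25.0,
    {"host": "s01n04", "rack": "s01", "instance": "0"},
)
labeled = format_put("sys.cpu.idle", 1469735914000, 25.0, {"host": "s01n04"}, viz="private")
```

Here `line` reproduces the slide's example, and `labeled` ends with `viz=private`.
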
Timely Input Format (Binary)
• Binary format uses Google FlatBuffers encoding
• IDL file located in the source code
• Generate client code in multiple languages
• Currently supported in UDP and TCP protocols
Sending Data to Timely
• Send data directly from your application
• Can use existing collection agents:
  – OpenTSDB Tcollector
  – CollectD
• Can leverage StatsD servers also
  – HADOOP-12360 (StatsD Metrics2 sink)
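
Sending directly from an application could look roughly like the sketch below, which writes text put lines to an already-connected TCP socket. It assumes one newline-terminated line per data point (as in OpenTSDB's line-based put); the helper name, host and port are hypothetical and not taken from the slides.

```python
import socket


def send_lines(sock, lines):
    """Write text lines to a connected socket, one newline-terminated line each.

    Returns the number of characters written.
    Assumption: the server reads newline-delimited put lines.
    """
    payload = "".join(line.rstrip("\n") + "\n" for line in lines)
    sock.sendall(payload.encode("utf-8"))
    return len(payload)


# Usage sketch (hypothetical host and port, not from the slides):
#
#   with socket.create_connection(("timely.example.com", 54321)) as sock:
#       send_lines(sock, ["put sys.cpu.idle 1469735914000 25.0 host=s01n04"])
```

Taking the socket as a parameter keeps the helper independent of how the connection is made, so the same function works for a plain TCP connection or a test socket pair.
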
Storage Format
• Meta Table
  – Stores unique metric and tag information
• Metrics Table
  – Stores individual metric data
  – Each data point stored N ways, N = # tags
• Several bytes to store each key
  – Run Length Encoding
  – Compression
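
To illustrate "stored N ways", the sketch below expands one data point into one entry per tag, so a point with three tags produces three entries. The entry layout (field names, the `metric@timestamp` row string) is hypothetical and chosen only to show the fan-out; it is not Timely's actual Accumulo key format.

```python
def fan_out(metric, timestamp_ms, value, tags):
    """Return one entry per tag for a single data point (N tags -> N entries).

    Each entry is indexed by one tag and carries the remaining tags alongside.
    The layout is illustrative only.
    """
    entries = []
    for indexed in sorted(tags):
        others = ",".join(
            f"{name}={tags[name]}" for name in sorted(tags) if name != indexed
        )
        entries.append(
            {
                "row": f"{metric}@{timestamp_ms}",        # hypothetical row layout
                "indexed_tag": f"{indexed}={tags[indexed]}",
                "other_tags": others,
                "value": value,
            }
        )
    return entries


entries = fan_out(
    "sys.cpu.idle", 1469735914000, 25.0,
    {"host": "s01n04", "rack": "s01", "instance": "0"},
)
```

With the three tags from the earlier example, `entries` has three elements, one each for `host`, `instance` and `rack`.
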
Visualizing Time Series Data
• Timely built to work with Grafana
• Timely App for Grafana
  – Drop it into the Grafana plugins directory
  – Provides Timely data source
  – Integrates security features into Grafana
  – Example dashboards provided
Timely App – Data Sources
• Define Timely Data Sources
• Test Connectivity
Timely App – Menu Items
• Log in to defined data source
• View Metric Names / Tags
Timely App – Login
• Top – Log in using client certificates
• Bottom – Log in using username / password
Sample Dashboards
• Timely App included dashboards:
  – Timely Status
  – System Overview
  – Hadoop Overview
  – Accumulo Overview
System Overview
System Overview (cont.)
HDFS NameNode Metrics
HDFS DataNode Metrics
HDFS DataNode Metrics (cont.)
Accumulo Overview
Accumulo Overview (cont.)
Subscribing to Data
• Subscription API over WebSocket protocol
  – WebSocket is a bi-directional protocol
  – Timely uses secure WebSockets (wss)
• Create connection and subscribe to:
  – Data for specific metric names
  – Data for a specific time window
  – Optionally, data that matches tag names and values
• Can register multiple subscriptions
• Remove subscriptions when appropriate
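
The filter semantics listed above (metric name, time window, optional tag matches, multiple subscriptions that can be removed) can be modeled locally as follows. This is a sketch of the matching logic only; the class names and fields are invented here, and it does not reproduce Timely's WebSocket message format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Subscription:
    metric: str
    start_ms: Optional[int] = None          # None means no lower bound
    end_ms: Optional[int] = None            # None means no upper bound
    tags: Dict[str, str] = field(default_factory=dict)  # optional tag filters

    def matches(self, metric, timestamp_ms, tags):
        """Return True if a data point passes every filter of this subscription."""
        if metric != self.metric:
            return False
        if self.start_ms is not None and timestamp_ms < self.start_ms:
            return False
        if self.end_ms is not None and timestamp_ms > self.end_ms:
            return False
        return all(tags.get(name) == val for name, val in self.tags.items())


class SubscriptionRegistry:
    """Holds several subscriptions for one connection."""

    def __init__(self):
        self._subs = {}

    def add(self, sub_id, subscription):
        self._subs[sub_id] = subscription

    def remove(self, sub_id):
        self._subs.pop(sub_id, None)

    def matching(self, metric, timestamp_ms, tags):
        """Return the ids of all subscriptions that match a data point."""
        return [
            sub_id
            for sub_id, sub in self._subs.items()
            if sub.matches(metric, timestamp_ms, tags)
        ]


registry = SubscriptionRegistry()
registry.add("all-cpu", Subscription("sys.cpu.idle"))
registry.add("rack-s01", Subscription("sys.cpu.idle", tags={"rack": "s01"}))
registry.add("windowed", Subscription("sys.cpu.idle", start_ms=0, end_ms=1000))
```
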
Security - Implementation
• Timely stores the labels provided in the viz tag
  – Timely only calls flatten() on the ColumnVisibility (CV) for consistent ordering
• Spring Security enables users to plug in their authentication mechanism and role provider
• Workflow:
  – User logs into Timely via /login HTTPS endpoint
  – User authenticated via Spring Security
  – HTTP secure session cookie returned for future API calls
Security Configuration
• Anonymous access configurable
• SSL provider: JDK or OpenSSL
• SSL file locations and passwords
• SSL ciphers
• Session cookie expiration
• CORS properties
Transport Security
• HTTP Strict Transport Security (HSTS)
  – Accessing via HTTP will redirect to HTTPS
  – Rule stored in browser for configured time
• HTTPS
• WSS
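
For reference, the "rule stored in browser for configured time" corresponds to the `max-age` directive of the standard `Strict-Transport-Security` response header (RFC 6797). The sketch below builds that header; the function name and the one-week value are illustrative, and how Timely itself exposes this setting is not shown on the slide.

```python
def hsts_header(max_age_seconds, include_subdomains=False):
    """Return the (name, value) pair for an HSTS response header.

    max_age_seconds is how long the browser should remember to use HTTPS.
    """
    if max_age_seconds < 0:
        raise ValueError("max-age must be non-negative")
    value = f"max-age={int(max_age_seconds)}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


# Illustrative value: remember the rule for one week.
one_week = hsts_header(7 * 24 * 60 * 60)
```
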
Modes of Operation
• Anonymous access enabled
  – Unauthenticated users only see unlabeled data
  – Authenticated users see what they are allowed
• Anonymous access disabled
  – Unauthenticated users receive an error message
  – Authenticated users see what they are allowed
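
The two modes can be summarized as a small decision function. This sketch makes simplifying assumptions: each data point carries at most one plain label rather than a full visibility expression, and authenticated users can also see unlabeled data. The exception and function names are invented for illustration.

```python
class AuthenticationRequired(Exception):
    """Raised when an unauthenticated request arrives and anonymous access is off."""


def visible_points(points, user_labels, anonymous_access_enabled):
    """Filter (name, label) pairs according to the two modes of operation.

    points: list of (name, label) where label is None for unlabeled data.
    user_labels: set of labels the user holds, or None if unauthenticated.
    """
    if user_labels is None:
        if not anonymous_access_enabled:
            raise AuthenticationRequired("login required")
        # Anonymous access enabled: unlabeled data only.
        return [p for p in points if p[1] is None]
    # Authenticated: unlabeled data plus any label the user holds (assumption).
    return [p for p in points if p[1] is None or p[1] in user_labels]


points = [("cpu.public", None), ("cpu.private", "private"), ("cpu.secret", "secret")]
```
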
Roadmap
• Summarization of historical data
• New Time Series API
  – Move away from OpenTSDB API
  – Add additional features
• Timely Client
  – Make subscribing to data easier
  – Enable analytics to be easily written
• Enrichment
  – Allow for user-supplied information about time series
• Support Grafana annotations
Deploying Timely
• Java 8 required for Accumulo and Timely
• Tested with Accumulo 1.7.x and Hadoop 2.6
• Standalone Mode
  – Uses Mini Accumulo Cluster
  – Useful for development and testing
  – Data lost across restarts
• Non-Standalone Mode
  – 1+ Timely Servers
Deployment #1
• Setup:
  – 1 Timely server
  – Accumulo 1.7.1, 26 TabletServers on single-disk hosts
• Timely server receiving 2.75M metrics/min
• Inserting 20.3M keys/min (338K/sec)
  – @ 10:1 ratio inserted to received
• 2.2T keys in the metrics table
  – 8.75TB unreplicated
  – @ 4.3 bytes per key, ~40 bytes per metric
Deployment #2
• Setup:
  – 2 Timely servers
  – Accumulo 1.7.1, 31 TabletServers on single-disk hosts
• Timely servers receiving 10M metrics/minute
• Inserting 71M keys/minute (1.18M/sec)
  – @ 7:1 ratio inserted to received
• 1.91T keys in the metrics table
  – 7.47TB unreplicated
  – @ 4.3 bytes per key, ~30 bytes per metric
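
The derived figures on the two deployment slides can be rechecked from the raw numbers. The sketch below assumes "TB" means 2**40 bytes, since that reading reproduces the quoted 4.3 bytes per key for Deployment #2; the function and variable names are illustrative.

```python
TIB = 2 ** 40  # assumption: "TB" on the slides is read as 2**40 bytes


def summarize(metrics_per_min, keys_per_min, total_keys, size_tb):
    """Derive per-second rate, insert ratio and storage cost from slide figures."""
    keys_per_metric = keys_per_min / metrics_per_min
    bytes_per_key = size_tb * TIB / total_keys
    return {
        "keys_per_sec": keys_per_min / 60,
        "keys_per_metric": keys_per_metric,
        "bytes_per_key": bytes_per_key,
        "bytes_per_metric": bytes_per_key * keys_per_metric,
    }


# Figures as stated on the Deployment #1 and Deployment #2 slides.
d1 = summarize(2.75e6, 20.3e6, 2.2e12, 8.75)
d2 = summarize(10e6, 71e6, 1.91e12, 7.47)

for name, d in (("Deployment #1", d1), ("Deployment #2", d2)):
    print(name, {key: round(val, 2) for key, val in d.items()})
```

Under these assumptions, Deployment #2 comes out at about 7.1 keys per metric and 4.3 bytes per key, in line with the slide. For Deployment #1, the stated rates give about 7.4 keys per metric rather than the quoted 10:1, and about 4.4 bytes per key; the slides do not say whether the 10:1 ratio was measured over a different window.
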
Questions?

Accumulo Summit 2016: Timely - Scalable Secure Time Series Database
