Chronix as Long-Term Storage for Prometheus

Florian Lautenschlager, Moritz Kammerer
@flolaut, @phxql

Prometheus
Cloud Native Application
Real-time monitoring and alerting for cloud native apps to detect
anomalies close to their occurrence and to initiate measures.
Time axis: now, 14 days

Beyond real-time monitoring of cloud native apps?
Nothing more to do?

Prometheus (time axis: now)
Real-time monitoring and alerting for cloud native apps to detect
anomalies close to their occurrence and to initiate measures.
Chronix (time axis: then)
Lossless long-term storage that keeps data forever and allows
analyses beyond real-time monitoring!

Agenda
■ Some words about Chronix, its Architecture, its Features, and its Performance.
■ How we built the integration with Prometheus.
■ Showcase: Prometheus, Chronix Ingester, Chronix, and Grafana

Chronix is more than just a simple time series database. It’s a
time series processing tool stack for all purposes.

Time Series Database: What’s that?
■ Definition 1: “A sample s is a tuple of {timestamp, value}, where the
value could be any kind of object.”
■ Definition 2: “A time series T is an arbitrary list of chronologically
ordered samples of one value type.”
■ Definition 3: “A chunk C is a chronologically ordered part of a time
series.”
■ Definition 4: “A time series database TSDB is a specialized database
for storing and retrieving time series in an efficient and optimized
way.”
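Figure: a sample s = {t, v}; a time series T = {s1, s2}; chunks C1,1 and
C1,2 of a time series T1; a TSDB holding chunks of several time series
(e.g. T1 and T3).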
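
Definitions 1 to 3 can be sketched in a few lines of Java. This is only an illustration under assumptions: the class and method names are hypothetical and not part of the Chronix API, the value type is fixed to a double, and a chunk is cut after a fixed number of points.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of Definitions 1-3. All names are hypothetical, not Chronix API. */
final class ChunkingSketch {

    /** Definition 1: a sample is a {timestamp, value} tuple (numeric value assumed). */
    static final class Sample {
        final long timestamp;
        final double value;

        Sample(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Definition 3: splits a chronologically ordered time series (Definition 2)
     * into chunks that hold at most maxPoints samples each.
     */
    static List<List<Sample>> chunk(List<Sample> series, int maxPoints) {
        if (maxPoints <= 0) {
            throw new IllegalArgumentException("maxPoints must be positive");
        }
        List<List<Sample>> chunks = new ArrayList<>();
        for (int start = 0; start < series.size(); start += maxPoints) {
            int end = Math.min(start + maxPoints, series.size());
            chunks.add(new ArrayList<>(series.subList(start, end)));
        }
        return chunks;
    }
}
```

Splitting five samples with maxPoints = 2 yields three chunks, the last one holding a single sample.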

Chronix’ architecture enables both efficient storage of time
series and millisecond range queries.
(1) Semantic Transformation
(2) Attributes and Chunks
(3) Basic Compression
(4) Multi-Dimensional Storage
Data flow: Record {data: <chunk>, attributes} -> Record {data: compressed <chunk>, attributes} -> Record Storage
Example: 1 million chunks * 68,000 points = 68 billion points, ~ 96% compression
One stage of the pipeline is marked as optional in the diagram.

The key data type of Chronix is called a record.
It stores a compressed time series chunk and its attributes.
record {
  data: compressed{<chunk>}
  // technical fields
  id: 3dce1de0-...-93fb2e806d19
  version: 1501692859622883300
  start: 1427457011238
  end: 1427471159292
  // optional attributes
  host: prodI5
  process: scheduler
  group: jmx
  metric: heapMemory.Usage.Used
  max: 896.571
}
data: compressed{<chunk of time series data>}
■ Time Series: timestamp, numeric value
■ Traces: calls, exceptions, …
■ Logs: access, method runtimes
■ Complex data: models, test coverage,
anything else…
Optional attributes
■ Arbitrary attributes for the time series
■ Attributes are indexed
■ Make the chunk searchable
■ Can contain pre-calculated values

Chronix provides specialized aggregations, transformations,
and analyses for time series that are commonly used.
Aggregations
■ Min / Max / Average / Sum / Count
■ Percentile
■ Standard Deviation
■ First / Last
■ Range
Analyses
■ Trend Analysis
Using a linear regression model
■ Outlier Analysis
Using the IQR
■ Frequency Analysis
Check occurrence within a time range
■ Fast Dynamic Time Warping
Time series similarity search
■ Symbolic Aggregate Approximation
Similarity and pattern search
Transformations
■ Bottom/Top n-values
■ Moving average
■ Divide / Scale
■ Downsampling
Many more
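
As an illustration of the outlier analysis, here is a minimal sketch of an IQR-based check. It is not the Chronix implementation: the class name is hypothetical, the threshold of Q3 + 1.5 * IQR and the check against high values only are assumptions, and quartiles are computed by linear interpolation between the closest ranks.

```java
import java.util.Arrays;

/** Sketch of an IQR-based outlier check. Hypothetical, not the Chronix code. */
final class IqrOutlierSketch {

    /** Percentile of an ascending sorted array, linearly interpolated between ranks. */
    static double percentile(double[] sorted, double p) {
        double position = p * (sorted.length - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double fraction = position - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    /** Returns true if the largest value lies above Q3 + 1.5 * IQR (assumed threshold). */
    static boolean hasOutlier(double[] values) {
        if (values.length == 0) {
            return false;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double q1 = percentile(sorted, 0.25);
        double q3 = percentile(sorted, 0.75);
        double threshold = q3 + 1.5 * (q3 - q1);
        return sorted[sorted.length - 1] > threshold;
    }
}
```

For the values 1, 2, 3, 4, 100 the quartiles are 2 and 4, the threshold is 7, and the value 100 is reported as an outlier.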

Only scalar values? One size fits all? No! What about logs,
traces, and others? No problem – Just do it yourself!
■ Chronix Time Series
  ■ Time series framework that is used by Chronix.
  ■ Time series types:
    ■ Numeric: doubles (the default time series type)
    ■ More to come.
public interface TimeSeriesConverter<T> {
    /**
     * Creates an object of type T from the given binary time series chunk.
     */
    T from(BinaryTimeSeries binaryTimeSeriesChunk, long queryStart, long queryEnd);

    /**
     * Converts the custom time series T into the binary time series that is stored.
     */
    BinaryTimeSeries to(T timeSeriesChunk);
}

That's the easiest way to play with Chronix: a single instance of
Chronix on a single node.
Your Computer
  Java 8 (JRE)
  Solr 6.2.1 (Lucene), port 8983
  Chronix 0.4 (Solr plugins):
    Chronix-Query-Handler
    Chronix-Ingestion-Handler
    Chronix-Compaction-Handler
    Chronix-Retention
Clients and protocols shown in the diagram (via HTTP):
  Chronix Client (Java, Go)
  Prometheus, OpenTSDB, KairosDB, InfluxDB, Graphite

Code-Slide: How to set up Chronix, ask for time series data, and
call some server-side aggregations in Java.
■ Create a connection to Solr and set up Chronix
    solr = new HttpSolrClient("http://localhost:8983/solr/chronix/");
    // groupBy and reduce: group chunks on a combination of attributes
    // and reduce them to a time series.
    chronix = new ChronixClient(new MetricTimeSeriesConverter<>(),
            new ChronixSolrStorage(200, groupBy, reduce));
■ Define a range query and stream its results
    // Get all time series whose metric contains Load.
    query = new SolrQuery("metric:*Load*");
    chronix.stream(solr, query);
■ Call some aggregations
    // sdiff is the signed difference, e.g. First=20, Last=-100 gives -80.
    query.addFilterQuery("function=max,min,count,sdiff");
    stream = chronix.stream(solr, query);

Compared to other time series databases, Chronix' results for
our use case are outstanding.
■ We have evaluated Chronix against:
  ■ InfluxDB, OpenTSDB, and KairosDB
  ■ All databases are configured as single node
■ Storage demand for 108 GB of raw CSV time series data.
  ■ Chronix (8.7 GB) saves 20% – 84% of the space compared to the
    other time series databases.
■ Query times on imported data.
  ■ 73% – 92% faster on data retrieval.
  ■ 80% – 97% faster on a mix of analyses.
■ Memory footprint: after start, max during import, max during
  query mix
  ■ Chronix takes 1.6 times less memory than the best alternative.

The hard facts. For more details, I suggest you read our
research paper about Chronix.
Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in
Operational Data
FAST 2017 (submitted)

Let's dig into the Chronix Ingester's internals.
Image Credit: http://www.taringa.net/posts/ciencia-educacion/12656540/La-Filosofia-del-Dr-House-2.html

Big Picture. It's a simple and scalable architecture.
Prometheus (standard Prometheus installation)
• Collects metrics from various services.
• Writes them to its default storage.
• Writes them to the Chronix Ingester using the standard remote write
  interface.
Chronix Ingester
• Collects samples in batches and writes them later to Chronix with an
  ideal batch size.
• Writes checkpoints to disk to avoid loss of data.
• Scales easily.
Chronix Server
• Lossless long-term storage.
• Data distribution (Apache Solr).
• Rich set of analysis functions for data analytics beyond real-time
  monitoring.

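Everything runs on a single machine. Small. Simple. Beautiful.
Single Host: Prometheus -> Chronix Ingester (In-Memory) -> Chronix Server
Prometheus sends samples (S) to the Chronix Ingester, which sends batches (B) to the Chronix Server.
S = Sample: {t,v}
B = Batch: [{t,v},{t,v},{t,v}]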

Once per Prometheus on a single host.
Single Host: two Prometheus servers, each with its own Chronix Ingester (In-Memory), and one Chronix Server
S = Sample: {t,v}

Chronix Ingester Singleton ;-)
Single Host: two Prometheus servers share one Chronix Ingester (In-Memory)
S = Sample: {t,v}, B = Batch

Chronix Ingester Cloud behind a proxy to serve multiple
Prometheus servers.
Single Host: several Prometheus servers -> NGINX -> multiple Chronix Ingesters (In-Memory) -> Chronix Server
S = Sample: {t,v}

Cloud Mode: Multiple Prometheus Servers, One Chronix Ingester
per Host, A Chronix Server Cloud
Several single hosts: Prometheus servers -> NGINX -> Chronix Ingester (In-Memory) on each host -> Chronix Server Cloud (with a Master)

Architectural Key Factor: The Chronix Ingester
■ Small Go program
  ■ Binary size: 8.5 MB
  ■ Lines of code: ~ 720 LoC
  ■ Scales easily: copy, execute
■ Handles writes from Prometheus
  ■ Just a small configuration:
      remote_write:
        url: http://<host>:<port>/ingest
■ Batches samples in memory
  ■ Prometheus sends single samples.
  ■ Chronix needs large chunks (n single samples) to work well.
  ■ Max batch age
    ■ e.g. 5 minutes, 12 hours, ...
■ Crash and restart resilience
  ■ In-memory is dangerous. The Ingester holds some amount of
    transient state.
  ■ Regularly writes checkpoints of the entire in-memory state to disk.
  ■ The latest checkpoint is loaded on restart.
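
The batching idea can be sketched as follows. This is not the Ingester's code (the Ingester is a Go program); it is a simplified Java illustration in which all class and method names are hypothetical, batches are keyed by a series identifier, and a batch is handed over for writing once it reaches the max batch age. Checkpointing to disk is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Sketch of in-memory batching with a max batch age. Hypothetical names throughout. */
final class BatchingSketch {

    static final class Sample {
        final long timestampMs;
        final double value;

        Sample(long timestampMs, double value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    /** An open batch remembers when it was created so its age can be checked. */
    private static final class Batch {
        final long createdAtMs;
        final List<Sample> samples = new ArrayList<>();

        Batch(long createdAtMs) {
            this.createdAtMs = createdAtMs;
        }
    }

    private final long maxBatchAgeMs;
    private final Map<String, Batch> openBatches = new HashMap<>();

    BatchingSketch(long maxBatchAgeMs) {
        this.maxBatchAgeMs = maxBatchAgeMs;
    }

    /** Appends one incoming sample to the open batch of its series. */
    void append(String seriesKey, Sample sample, long nowMs) {
        openBatches.computeIfAbsent(seriesKey, key -> new Batch(nowMs)).samples.add(sample);
    }

    /** Removes and returns all batches that have reached the max batch age. */
    Map<String, List<Sample>> flushExpired(long nowMs) {
        Map<String, List<Sample>> ready = new HashMap<>();
        Iterator<Map.Entry<String, Batch>> it = openBatches.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Batch> entry = it.next();
            if (nowMs - entry.getValue().createdAtMs >= maxBatchAgeMs) {
                ready.put(entry.getKey(), entry.getValue().samples);
                it.remove();
            }
        }
        return ready;
    }
}
```

A caller would invoke flushExpired periodically and write each returned batch to the long-term store as one chunk.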

Chronix loves Chunks. Hence the Ingester batches samples.

The data models for Prometheus and Chronix are similar.
■ Prometheus
  ■ Uses so-called labels (key-value pairs) to store dimensional values
  ■ Labels are added dynamically
  ■ Stores samples (pairs of timestamp and scalar value)
■ Chronix
  ■ Uses attributes (key-value pairs) to store dimensional values
  ■ Schema, schemaless, dynamic fields, etc.
  ■ Stores samples of timestamp and any value type: scalar, trace, string, etc.

An example Chronix schema to define the available fields.
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="Chronix" version="1.5">
  <!-- Definition of types -->
  <types>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="binary" class="solr.BinaryField"/>
  </types>
  <!-- Available fields -->
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="start" type="long" indexed="true" stored="true" required="true"/>
    <field name="end" type="long" indexed="true" stored="true" required="true"/>
    <field name="data" type="binary" indexed="true" stored="true" required="false"/>
    <field name="metric" type="string" indexed="true" stored="true" required="true"/>

    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <solrQueryParser defaultOperator="OR"/>
</schema>
Prometheus labels are strings. Chronix Ingester creates them in
Chronix Server dynamically using the dynamicField *_s.
Prometheus_Label -> Chronix_Label
host -> host_s
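
The label mapping can be sketched like this. The class and method names are hypothetical and this is not the Ingester's code. The `_s` suffix follows the dynamicField rule above; mapping the Prometheus metric name label `__name__` to the required `metric` field is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of mapping Prometheus labels to Chronix attributes. Hypothetical names. */
final class LabelMappingSketch {

    /** Appends "_s" to every label name so it matches the dynamicField *_s. */
    static Map<String, String> toChronixAttributes(Map<String, String> labels) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> label : labels.entrySet()) {
            if ("__name__".equals(label.getKey())) {
                // Assumption: the metric name goes into the required "metric" field.
                attributes.put("metric", label.getValue());
            } else {
                attributes.put(label.getKey() + "_s", label.getValue());
            }
        }
        return attributes;
    }
}
```

With this sketch, the label host=prodI5 becomes the attribute host_s=prodI5.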

Showcase: Prometheus, Chronix Ingester, Chronix and Grafana
Prometheus -> Chronix Ingester (In-Memory) -> Chronix Server, with Grafana for visualization
S = samples, B = batches

A few words about performance in our showcase.
Disk usage: 11 days of data, 112,815,835 samples
■ Prometheus: ~ 786 MB (whole data directory)
■ Chronix: ~ 265 MB (without compaction)
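
To put the disk usage numbers into perspective, the following sketch derives bytes per sample and the relative saving. It assumes 1 MB = 1,000,000 bytes, which the slides do not state, and the class name is hypothetical. Under that assumption Prometheus needs roughly 7 bytes per sample and Chronix roughly 2.3, which is about two thirds less disk space.

```java
/** Back-of-the-envelope arithmetic on the showcase numbers. Hypothetical helper class. */
final class DiskUsageSketch {

    /** Average bytes per sample, assuming 1 MB = 1,000,000 bytes. */
    static double bytesPerSample(double megabytes, long samples) {
        return megabytes * 1_000_000.0 / samples;
    }

    /** Relative saving of otherMb compared to baselineMb, in percent. */
    static double savingPercent(double baselineMb, double otherMb) {
        return (1.0 - otherMb / baselineMb) * 100.0;
    }

    public static void main(String[] args) {
        long samples = 112_815_835L;
        System.out.printf("Prometheus: %.2f bytes/sample%n", bytesPerSample(786, samples));
        System.out.printf("Chronix:    %.2f bytes/sample%n", bytesPerSample(265, samples));
        System.out.printf("Saving:     %.1f %%%n", savingPercent(786, 265));
    }
}
```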

Compaction Effects.
Compaction  Points per Chunk  Number of Records  Disk Usage in MB  Compaction Time in Seconds
no                        -1             610355               265                           0
yes                      100            1422369               357                         134
yes                      500             284815               187                          75
yes                     1000             142573               160                          93
yes                     5000              28850               131                          69
yes                    10000              14797               126                          61
yes                    25000               6408               123                          61
yes                   100000               2051               121                          60
yes                   500000                920               119                          63
Without compaction, a chunk contains about 112 points!

CPU usage: 4 Cores available (= 400 % Max)

Memory consumption (max. 8 G)
Ingester

Using the data source plugins for Chronix and Prometheus.

Ingester Health: Everything Green!

Short Term Data in Prometheus.
Long Term Data in Chronix.
See the difference?

Everything is open source and free to everyone.
The code is the truth.
Chronix Website: www.chronix.io
Chronix GitHub: https://github.com/ChronixDB
- Ingester: https://github.com/ChronixDB/chronix.ingester
Questions?
- Twitter: @ChronixDB, @flolaut, @phxql
- Slack: https://qaware.slack.com/messages/chronix/

Now it’s your turn.
