Optimizing InfluxDB Performance in the Real World | Sam Dillard | InfluxData

Sam Dillard, Sales Engineer
Optimizing InfluxDB Performance

© 2017 InfluxData. All rights reserved.
Agenda
❖ Optimizing
➢ Hardware/Architecture
➢ Write Method
➢ Schema
❖ Q&A

Resource Utilization
• No Specific OS Tuning Required
• IOPS IOPS IOPS
• Keep CPU/memory utilization around 70%; you need headroom for:
  – Peak periods
  – Compactions
  – Backfilling data

Agenda
❖ Optimizing
➢ Write Method
➢ Schema
❖ Q&A

Telegraf
• Lightweight; written in Go
• Plug-in driven
• Optimized for writing to InfluxDB
  – Formatting
  – Retries
  – Modifiable batch sizes and jitter
  – Tag sorting
• Preprocessing
  – Converting tags to fields, fields to tags
  – Regex transformations
  – Renaming measurements, tags
  – Aggregations (mean, min, max, count, variance, stddev, etc.)
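The batching and jitter idea above can be sketched in a few lines of Python. This is not Telegraf's code: the function names `batches` and `jittered_delay` are hypothetical, and the sketch only shows why splitting writes into fixed-size batches and adding a random offset to each flush keeps many agents from hitting InfluxDB at the same instant.

```python
import random
from typing import Iterator, List, Sequence


def batches(points: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size points."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(points), batch_size):
        yield list(points[start:start + batch_size])


def jittered_delay(interval_s: float, jitter_s: float, rng: random.Random) -> float:
    """Return the flush interval plus a random offset in [0, jitter_s]."""
    return interval_s + rng.uniform(0.0, jitter_s)
```

With a 10 s interval and 2 s of jitter, each agent would flush somewhere between 10 s and 12 s after its previous flush, which spreads the write load over time.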

Popular Plugins
Out-of-the-box                Custom
Kubernetes (kubelet)          HTTP/socket listener
Kube_inventory (apiserver)    HTTP (formatted endpoints)
Kafka (consumer)              Prometheus (/metrics)
RabbitMQ (consumer)           Exec
AMQP (mq metadata)            StatsD
Redis
Nginx
HAProxy
Jolokia2

Telegraf
[Diagram: Telegraf data flow. Sources shown: CPU, Mem, Disk, Docker, Kubernetes, /metrics, Kafka, MySQL. Process stage: transform, decorate, filter. Aggregate stage: mean, min, max, count, variance, stddev. Destinations shown: File, InfluxDB, Kafka, CloudWatch.]

Parsing
● JSON
● CSV
● Graphite
● CollectD
● Dropwizard
● Form URL-encoded
● Grok

[Diagram: many Telegraf agents, a message queue, a further Telegraf instance, and InfluxDB. Message queues listed: Kafka, RabbitMQ, ActiveMQ, NSQ, AWS Kinesis, Google PubSub, MQTT.]

Balanced ingestion helps
[Two charts, labeled "Good" and "Not so good"]

Agenda
❖ Optimizing
➢ Write Method
➢ Schema
❖ Q&A

Schema Design Goals
• By reducing...
– Measurement/tag cardinality
– Information-encoding
– Key lengths
• You increase…
– Write performance
– Query performance
– Readability

“It’s a feature, not a bug...but
features require thinking”
- Richard Laskey, Wayfair

Line Protocol && Schema Insight
<measurement,tagset fieldset timestamp>
● A Measurement is a namespace for like metrics (SQL table)
● What to make a Measurement?
○ Logically-alike metrics; categorization
○ E.g., CPU has many metrics associated with it
○ E.g., Transactions
■ "usd", "response_time", "duration_ms", "timeout", whatever else…
● What to make a Tag?
○ Metadata; “things you need to `GROUP BY`”
● What to make a Field?
○ Actual metrics
■ Metrics you will visualize or operate on
○ Values with high variance that you don’t need to group by
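The measurement, tag and field roles described above can be made concrete with a small line protocol formatter. This is a minimal sketch, not an official client: `to_line` and its helpers are hypothetical names, and the escaping covers only commas, spaces and equals signs in keys and tag values, plus quotes and backslashes in string fields.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, special: str) -> str:
    """Backslash-escape every character of `special` that occurs in `text`."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field_value(value: FieldValue) -> str:
    """Render one field value in line protocol syntax."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_line(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Build one line: measurement, sorted tags, all fields, optional timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    parts = [_escape(measurement, ", ")]
    for key in sorted(tags):  # tags in lexicographic key order
        parts.append(_escape(key, ", =") + "=" + _escape(tags[key], ", ="))
    field_set = ",".join(
        _escape(key, ", =") + "=" + _format_field_value(value)
        for key, value in fields.items()
    )
    line = ",".join(parts) + " " + field_set
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

Metadata you would `GROUP BY` goes in `tags`; the numbers you would graph or aggregate go in `fields`, and several of them can share one line.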

Line Protocol Goals
1) Don’t encode data into Measurements or Tags; encoded data is indicated by valueless key names (value, counter, gauge)
2) Write as many Fields per Line as you can; #1 allows for #2
3) Separate information into primitives; reduce regex grouping
4) Order Tags lexicographically
(Telegraf does all this for you, for the most part)

DON'T ENCODE DATA INTO THE MEASUREMENT NAME
Instead of measurement names like:
cpu.server-5.us-west.usage_user value=20.0 1444234982000000000
cpu.server-6.us-west.usage_user value=40.0 1444234982000000000
mem.server-6.us-west.free value=25.0 1444234982000000000
Encode that information as tags:
cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000
cpu,host=server-6,region=us-west usage_user=40.0 1444234982000000000
mem,host=server-6,region=us-west mem_free=25.0 1444234982000000000
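The rewrite on this slide is mechanical enough to script. The sketch below assumes every dotted name has exactly the layout `<measurement>.<host>.<region>.<field>` with a single `value` field, which is an assumption drawn from the sample lines; `dotted_to_tagged` is a hypothetical helper name.

```python
def dotted_to_tagged(line: str) -> str:
    """Move host and region out of a dotted measurement name into tags."""
    name, value_part, timestamp = line.split(" ")
    measurement, host, region, field = name.split(".")
    key, _, value = value_part.partition("=")
    if key != "value":
        raise ValueError("expected a single field named 'value'")
    # The last name segment becomes the field key, replacing the valueless "value".
    return f"{measurement},host={host},region={region} {field}={value} {timestamp}"
```

For example, `dotted_to_tagged("cpu.server-5.us-west.usage_user value=20.0 1444234982000000000")` returns `cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000`.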

DON’T OVERLOAD TAGS (separate into primitives)
BAD:
cpu,server=localhost.us-west.usage_user value=2.0 1444234982000000000
cpu,server=localhost.us-east.usage_system value=3.0 1444234982000000000
GOOD: Separate out into different tags:
cpu,host=localhost,region=us-west usage_user=2.0 1444234982000000000
cpu,host=localhost,region=us-east usage_system=3.0 1444234982000000000

Use Telegraf as a Graphite parser
A Graphite metric like: cpu.usage.eu-west.idle.percentage 100
With a Telegraf configuration like:
[[inputs.http_listener_v2]]
data_format = "graphite"
separator = "_"
templates = [
  "measurement.measurement.region.field*"
]
Results in the following transformation:
cpu_usage,region=eu-west idle_percentage=100
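To see how a template such as `measurement.measurement.region.field*` maps dotted segments onto a measurement, tags and a field, here is a rough Python approximation. It is not Telegraf's parser: `apply_template` is a hypothetical function that handles only the token kinds used on this slide (`measurement`, `field`, `field*`, and tag names).

```python
from typing import Dict, List


def apply_template(template: str, metric: str, separator: str = "_") -> str:
    """Map each dotted segment of a Graphite metric onto a template token."""
    path, value = metric.split()
    segments = path.split(".")
    measurement: List[str] = []
    field: List[str] = []
    tags: Dict[str, str] = {}

    for index, token in enumerate(template.split(".")):
        if index >= len(segments):
            break  # template is longer than the metric path
        if token == "field*":
            field.extend(segments[index:])  # greedy: take every remaining segment
            break
        if token == "measurement":
            measurement.append(segments[index])
        elif token == "field":
            field.append(segments[index])
        elif token:
            tags[token] = segments[index]  # any other token is a tag key

    if not measurement or not field:
        raise ValueError("template must yield a measurement and a field")
    tag_text = "".join(f",{key}={tags[key]}" for key in sorted(tags))
    return f"{separator.join(measurement)}{tag_text} {separator.join(field)}={value}"
```

Applied to `cpu.usage.eu-west.idle.percentage 100`, the two `measurement` tokens are joined with the separator into `cpu_usage`, the third segment becomes the `region` tag (`eu-west`, as in the input path), and the remaining segments form the field key `idle_percentage`.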

stock_prices,symbol=BP price=25.0 1
stock_prices,symbol=CVX price=35.0 1
stock_prices,symbol=XOM price=45.0 1

stock_prices,symbol=XOM open=25.0,high=45.0,low=20.0,close=35.0 1
stock_prices,symbol=XOM open=20.0,high=40.0,low=20.0,close=40.0 2

Also smaller payloads:
From:
cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_system=15.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest_nice=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_idle=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_iowait=0.2 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_irq=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_nice=1.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_steal=2.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_softirq=2.5 <timestamp>
To:
cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0,usage_system=15.0,usage_guest=0.0,usage_guest_nice=0.0,usage_idle=35.0,usage_iowait=0.2,usage_irq=0.0,usage_nice=1.0,usage_steal=2.0,usage_softirq=2.5 <timestamp>
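The From/To collapse above amounts to grouping lines by series and timestamp and concatenating their field sets. A minimal sketch, assuming each input line has exactly three space-separated parts and no escaped spaces (`merge_fields` is a hypothetical name):

```python
from typing import Dict, Iterable, List, Tuple


def merge_fields(lines: Iterable[str]) -> List[str]:
    """Combine lines that share a series and timestamp into one multi-field line."""
    merged: Dict[Tuple[str, str], List[str]] = {}
    for line in lines:
        series, field_set, timestamp = line.split(" ")
        merged.setdefault((series, timestamp), []).append(field_set)
    # dicts keep insertion order, so output follows first appearance of each series
    return [
        f"{series} {','.join(field_sets)} {timestamp}"
        for (series, timestamp), field_sets in merged.items()
    ]
```

The measurement and tag set are then sent once per timestamp instead of once per field, which is where the payload saving comes from.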

sam@influxdata.com @SDillard12
THANKS!

Agenda
❖ Optimizing
➢ Write Method
➢ Schema
➢ Queries
➢ Configuration
❖ Q&A

Queries and Shards
• Shard durations should be longer than your longest typical query
• When thinking about balancing writes/reads:

                            High Query Load      Low Query Load
High Write Load             Balanced duration    Shorter duration
Bursty or Low Write Load    Longer duration      Balanced duration
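One way to reason about the first bullet is to count how many shards a query's time range overlaps. The sketch below uses a simplified model in which shard boundaries fall on exact multiples of the shard duration; that alignment and the name `shards_touched` are assumptions made for illustration, not a description of InfluxDB internals.

```python
def shards_touched(start_s: int, end_s: int, shard_duration_s: int) -> int:
    """Count shards overlapped by the inclusive range [start_s, end_s]."""
    if shard_duration_s <= 0:
        raise ValueError("shard_duration_s must be positive")
    if end_s < start_s:
        raise ValueError("end_s must not precede start_s")
    return end_s // shard_duration_s - start_s // shard_duration_s + 1
```

Under this model a 7-day query starting at midday touches 8 shards with a 24h shard duration but only 2 with a 7d duration, which is the effect the slide's rule of thumb is pointing at.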

Query Performance
• Streaming functions > batch functions
  – Batch funcs: percentile(), spread(), stddev(), median(), mode(), holt_winters()
  – Stream funcs: mean(), bottom(), first(), last(), max(), top(), count(), etc.
• Distributed functions (clusters only) > local functions
  – Distributed: first(), last(), max(), min(), count(), mean(), sum()
  – Local: percentile(), derivative(), spread(), top(), bottom(), elapsed(), etc.

Query Performance
• Boundaries!
  – Time-bounding and series-bounding with `WHERE` clause
  – `SELECT *` generally not a best practice
• Agg functions instead of raw queries
  – `SELECT mean(<field>)` > `SELECT <field>`
• Reduce the number of `GROUP BY time` intervals (coarser intervals mean fewer buckets)
• Subqueries
  – When appropriate, process data from an already processed subset of data
  – SELECT min("max") FROM (SELECT max("usage_user") FROM cpu WHERE time > now() - 5d GROUP BY time(5m))
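What the subquery computes can be mimicked in plain Python: the inner query takes the maximum per time window, and the outer query takes the minimum of those maxima. The function name `min_of_window_max` and the `(timestamp_s, value)` point layout are assumptions made for this sketch.

```python
from typing import Dict, Iterable, Tuple


def min_of_window_max(points: Iterable[Tuple[int, float]], window_s: int) -> float:
    """Return the smallest per-window maximum of (timestamp_s, value) points."""
    window_max: Dict[int, float] = {}
    for timestamp_s, value in points:
        start = timestamp_s - timestamp_s % window_s  # window this point falls in
        window_max[start] = max(value, window_max.get(start, value))
    return min(window_max.values())  # raises ValueError if there are no points
```

The outer step works on one value per window rather than on every raw point, which is the "already processed subset" the slide refers to.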

Exercise
``` (1)
SELECT usage_user,usage_system
FROM cpu
WHERE time > now() - 7d
GROUP BY time(1m), host
```
``` (2)
FROM cpu
```
● Shard duration = 24h
● Hosts = 1,000
● Retention = 2w
Question: Which is faster?

Answer
● (1) = 10,080 time buckets * 2 Fields * 1,000 hosts = 20,160,000 series-time groups
● (2) = 2,016 time buckets * 2 Fields * 1,000 hosts = 4,032,000 series-time groups
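The arithmetic behind the answer can be checked with a short function. `series_time_groups` is a hypothetical name, and the 5-minute interval used for query (2) is inferred from the 2,016 buckets in the answer, since the full text of that query is not shown.

```python
def series_time_groups(range_s: int, interval_s: int, fields: int, series: int) -> int:
    """Time buckets in the range, multiplied by fields and by series."""
    buckets = range_s // interval_s
    return buckets * fields * series


WEEK_S = 7 * 24 * 60 * 60
query_1 = series_time_groups(WEEK_S, 60, fields=2, series=1000)    # 1m buckets
query_2 = series_time_groups(WEEK_S, 300, fields=2, series=1000)   # 5m buckets (assumed)
```

Widening the bucket from 1m to 5m divides the number of groups by five, from 20,160,000 to 4,032,000.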

Exercise
``` (1)
FROM cpu
GROUP BY time(1m), host,
container_id,app
```
``` (2)
SELECT *
FROM docker
```
● Shard duration = 7d
● Hosts = 10,000
● Retention = 2w
Question: Which is faster?

Answer
● It depends. The Docker measurement has 100+ Fields.
● The `SELECT *` in query (2) is going to grab every single one of those from every single host vs. the mere 2 from query (1)

Agenda
❖ Optimizing
➢ Write Method
➢ Schema
➢ Queries
➢ Configuration
❖ Q&A

Tuning Parameters
• max-concurrent-queries
• max-select-point
• max-select-series
• Rate limiting compactions
  – max-concurrent-compactions
  – compact-full-write-cold-duration
  – compact-throughput
  – compact-throughput-burst

Tuning Parameters
• cache-max-memory-size
• cache-snapshot-memory-size
• cache-snapshot-write-cold-duration
• max-series-per-database
• max-values-per-tag
• Fine-grained auth instead of multiple databases
