Sam Dillard, Sales Engineer
Optimizing InfluxDB Performance
© 2017 InfluxData. All rights reserved.
Agenda
❖ Optimizing
➢ Hardware/Architecture
➢ Write Method
➢ Schema
❖ Q&A
Resource Utilization
• No specific OS tuning required
• IOPS, IOPS, IOPS
• Keep CPU/memory utilization at or below about 70%; you need headroom for:
  • Peak periods
  • Compactions
  • Backfilling data
Telegraf
• Lightweight; written in Go
• Plug-in driven
• Optimized for writing to InfluxDB
  • Formatting
  • Retries
  • Modifiable batch sizes and jitter
  • Tag sorting
• Preprocessing
  • Converting tags to fields, fields to tags
  • Regex transformations
  • Renaming measurements, tags
  • Aggregations (mean, min, max, count, variance, stddev, etc.)
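To make "modifiable batch sizes and jitter" concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Telegraf's implementation; the function names `batches` and `flush_delay` are hypothetical. In Telegraf itself the corresponding agent settings are `metric_batch_size`, `flush_interval`, and `flush_jitter`.

```python
import random
from typing import Iterable, Iterator, List


def batches(points: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size points, preserving order."""
    batch: List[str] = []
    for point in points:
        batch.append(point)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def flush_delay(interval_s: float, jitter_s: float, rng: random.Random) -> float:
    """Wait before the next flush: the interval plus random jitter in [0, jitter_s]."""
    return interval_s + rng.uniform(0.0, jitter_s)


# Hypothetical points, for illustration only.
points = [f"cpu,host=server-{i} usage_user={i}.0" for i in range(7)]
sizes = [len(b) for b in batches(points, batch_size=3)]
print(sizes)  # [3, 3, 1]

# Jitter spreads flushes from many agents so they do not all hit InfluxDB at once.
delay = flush_delay(10.0, 5.0, random.Random(42))
```

Each agent sends fewer, larger writes, and the random offset keeps a fleet of agents from flushing in lockstep.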
Popular Plugins
• Out-of-the-box: Kubernetes (kubelet), Kube_inventory (apiserver), Kafka (consumer), RabbitMQ (consumer), AMQP (mq metadata), Redis, Nginx, HAproxy, Jolokia2
• Custom: HTTP/socket listener, HTTP (formatted endpoints), Prometheus (/metrics), Exec, StatsD
Telegraf
[Diagram: Telegraf pipeline. Inputs (CPU, Mem, Disk, Docker, Kubernetes, /metrics, Kafka, MySQL) → Process (transform, decorate, filter) → Aggregate (mean, min/max, count, variance, stddev) → Outputs (File, InfluxDB, Kafka, CloudWatch)]
Parsing
● JSON
● CSV
● Graphite
● CollectD
● Dropwizard
● Form URL-encoded
● Grok
[Diagram: many Telegraf agents publish to a message queue (Kafka, RabbitMQ, ActiveMQ, NSQ, AWS Kinesis, Google PubSub, MQTT); a Telegraf consumer reads from the queue and writes to InfluxDB]
Balanced ingestion helps....
[Figures: "Good" and "Not so good" ingestion patterns]
Schema Design Goals
• By reducing...
– Measurement/tag cardinality
– Information-encoding
– Key lengths
• You increase…
– Write performance
– Query performance
– Readability
“It’s a feature, not a bug...but
features require thinking”
- Richard Laskey, Wayfair
Line Protocol && Schema Insight
<measurement,tagset fieldset timestamp>
● A Measurement is a namespace for like metrics (SQL table)
● What to make a Measurement?
○ Logically-alike metrics; categorization
○ E.g., CPU has many metrics associated with it
○ E.g., Transactions
■ "usd", "response_time", "duration_ms", "timeout", whatever else…
● What to make a Tag?
○ Metadata; “things you need to `GROUP BY`”
● What to make a Field?
○ Actual metrics
■ Metrics you will visualize or operate on
○ Values with high variance that you don’t need to group by
Line Protocol Goals
1) Don’t encode data into Measurements or Tags; valueless key names (value, counter, gauge) are a sign that you have
2) Write as many Fields per Line as you can; #1 allows for #2
3) Separate information into primitives; reduce regex grouping
4) Order Tags lexicographically
(Telegraf does all this for you, for the most part)
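For writers that do not go through Telegraf, goals 2 and 4 can be sketched in a few lines of Python. This is a minimal illustration, assuming float field values and keys and values that need no escaping; `to_line` is a hypothetical helper, not a client-library function, and the `usage_system` value is made up.

```python
from typing import Dict, Optional


def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, float],
            timestamp_ns: Optional[int] = None) -> str:
    """Serialize one point: tags sorted by key, all fields on a single line."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    # Goal 4: emit tags in lexicographic key order.
    tag_part = "".join(f",{k}={tags[k]}" for k in sorted(tags))
    # Goal 2: put every field that shares this tag set and timestamp on one line.
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


line = to_line(
    "cpu",
    {"region": "us-west", "host": "server-5"},  # deliberately unsorted
    {"usage_user": 20.0, "usage_system": 5.0},
    1444234982000000000,
)
print(line)
# cpu,host=server-5,region=us-west usage_user=20.0,usage_system=5.0 1444234982000000000
```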
DON'T ENCODE DATA INTO THE MEASUREMENT NAME
Instead of measurement names like:
cpu.server-5.us-west.usage_user value=20.0 1444234982000000000
cpu.server-6.us-west.usage_user value=40.0 1444234982000000000
mem.server-6.us-west.free value=25.0 1444234982000000000
Encode that information as tags:
cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000
cpu,host=server-6,region=us-west usage_user=40.0 1444234982000000000
mem,host=server-6,region=us-west mem_free=25.0 1444234982000000000
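If you already emit dotted names, the rewrite can be done mechanically. The sketch below assumes every name has exactly four dot-separated parts in the order metric, host, region, field, and that the only field key is `value`; `split_dotted` is a hypothetical helper name.

```python
def split_dotted(line: str) -> str:
    """Rewrite '<metric>.<host>.<region>.<field> value=<v> <ts>' as a tagged point."""
    name, field_set, timestamp = line.split(" ")
    measurement, host, region, field = name.split(".")
    value = field_set.split("=", 1)[1]
    # The last name part becomes the field key, replacing the generic "value".
    return f"{measurement},host={host},region={region} {field}={value} {timestamp}"


print(split_dotted("cpu.server-5.us-west.usage_user value=20.0 1444234982000000000"))
# cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000
```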
DON’T OVERLOAD TAGS (separate into primitives)
BAD:
cpu,server=localhost.us-west.usage_user value=2.0 1444234982000000000
cpu,server=localhost.us-east.usage_system value=3.0 1444234982000000000
GOOD: Separate out into different tags:
cpu,host=localhost,region=us-west usage_user=2.0 1444234982000000000
cpu,host=localhost,region=us-east usage_system=3.0 1444234982000000000
Use Telegraf as a Graphite parser
Graphite like: cpu.usage.eu-west.idle.percentage 100
With a Telegraf configuration like:
[[inputs.http_listener_v2]]
  data_format = "graphite"
  separator = "_"
  templates = [
    "measurement.measurement.region.field*"
  ]
Results in the following transformation:
cpu_usage,region=eu-west idle_percentage=100
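To see how the template lines up with the Graphite name, here is a simplified Python sketch of the mapping. It illustrates the idea only and is not Telegraf's parser: it ignores filters, default tags, and multiple templates, and `apply_template` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def apply_template(template: str, bucket: str,
                   separator: str = "_") -> Tuple[str, Dict[str, str], str]:
    """Map dot-separated Graphite name parts onto measurement, tags, and field key."""
    keys = template.split(".")
    parts = bucket.split(".")
    measurement: List[str] = []
    field: List[str] = []
    tags: Dict[str, str] = {}
    for i, key in enumerate(keys):
        if key.endswith("*"):
            # A trailing wildcard consumes every remaining name part.
            values, key = parts[i:], key[:-1]
        else:
            values = parts[i:i + 1]
        if key == "measurement":
            measurement.extend(values)
        elif key == "field":
            field.extend(values)
        elif key:  # any other non-empty key is treated as a tag name
            tags[key] = separator.join(values)
    return separator.join(measurement), tags, separator.join(field)


name, value = "cpu.usage.eu-west.idle.percentage 100".split(" ")
measurement, tags, field = apply_template("measurement.measurement.region.field*", name)
tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
print(f"{measurement}{tag_part} {field}={value}")
# cpu_usage,region=eu-west idle_percentage=100
```

The two `measurement` keys are joined with the separator, `region` becomes a tag, and `field*` collects the remaining parts into the field key.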
stock_prices,symbol=BP price=25.0 1
stock_prices,symbol=CVX price=35.0 1
stock_prices,symbol=XOM price=45.0 1
stock_prices,symbol=XOM open=25.0,high=45.0,low=20.0,close=35.0 1
stock_prices,symbol=XOM open=20.0,high=40.0,low=20.0,close=40.0 2
Also smaller payloads:
From:
cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_system=15.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest_nice=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_idle=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_iowait=0.2 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_irq=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_nice=1.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_steal=2.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_softirq=2.5 <timestamp>
To:
cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0,usage_system=15.0,usage_guest=0.0,usage_guest_nice=0.0,usage_idle=35.0,usage_iowait=0.2,usage_irq=0.0,usage_nice=1.0,usage_steal=2.0,usage_softirq=2.5 <timestamp>
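The payload saving can be checked with a short Python sketch that merges single-field lines sharing a measurement, tag set, and timestamp. It assumes no escaped spaces in keys or values and uses a concrete timestamp in place of `<timestamp>`; `merge_lines` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def merge_lines(lines: Iterable[str]) -> List[str]:
    """Combine lines that share measurement, tag set, and timestamp into one line each."""
    merged: Dict[Tuple[str, str], List[str]] = {}
    for line in lines:
        series_key, field_set, timestamp = line.split(" ")
        merged.setdefault((series_key, timestamp), []).append(field_set)
    return [f"{key} {','.join(fields)} {ts}" for (key, ts), fields in merged.items()]


key = "cpu,region=us-west-1,host=hostA,container=containerA"
single = [
    f"{key} usage_user=35.0 1444234982000000000",
    f"{key} usage_system=15.0 1444234982000000000",
    f"{key} usage_idle=35.0 1444234982000000000",
]
combined = merge_lines(single)
print(combined[0])

# Compare payload sizes in characters, counting one newline per line.
before = sum(len(line) + 1 for line in single)
after = sum(len(line) + 1 for line in combined)
print(before, after)
```

The tag set and timestamp are sent once instead of once per field, so the saving grows with the number of fields and the length of the tag set.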
sam@influxdata.com @SDillard12
THANKS!
Agenda
❖ Optimizing
➢ Hardware/Architecture
➢ Write Method
➢ Schema
➢ Queries
➢ Configuration
❖ Q&A
Queries and Shards
• Shard durations should be longer than your longest typical query
• When thinking about balancing writes/reads:

| | High query load | Low query load |
|---|---|---|
| High write load | Balanced duration | Shorter duration |
| Bursty or low write load | Longer duration | Balanced duration |
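One way to reason about shard duration versus query range is to count how many shard groups a query has to open. The Python sketch below is a rough model that assumes shard boundaries fall on multiples of the shard duration since the Unix epoch; actual boundary alignment in InfluxDB may differ, and `shards_touched` is a hypothetical helper.

```python
def shards_touched(start_s: int, end_s: int, shard_duration_s: int) -> int:
    """Count shard groups overlapping [start_s, end_s), assuming epoch-aligned boundaries."""
    if end_s <= start_s:
        return 0
    first = start_s // shard_duration_s
    last = (end_s - 1) // shard_duration_s
    return last - first + 1


DAY = 24 * 60 * 60
now = 1_444_234_982  # example Unix time in seconds

print(shards_touched(now - 7 * DAY, now, DAY))      # 8: a 7d query over 24h shards
print(shards_touched(now - 7 * DAY, now, 7 * DAY))  # 2: the same query over 7d shards
```

Under this model a query that is longer than the shard duration always spans several shards, which is the cost the first bullet warns about.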
Query Performance
• Streaming functions > batch functions
  • Batch funcs: percentile(), spread(), stddev(), median(), mode(), holt_winters()
  • Stream funcs: mean(), bottom(), first(), last(), max(), top(), count(), etc.
• Distributed functions (clusters only) > local functions
  • Distributed: first(), last(), max(), min(), count(), mean(), sum()
  • Local: percentile(), derivative(), spread(), top(), bottom(), elapsed(), etc.
Query Performance
• Boundaries!
  • Time-bounding and series-bounding with `WHERE` clause
  • `SELECT *` generally not a best practice
• Agg functions instead of raw queries
  • `SELECT mean(<field>)` > `SELECT <field>`
  • Reduce the number of `GROUP BY time` intervals (use coarser buckets)
• Subqueries
  • When appropriate, process data from an already processed subset of data
  • `SELECT min("max") FROM (SELECT max("usage_user") FROM cpu WHERE time > now() - 5d GROUP BY time(5m))`
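What the subquery computes can be mirrored in plain Python: the inner query reduces raw points to one maximum per 5-minute bucket, and the outer query then works only on that much smaller set. The data below is made up for illustration and `max_per_bucket` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def max_per_bucket(points: Iterable[Tuple[int, float]], bucket_s: int) -> Dict[int, float]:
    """Inner query: max(value) for each GROUP BY time(bucket_s) bucket."""
    buckets: Dict[int, float] = {}
    for ts, value in points:
        start = ts - ts % bucket_s  # bucket start time
        buckets[start] = max(value, buckets.get(start, value))
    return buckets


# (timestamp in seconds, usage_user) pairs; illustration data only.
points = [(0, 10.0), (120, 40.0), (300, 25.0), (450, 30.0), (600, 55.0)]

inner = max_per_bucket(points, bucket_s=300)
print(inner)                # {0: 40.0, 300: 30.0, 600: 55.0}
print(min(inner.values()))  # 30.0, the outer min("max")
```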
Exercise
(1)
```
SELECT mean(usage_user), mean(usage_system)
FROM cpu
WHERE time > now() - 7d
GROUP BY time(1m), host
```
(2)
```
SELECT mean(usage_user), mean(usage_system)
FROM cpu
WHERE time > now() - 7d
GROUP BY time(5m), host
```
● Shard duration = 24h
● Hosts = 1,000
● Retention = 2w
Question: Which is faster?
Answer
● (1) = 10,080 time buckets * 2 Fields * 1,000 hosts = 20,160,000 series-time groups
● (2) = 2,016 time buckets * 2 Fields * 1,000 hosts = 4,032,000 series-time groups
● (2) is faster: it produces 5x fewer series-time groups
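The arithmetic behind these numbers is easy to reproduce. A small Python sketch (`series_time_groups` is a hypothetical helper name):

```python
def series_time_groups(range_minutes: int, interval_minutes: int,
                       fields: int, hosts: int) -> int:
    """Time buckets in the queried range, times fields, times hosts."""
    buckets = range_minutes // interval_minutes
    return buckets * fields * hosts


WEEK_MINUTES = 7 * 24 * 60  # 10,080 minutes in the 7d query range

q1 = series_time_groups(WEEK_MINUTES, 1, fields=2, hosts=1_000)
q2 = series_time_groups(WEEK_MINUTES, 5, fields=2, hosts=1_000)
print(q1, q2, q1 // q2)  # 20160000 4032000 5
```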
Exercise
(1)
```
SELECT mean(usage_user), mean(usage_system)
FROM cpu
WHERE time > now() - 1d
GROUP BY time(1m), host, container_id, app
```
(2)
```
SELECT mean(*)
FROM docker
WHERE time > now() - 1d
GROUP BY time(5m), host
```
● Shard duration = 7d
● Hosts = 10,000
● Retention = 2w
Question: Which is faster?
Answer
● It depends. The Docker measurement has 100+ Fields.
● The `*` in query (2) is going to grab every single one of those from every single host vs. the mere 2 from query (1)
Tuning Parameters
• Max-concurrent-queries
• Max-select-point
• Max-select-series
• Rate limiting compactions
• Max-concurrent-compactions
• Compact-full-write-cold-duration
• Compact-throughput
• Compact-throughput-burst
• Cache-max-memory-size
• Cache-snapshot-memory-size
• Cache-snapshot-write-cold-duration
• Max-series-per-database
• Max-values-per-tag
• Fine Grained Auth instead of multiple databases
