Michael DeSa - Software Engineer
Downsampling your data
© 2017 InfluxData. All rights reserved.
Agenda
✓ Continuous Queries (CQs)
✓ Create custom CQs
✓ Retention Policies (RPs)
✓ Create custom RPs
✓ Combine CQs and RPs to manage downsampling and data retention
✓ Common issues with CQs and RPs
Continuous Queries
An Overview
• What are Continuous Queries (CQs)?
  • InfluxQL queries that run automatically and periodically on real-time data and store query results in a specified measurement
• Why would I use CQs?
  • Automatic downsampling and pre-calculating expensive queries
Continuous Queries
The Basic Syntax
CREATE CONTINUOUS QUERY <cq_name>
ON <database_name>
BEGIN
SELECT <function>(<stuff>)
INTO <destination_measurement>
FROM <source_measurement>
GROUP BY time(<interval>)
END
Continuous Queries
The Basic Syntax: In Practice
CREATE CONTINUOUS QUERY "average_cpu_usage" ON "telegraf" BEGIN
  SELECT MEAN("usage_idle") INTO "ave_cpu" FROM "cpu" GROUP BY time(1m)
END
> SELECT "usage_idle" FROM "cpu"
name: cpu
time usage_idle
---- ----------
2017-02-07T23:14:00Z 99.599599599
2017-02-07T23:14:10Z 99.699398797
2017-02-07T23:14:20Z 99.699699699
2017-02-07T23:14:30Z 99.600000000
2017-02-07T23:14:40Z 99.500000000
2017-02-07T23:14:50Z 99.799599198
2017-02-07T23:15:00Z 99.600000000
2017-02-07T23:15:10Z 99.400599400
2017-02-07T23:15:20Z 99.600000000
2017-02-07T23:15:30Z 99.699699699
2017-02-07T23:15:40Z 99.500000000
2017-02-07T23:15:50Z 99.699699699
> SELECT "mean" FROM "ave_cpu"
name: ave_cpu
time mean
---- ----
2017-02-07T23:14:00Z 99.649716215
2017-02-07T23:15:00Z 99.583333133
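The two means above can be reproduced by hand. The following is a minimal Python sketch of what the CQ's SELECT MEAN(...) GROUP BY time(1m) computes over the twelve points shown; the function names are illustrative only and are not part of InfluxDB.

```python
from collections import defaultdict
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# The twelve 10-second "usage_idle" points listed above.
POINTS = [
    ("2017-02-07T23:14:00Z", 99.599599599),
    ("2017-02-07T23:14:10Z", 99.699398797),
    ("2017-02-07T23:14:20Z", 99.699699699),
    ("2017-02-07T23:14:30Z", 99.600000000),
    ("2017-02-07T23:14:40Z", 99.500000000),
    ("2017-02-07T23:14:50Z", 99.799599198),
    ("2017-02-07T23:15:00Z", 99.600000000),
    ("2017-02-07T23:15:10Z", 99.400599400),
    ("2017-02-07T23:15:20Z", 99.600000000),
    ("2017-02-07T23:15:30Z", 99.699699699),
    ("2017-02-07T23:15:40Z", 99.500000000),
    ("2017-02-07T23:15:50Z", 99.699699699),
]


def to_epoch(ts: str) -> int:
    """Convert an RFC 3339 UTC timestamp to whole seconds since the Unix epoch."""
    return int(datetime.strptime(ts, TIME_FORMAT).replace(tzinfo=timezone.utc).timestamp())


def mean_per_interval(points, interval_s: int) -> dict:
    """Average the values in each epoch-aligned window of interval_s seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        epoch = to_epoch(ts)
        # Round the timestamp down to the start of its window.
        buckets[epoch - epoch % interval_s].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc).strftime(TIME_FORMAT): sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    for start, mean in mean_per_interval(POINTS, 60).items():
        print(start, f"{mean:.9f}")
```

Each one-minute window collapses six raw points into a single stored value, which is the whole point of downsampling.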
Continuous Queries
The Basic Syntax: In Practice
Feb 07 23:15:00 executing continuous query average_cpu_usage
SELECT […] WHERE time >= '2017-02-07T23:14:00Z' AND time < '2017-02-07T23:15:00Z' GROUP BY time(1m)
Feb 07 23:16:00 executing continuous query average_cpu_usage
SELECT […] WHERE time >= '2017-02-07T23:15:00Z' AND time < '2017-02-07T23:16:00Z' GROUP BY time(1m)
The basic CQ:
● Executes at the same interval as the GROUP BY time() interval.
● Executes a single query that covers the time range between
now() and now() minus the GROUP BY time() interval.
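The scheduling rule for a basic CQ can be written down as a small calculation. This is a sketch under the assumption that runs fire on GROUP BY time() boundaries aligned to the Unix epoch; `basic_cq_window` is a hypothetical helper, not an InfluxDB API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def basic_cq_window(now: datetime, interval: timedelta):
    """Return the [start, end) time range a basic CQ queries when it runs at `now`."""
    # Align the run time down to the most recent GROUP BY time() boundary.
    end = now - (now - EPOCH) % interval
    # The query covers exactly one interval ending at that boundary.
    return end - interval, end


if __name__ == "__main__":
    run = datetime(2017, 2, 7, 23, 15, tzinfo=timezone.utc)
    start, end = basic_cq_window(run, timedelta(minutes=1))
    print(start.isoformat(), end.isoformat())
```

For the 1m CQ above, a run at 23:15:00 covers 23:14:00 up to (but not including) 23:15:00.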
Continuous Queries
The Advanced Syntax
CREATE CONTINUOUS QUERY <cq_name>
ON <database_name>
RESAMPLE
EVERY <interval>
FOR <interval>
BEGIN
SELECT <function>(<stuff>)
INTO <destination_measurement>
FROM <source_measurement>
GROUP BY time(<interval>)
END
Continuous Queries
The Advanced Syntax: In Practice
CREATE CONTINUOUS QUERY "average_cpu_usage" ON "telegraf"
RESAMPLE EVERY 2m FOR 3m BEGIN
  SELECT MEAN("usage_idle") INTO "ave_cpu" FROM "cpu" GROUP BY time(1m)
END
> SELECT "usage_idle" FROM "cpu"
name: cpu
time usage_idle
---- ----------
2017-02-07T23:14:00Z 99.599599599
2017-02-07T23:14:10Z 99.699398797
2017-02-07T23:14:20Z 99.699699699
2017-02-07T23:14:30Z 99.600000000
2017-02-07T23:14:40Z 99.500000000
2017-02-07T23:14:50Z 99.799599198
2017-02-07T23:15:00Z 99.600000000
2017-02-07T23:15:10Z 99.400599400
2017-02-07T23:15:20Z 99.600000000
2017-02-07T23:15:30Z 99.699699699
2017-02-07T23:15:40Z 99.500000000
2017-02-07T23:15:50Z 99.699699699
> SELECT "mean" FROM "ave_cpu"
name: ave_cpu
time mean
---- ----
2017-02-07T23:14:00Z 99.649716215
2017-02-07T23:15:00Z 99.583333133
Continuous Queries
The Advanced Syntax: In Practice
Feb 07 23:16:00 executing continuous query average_cpu_usage
SELECT […] WHERE time >= '2017-02-07T23:13:00Z' AND time < '2017-02-07T23:16:00Z' GROUP BY time(1m)
Feb 07 23:18:00 executing continuous query average_cpu_usage
SELECT […] WHERE time >= '2017-02-07T23:15:00Z' AND time < '2017-02-07T23:18:00Z' GROUP BY time(1m)
In the advanced syntax:
The EVERY interval determines how often InfluxDB executes the CQ.
The FOR interval determines the time range over which the CQ runs
queries.
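How EVERY and FOR interact is easiest to see as arithmetic. This is a rough Python sketch, assuming the first run time is already known and falls on an EVERY boundary; `resample_windows` is an illustrative name, not an InfluxDB function.

```python
from datetime import datetime, timedelta, timezone


def resample_windows(first_run: datetime, every: timedelta, for_: timedelta, count: int):
    """List (run_time, range_start, range_end) for `count` consecutive CQ executions."""
    windows = []
    for i in range(count):
        run = first_run + i * every      # EVERY sets the spacing between runs
        windows.append((run, run - for_, run))  # FOR sets how far back each run looks
    return windows


if __name__ == "__main__":
    first = datetime(2017, 2, 7, 23, 16, tzinfo=timezone.utc)
    for run, start, end in resample_windows(first, timedelta(minutes=2), timedelta(minutes=3), 2):
        print(run.time(), "covers", start.time(), "to", end.time())
```

With EVERY 2m FOR 3m the ranges overlap by one minute, so the 23:15 bucket is computed in both runs; the later result replaces the earlier one in the destination measurement.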
Retention Policies (RPs)
An Overview
• What are RPs?
  • The part of InfluxDB's data structure that describes how long InfluxDB keeps data.
• Why would I use RPs?
  • Expire unneeded data.
Retention Policies
The Syntax
CREATE RETENTION POLICY <retention_policy_name>
ON <database_name>
DURATION <duration>
REPLICATION <n>
[SHARD DURATION <duration>] [DEFAULT]
DURATION units:
  u    microseconds
  ms   milliseconds
  s    seconds
  m    minutes
  h    hours
  d    days
  w    weeks
  INF  infinite
REPLICATION settings:
  Single instance: no effect
  Clustering: replication factor (RF) <= number of data nodes
SHARD DURATION settings:
  https://docs.influxdata.com/influxdb/latest/concepts/schema_and_data_layout/#shard-group-duration-management
Retention Policies
The Syntax: In Practice
> CREATE RETENTION POLICY "other" ON "telegraf" DURATION 1d REPLICATION 1
> SHOW RETENTION POLICIES ON "telegraf"
name     duration  shardGroupDuration  replicaN  default
----     --------  ------------------  --------  -------
autogen  0s        168h0m0s            1         true
other    24h0m0s   1h0m0s              1         false
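The shardGroupDuration column is filled in automatically when SHARD DURATION is omitted. The sketch below encodes the defaults as described in the InfluxDB 1.x documentation, as best understood here; treating "six months" as 180 days is an assumption, and the function name is made up for illustration.

```python
from datetime import timedelta

SIX_MONTHS = timedelta(days=180)  # assumption: "six months" approximated as 180 days


def default_shard_group_duration(rp_duration: timedelta) -> timedelta:
    """Default shard group duration for a newly created retention policy.

    A zero duration stands for an infinite (INF) retention policy.
    """
    if rp_duration == timedelta(0) or rp_duration > SIX_MONTHS:
        return timedelta(days=7)
    if rp_duration >= timedelta(days=2):
        return timedelta(days=1)
    return timedelta(hours=1)


if __name__ == "__main__":
    for label, duration in [("INF", timedelta(0)), ("1d", timedelta(days=1)), ("1w", timedelta(weeks=1))]:
        print(label, default_shard_group_duration(duration))
```

This matches the table: the infinite autogen policy gets 168h shard groups, while the one-day policy gets 1h shard groups.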
Retention Policies
The Syntax: In Practice
Rebalance a cluster: https://docs.influxdata.com/enterprise/latest/guides/rebalance/
ALTER RETENTION POLICY <retention_policy_name>
ON <database_name>
[DURATION <duration> |
REPLICATION <n> |
SHARD DURATION <duration> |
DEFAULT]
Retention Policies
The Syntax: In Practice
> ALTER RETENTION POLICY "other" ON "telegraf" DURATION 2d
> SHOW RETENTION POLICIES ON "telegraf"
name     duration  shardGroupDuration  replicaN  default
----     --------  ------------------  --------  -------
autogen  0s        168h0m0s            1         true
other    48h0m0s   1h0m0s              1         false
Continuous Queries & Retention Policies: A Case Study
• Downsample 10-second resolution Telegraf data to 5-minute resolution data
• Store the 10-second resolution data for one week
• Store the 5-minute resolution data for four weeks
• What you need
a. A working InfluxDB instance
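To see what this plan buys, a back-of-the-envelope count of stored points per series and field helps. The helper below is purely illustrative arithmetic, not a measurement of actual disk usage.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def points_retained(resolution_s: int, retention_weeks: int) -> int:
    """Number of points kept per series and field at a given resolution and retention."""
    return retention_weeks * SECONDS_PER_WEEK // resolution_s


if __name__ == "__main__":
    raw = points_retained(10, 1)           # 10-second data kept for one week
    downsampled = points_retained(300, 4)  # 5-minute data kept for four weeks
    raw_four_weeks = points_retained(10, 4)
    print(raw, downsampled, raw_four_weeks)
```

Keeping raw data for one week plus 5-minute means for four weeks stores 60480 + 8064 points per series, compared with 241920 if the 10-second data were kept for all four weeks.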
Continuous Queries & Retention Policies
A Case Study
• Step 1: Create a database
> CREATE DATABASE "telegraf"
>
> SHOW DATABASES
name: databases
name
----
telegraf
_internal
Continuous Queries & Retention Policies
A Case Study
• Step 2: Create a one-week retention policy
> CREATE RETENTION POLICY "one_week" ON "telegraf" DURATION 1w REPLICATION 1 DEFAULT
> SHOW RETENTION POLICIES ON "telegraf"
name      duration  shardGroupDuration  replicaN  default
----      --------  ------------------  --------  -------
autogen   0s        168h0m0s            1         false
one_week  168h0m0s  24h0m0s             1         true
Continuous Queries & Retention Policies
A Case Study
• Step 3: Create a four-week retention policy
> CREATE RETENTION POLICY "four_week" ON "telegraf" DURATION 4w REPLICATION 1
>
> SHOW RETENTION POLICIES ON "telegraf"
name       duration  shardGroupDuration  replicaN  default
----       --------  ------------------  --------  -------
autogen    0s        168h0m0s            1         false
one_week   168h0m0s  24h0m0s             1         true
four_week  672h0m0s  24h0m0s             1         false
Continuous Queries & Retention Policies
A Case Study
• Step 4: Create a continuous query
Fully qualify a measurement:
"<database>"."<retention_policy>"."<measurement>"
> CREATE CONTINUOUS QUERY "ave_usage" ON "telegraf"
BEGIN
  SELECT MEAN("usage_idle")
  INTO "telegraf"."four_week"."ave_cpu"
  FROM "telegraf"."one_week"."cpu"
  WHERE "cpu" = 'cpu-total' GROUP BY time(5m)
END
Continuous Queries & Retention Policies
A Case Study
• Step 5: Install Telegraf
$ wget https://dl.influxdata.com/telegraf/releases/telegraf_1.2.1_amd64.deb
$ sudo dpkg -i telegraf_1.2.1_amd64.deb
$ service telegraf start
Continuous Queries & Retention Policies
A Case Study
• Step 6: Wait a bit… (about five minutes)
Continuous Queries & Retention Policies
A Case Study
• Step 7: Confirm your downsample/data expiration
> SELECT "usage_idle" FROM "telegraf"."one_week"."cpu" LIMIT 3
name: cpu
time usage_idle
---- ----------
2017-02-08T17:01:20Z 99.49949949949996
2017-02-08T17:01:20Z 99.49949949949996
2017-02-08T17:01:30Z 99.59879638916786
> SELECT "mean" FROM "telegraf"."four_week"."ave_cpu" LIMIT 3
name: ave_cpu
time mean
---- ----
2017-02-08T17:30:00Z 99.20285887894339
2017-02-08T17:35:00Z 99.31273835165075
2017-02-08T17:40:00Z 99.29954474534229
> USE "telegraf"
Using database telegraf
> SHOW MEASUREMENTS
name: measurements
name
----
ave_cpu
cpu
disk
diskio
kernel
mem
processes
swap
system
Continuous Queries
Some Common Issues
• Issue 1: Working with historical data
> SELECT MAX("water_level") INTO "maximums" FROM "h2o_feet"
  WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:18:00Z'
  GROUP BY time(12m)
name: result
time written
---- -------
1970-01-01T00:00:00Z 2
> SELECT * FROM "maximums"
name: maximums
time max
---- ---
2015-08-18T00:00:00Z 8.12
2015-08-18T00:12:00Z 7.887
> SELECT "water_level","location" FROM "h2o_feet" LIMIT 8
name: h2o_feet
time water_level location
---- ----------- --------
2015-08-18T00:00:00Z 8.12 coyote_creek
2015-08-18T00:00:00Z 2.064 santa_monica
2015-08-18T00:06:00Z 8.005 coyote_creek
2015-08-18T00:06:00Z 2.116 santa_monica
2015-08-18T00:12:00Z 7.887 coyote_creek
2015-08-18T00:12:00Z 2.028 santa_monica
2015-08-18T00:18:00Z 7.762 coyote_creek
2015-08-18T00:18:00Z 2.126 santa_monica
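Because a CQ only looks at a window ending near now(), older data has to be downsampled with a one-off INTO query like the one on this slide. The Python sketch below models that query over the eight rows shown, to make clear why two points are written; the helper names are hypothetical.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# (time, water_level, location) rows from the h2o_feet sample above.
H2O_FEET = [
    ("2015-08-18T00:00:00Z", 8.12, "coyote_creek"),
    ("2015-08-18T00:00:00Z", 2.064, "santa_monica"),
    ("2015-08-18T00:06:00Z", 8.005, "coyote_creek"),
    ("2015-08-18T00:06:00Z", 2.116, "santa_monica"),
    ("2015-08-18T00:12:00Z", 7.887, "coyote_creek"),
    ("2015-08-18T00:12:00Z", 2.028, "santa_monica"),
    ("2015-08-18T00:18:00Z", 7.762, "coyote_creek"),
    ("2015-08-18T00:18:00Z", 2.126, "santa_monica"),
]


def to_epoch(ts: str) -> int:
    """Convert an RFC 3339 UTC timestamp to whole seconds since the Unix epoch."""
    return int(datetime.strptime(ts, TIME_FORMAT).replace(tzinfo=timezone.utc).timestamp())


def from_epoch(seconds: int) -> str:
    """Format epoch seconds back into an RFC 3339 UTC timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(TIME_FORMAT)


def backfill_max(rows, start: str, end: str, interval_s: int) -> dict:
    """Model SELECT MAX(...) WHERE time >= start AND time <= end GROUP BY time(interval)."""
    lo, hi = to_epoch(start), to_epoch(end)
    maximums = {}
    for ts, level, _location in rows:
        epoch = to_epoch(ts)
        if lo <= epoch <= hi:
            bucket = epoch - epoch % interval_s  # epoch-aligned window start
            maximums[bucket] = max(level, maximums.get(bucket, level))
    return {from_epoch(bucket): value for bucket, value in sorted(maximums.items())}


if __name__ == "__main__":
    result = backfill_max(H2O_FEET, "2015-08-18T00:00:00Z", "2015-08-18T00:18:00Z", 12 * 60)
    print(len(result), "points written:", result)
```

The explicit WHERE clause is what makes this a backfill: it names the historical range directly instead of relying on the CQ schedule.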
Continuous Queries
Some Common Issues
• Issue 2: Missing data in CQ results
> SELECT * FROM "french_bulldogs"
name: french_bulldogs
---------------------
time color name
2016-06-26T20:17:59Z grey rumpelstiltskin
2016-06-26T20:17:59Z peach princess
> SELECT "name","color" INTO "dogs" FROM "french_bulldogs"
name: result
------------
time written
1970-01-01T00:00:00Z 2
> SELECT * FROM "dogs"
name: dogs
----------
time color name
2016-06-26T20:17:59Z peach princess
Continuous Queries
Some Common Issues
• Issue 2: Missing data in CQ results
> SELECT * FROM "french_bulldogs"
name: french_bulldogs
---------------------
time color name
2016-06-26T20:17:59Z grey rumpelstiltskin
2016-06-26T20:17:59Z peach princess
> SELECT "name" INTO "dogs" FROM "french_bulldogs" GROUP BY "color"
name: result
------------
time written
1970-01-01T00:00:00Z 2
> SELECT * FROM "dogs"
name: dogs
----------
time color name
2016-06-26T20:17:59Z grey rumpelstiltskin
2016-06-26T20:17:59Z peach princess
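The difference between the two INTO queries comes down to how a point is identified. The toy model below assumes that "color" is a tag in french_bulldogs, that INTO turns any tag not named in GROUP BY into a field, and that points sharing a tag set and timestamp are merged with later values winning; it is a simplification for illustration, not InfluxDB's implementation.

```python
# Source points: "color" is assumed to be a tag, "name" a field.
FRENCH_BULLDOGS = [
    {"time": "2016-06-26T20:17:59Z", "tags": {"color": "grey"}, "fields": {"name": "rumpelstiltskin"}},
    {"time": "2016-06-26T20:17:59Z", "tags": {"color": "peach"}, "fields": {"name": "princess"}},
]


def select_into(points, columns, group_by_tags=()):
    """Model `SELECT <columns> INTO ... [GROUP BY <tags>]` writing to an empty measurement."""
    stored = {}
    for point in points:
        # Only tags named in GROUP BY stay tags in the destination.
        kept_tags = tuple((tag, point["tags"][tag]) for tag in group_by_tags)
        # Everything selected is written as a field.
        values = {**point["fields"], **point["tags"]}
        row = {name: values[name] for name in columns}
        # A point is identified by its tag set and timestamp; a second write
        # with the same identity merges into, and overwrites, the first.
        stored.setdefault((kept_tags, point["time"]), {}).update(row)
    return stored


if __name__ == "__main__":
    print(len(select_into(FRENCH_BULLDOGS, ["name", "color"])))          # tags dropped
    print(len(select_into(FRENCH_BULLDOGS, ["name"], ("color",))))       # tag preserved
```

Without GROUP BY both dogs collapse into one point, and the later write (princess) survives; grouping by the tag keeps them as separate series.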
Continuous Queries
Some Common Issues
• Issue 3: Configuring the CQ schedule
SELECT MEAN("sunflowers")
FROM "flower_orders"
WHERE time >= '2016-08-29T18:00:00Z' AND time <= '2016-08-29T19:45:00Z' GROUP BY time(1h)
name: flower_orders
-------------------
time sunflowers
2016-08-29T18:00:00Z 34
2016-08-29T18:15:00Z 28
2016-08-29T18:30:00Z 19
2016-08-29T18:45:00Z 20
2016-08-29T19:00:00Z 56
2016-08-29T19:15:00Z 76
2016-08-29T19:30:00Z 29
2016-08-29T19:45:00Z 90
2016-08-29T20:00:00Z 70
name: flower_orders
-------------------
time mean
2016-08-29T18:00:00Z 25.25
2016-08-29T19:00:00Z 62.75
Continuous Queries
Some Common Issues
• Issue 3: Configuring the CQ schedule
SELECT MEAN("sunflowers")
FROM "flower_orders"
WHERE time >= '2016-08-29T18:15:00Z' AND time <= '2016-08-29T19:45:00Z' GROUP BY time(1h)
name: flower_orders
-------------------
time sunflowers
2016-08-29T18:00:00Z 34
2016-08-29T18:15:00Z 28
2016-08-29T18:30:00Z 19
2016-08-29T18:45:00Z 20
2016-08-29T19:00:00Z 56
2016-08-29T19:15:00Z 76
2016-08-29T19:30:00Z 29
2016-08-29T19:45:00Z 90
2016-08-29T20:00:00Z 70
name: flower_orders
-------------------
time mean
2016-08-29T18:00:00Z 22.333333333333332
2016-08-29T19:00:00Z 62.75
Continuous Queries
Some Common Issues
• Issue 3: Configuring the CQ schedule
SELECT MEAN("sunflowers")
FROM "flower_orders"
WHERE time >= '2016-08-29T18:15:00Z' AND time <= '2016-08-29T19:45:00Z' GROUP BY time(1h,15m)
name: flower_orders
-------------------
time sunflowers
2016-08-29T18:00:00Z 34
2016-08-29T18:15:00Z 28
2016-08-29T18:30:00Z 19
2016-08-29T18:45:00Z 20
2016-08-29T19:00:00Z 56
2016-08-29T19:15:00Z 76
2016-08-29T19:30:00Z 29
2016-08-29T19:45:00Z 90
2016-08-29T20:00:00Z 70
name: flower_orders
-------------------
time mean
2016-08-29T18:15:00Z 30.75
2016-08-29T19:15:00Z 65
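The effect of the offset argument in GROUP BY time(1h,15m) can be checked by hand. This is a small Python sketch of the bucket arithmetic over the flower_orders points shown above; the function names are illustrative and the alignment rule is the usual epoch-aligned floor, shifted by the offset.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# (time, sunflowers) points from the flower_orders sample above.
FLOWER_ORDERS = [
    ("2016-08-29T18:00:00Z", 34), ("2016-08-29T18:15:00Z", 28),
    ("2016-08-29T18:30:00Z", 19), ("2016-08-29T18:45:00Z", 20),
    ("2016-08-29T19:00:00Z", 56), ("2016-08-29T19:15:00Z", 76),
    ("2016-08-29T19:30:00Z", 29), ("2016-08-29T19:45:00Z", 90),
    ("2016-08-29T20:00:00Z", 70),
]


def to_epoch(ts: str) -> int:
    """Convert an RFC 3339 UTC timestamp to whole seconds since the Unix epoch."""
    return int(datetime.strptime(ts, TIME_FORMAT).replace(tzinfo=timezone.utc).timestamp())


def from_epoch(seconds: int) -> str:
    """Format epoch seconds back into an RFC 3339 UTC timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(TIME_FORMAT)


def mean_by_time(points, start: str, end: str, interval_s: int, offset_s: int = 0) -> dict:
    """Model SELECT MEAN(...) WHERE time >= start AND time <= end GROUP BY time(interval, offset)."""
    lo, hi = to_epoch(start), to_epoch(end)
    buckets = {}
    for ts, value in points:
        epoch = to_epoch(ts)
        if lo <= epoch <= hi:
            # Shift by the offset, floor to the interval, then shift back.
            bucket = (epoch - offset_s) // interval_s * interval_s + offset_s
            buckets.setdefault(bucket, []).append(value)
    return {from_epoch(b): sum(v) / len(v) for b, v in sorted(buckets.items())}


if __name__ == "__main__":
    print(mean_by_time(FLOWER_ORDERS, "2016-08-29T18:15:00Z", "2016-08-29T19:45:00Z", 3600, 900))
```

With a 15-minute offset the windows start at 18:15 and 19:15 instead of on the hour, so the returned timestamps line up with the WHERE clause.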
Retention Policies
Some Common Issues
• Issue 1: Writing to Retention Policies
HTTP API
$ curl -i -XPOST "http://localhost:8086/write?db=telegraf&rp=four_week" --data-binary 'mymeas,mytag=1 myfield=90'
CLI Option 1
> USE "telegraf"."four_week"
Using database telegraf
Using retention policy four_week
CLI Option 2
> INSERT INTO "four_week" mymeas,mytag=1 myfield=90
Retention Policies
Some Common Issues
• Issue 2: Querying Retention Policies
HTTP API
$ curl -G "http://localhost:8086/query?db=telegraf" --data-urlencode "q=SELECT * FROM telegraf.four_week.ave_cpu"
CLI Option 1
> USE "telegraf"."four_week"
Using database telegraf
Using retention policy four_week
CLI Option 2
> SELECT * FROM "telegraf"."four_week"."ave_cpu"
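When writing or querying from a script rather than curl, the retention policy is selected the same way: through the rp query parameter on /write, or through a fully qualified measurement in the query text. The sketch below only builds the request URLs with the standard library and sends nothing; the host and the helper names are assumptions for illustration.

```python
from urllib.parse import urlencode

HOST = "http://localhost:8086"  # assumption: a local InfluxDB 1.x instance


def write_url(db: str, rp: str, host: str = HOST) -> str:
    """URL for the /write endpoint targeting a specific retention policy."""
    return f"{host}/write?{urlencode({'db': db, 'rp': rp})}"


def query_url(db: str, q: str, host: str = HOST) -> str:
    """URL for the /query endpoint with the InfluxQL statement URL-encoded."""
    return f"{host}/query?{urlencode({'db': db, 'q': q})}"


if __name__ == "__main__":
    print(write_url("telegraf", "four_week"))
    print(query_url("telegraf", "SELECT * FROM telegraf.four_week.ave_cpu"))
```

If rp is left out of a write, or the measurement is not fully qualified in a query, InfluxDB falls back to the database's DEFAULT retention policy, which in the case study is one_week.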
Using Kapacitor for Downsampling
Downsample using Kapacitor
Offload computation to a separate host
• Implement if CQs are using too many host resources
• Kapacitor accepts either stream or batch tasks
• Writes data back into InfluxDB
Example
// batch_cpu.tick
batch
    |query('''
        SELECT mean("usage_user") AS usage_user
        FROM "telegraf"."autogen"."cpu"
    ''')
        .period(5m)
        .every(5m)
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('5m')
        .tag('source', 'kapacitor')
● Downsample the data into 5m windows
● Store that data back into the 5m retention policy in the telegraf database
CQs & RPs
• https://docs.influxdata.com/influxdb/latest/query_language/continuous_queries/
• https://docs.influxdata.com/influxdb/latest/query_language/database_management/#create-retention-policies-with-create-retention-policy
Kapacitor & Telegraf
• https://docs.influxdata.com/kapacitor/latest/examples/continuous_queries/
• https://docs.influxdata.com/telegraf/latest/concepts/aggregator_processor_plugins/
Other
• https://docs.influxdata.com/influxdb/latest/guides/downsampling_and_retention/
• community.influxdata.com
Thank You
community.influxdata.com

Downsampling your data October 2017

  • 1. Michael DeSa - Software Engineer Downsampling your data
  • 2. © 2017 InfluxData. All rights reserved.2 © 2017 InfluxData. All rights reserved.2 ✓ Continuous Queries (CQs) ✓ Create custom CQs ✓ Retention Policies (RPs) ✓ Create custom RPs ✓ Combine CQs and RPs to manage downsampling and data retention ✓ Common issues with CQs and RPs Agenda
  • 3. © 2017 InfluxData. All rights reserved.3 Continuous Queries ¨ What are Continuous Queries (CQs)? ¨ InfluxQL queries that run automatically and periodically on real-time data and store query results in a specified measurement ¨ Why would I use CQs? ¨ Automatic downsampling and pre-calculating expensive queries An Overview
  • 4. © 2017 InfluxData. All rights reserved.4 Continuous Queries The Basic Syntax CREATE CONTINUOUS QUERY <cq_name> ON <database_name> BEGIN SELECT <function>(<stuff>) INTO <destination_measurement> FROM <source_measurement> GROUP BY time(<interval>) END 1 2 3
  • 5. © 2017 InfluxData. All rights reserved.5 Continuous Queries The Basic Syntax: In Practice CREATE CONTINUOUS QUERY "average_cpu_usage" ON "telegraf" BEGIN SELECT MEAN("usage_idle") INTO "ave_cpu" FROM "cpu" GROUP BY time( 1m) END > SELECT "usage_idle" FROM "cpu" name: cpu time usage_idle ---- ---------- 2017-02-07T23:14:00Z 99.599599599 2017-02-07T23:14:10Z 99.699398797 2017-02-07T23:14:20Z 99.699699699 2017-02-07T23:14:30Z 99.600000000 2017-02-07T23:14:40Z 99.500000000 2017-02-07T23:14:50Z 99.799599198 2017-02-07T23:15:00Z 99.600000000 2017-02-07T23:15:10Z 99.400599400 2017-02-07T23:15:20Z 99.600000000 2017-02-07T23:15:30Z 99.699699699 2017-02-07T23:15:40Z 99.500000000 2017-02-07T23:15:50Z 99.699699699 > SELECT "mean" FROM "ave_cpu" name: ave_cpu time mean ---- ---- 2017-02-07T23:14:00Z 99.649716215 2017-02-07T23:15:00Z 99.583333133
  • 6. © 2017 InfluxData. All rights reserved.6 Continuous Queries The Basic Syntax: In Practice Feb 07 23:15:00 executing continuous query average_cpu_usage SELECT […] WHERE time >= '2017-02-07T23:14:00Z' AND time < '2016-08-29T23:15:00Z' GROUP BY time(1h) Feb 07 23:16:00 executing continuous query average_cpu_usage SELECT […] WHERE time >= '2017-02-07T23:15:00Z' AND time < '2016-08-29T23:16:00Z' GROUP BY time(1h) The basic CQ: ● Executes at the same interval as the GROUP BY time() interval. ● Executes a single query that covers the time range between now() and now() minus the GROUP BY time() interval.
  • 7. © 2017 InfluxData. All rights reserved.7 Continuous Queries The Advanced Syntax CREATE CONTINUOUS QUERY <cq_name> ON <database_name> RESAMPLE EVERY <interval> FOR <interval> BEGIN SELECT <function>(<stuff>) INTO <destination_measurement> FROM <source_measurement> GROUP BY time(<interval>) END 1 2 3 4
  • 8. © 2017 InfluxData. All rights reserved.8 Continuous Queries The Advanced Syntax: In Practice CREATE CONTINUOUS QUERY "average_cpu_usage" ON "telegraf" RESAMPLE EVERY 2m FOR 3m BEGIN SELECT MEAN("usage_idle") INTO "ave_cpu" FROM “cpu" GROUP BY time(1m) END > SELECT "usage_idle" FROM "cpu" name: cpu time usage_idle ---- ---------- 2017-02-07T23:14:00Z 99.599599599 2017-02-07T23:14:10Z 99.699398797 2017-02-07T23:14:20Z 99.699699699 2017-02-07T23:14:30Z 99.600000000 2017-02-07T23:14:40Z 99.500000000 2017-02-07T23:14:50Z 99.799599198 2017-02-07T23:15:00Z 99.600000000 2017-02-07T23:15:10Z 99.400599400 2017-02-07T23:15:20Z 99.600000000 2017-02-07T23:15:30Z 99.699699699 2017-02-07T23:15:40Z 99.500000000 2017-02-07T23:15:50Z 99.699699699 > SELECT "mean" FROM "ave_cpu" name: ave_cpu time mean ---- ---- 2017-02-07T23:14:00Z 99.649716215 2017-02-07T23:15:00Z 99.583333133
  • 9. © 2017 InfluxData. All rights reserved.9 Continuous Queries The Advanced Syntax: In Practice Feb 07 23:16:00 executing continuous query average_cpu_usage SELECT […] WHERE time >= '2017-02-07T23:13:00Z' AND time < '2016-08-29T23:16:00Z' GROUP BY time(1h) Feb 07 23:18:00 executing continuous query average_cpu_usage SELECT […] WHERE time >= '2017-02-07T23:15:00Z' AND time < '2016-08-29T23:18:00Z' GROUP BY time(1h) In the advanced syntax: The EVERY interval determines how often InfluxDB executes the CQ. The FOR interval determines the time range over which the CQ runs queries.
  • 10. © 2017 InfluxData. All rights reserved.10 Retention Policies (RPs) ¨ What What are RPs? ¨ The part of InfluxDB’s data structure that describe for how long InfluxDB keeps data. ¨ Why would I use RPs? ¨ Expire unneeded data. ¨ An Overview
  • 11. © 2017 InfluxData. All rights reserved.11 Retention Policies The Syntax CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT] DURATION units: u microseconds ms milliseconds s seconds m minutes h hours d days w weeks INF REPLICATION settings: Single instance: no effect Clustering: RF <= data nodes SHARD DURATION settings: https://docs.inf luxdata.com/infl uxdb/latest/conc epts/schema_and_ data_layout/#sha rd-group-duratio n-management
  • 12. © 2017 InfluxData. All rights reserved.12 Retention Policies The Syntax: In Practice > SHOW RETENTION POLICIES ON "telegraf" name duration shardGroupDuration replicaN default ---- -------- ------------------ -------- ------- autogen 0s 168h0m0s 1 true other 24h0m0s 1h0m0s 1 false CREATE RETENTION POLICY "other" ON "telegraf" DURATION 1d REPLICATION 1
  • 13. © 2017 InfluxData. All rights reserved.13 Retention Policies The Syntax: In Practice Rebalance a cluster: https://docs.influxdata.com/enterprise/latest/guides/r ebalance/ ALTER RETENTION POLICY <retention_policy_name> ON <database_name> [DURATION <duration> | REPLICATION <n> | SHARD DURATION <duration> | DEFAULT]
  • 14. © 2017 InfluxData. All rights reserved.14 Retention Policies The Syntax: In Practice > SHOW RETENTION POLICIES ON "telegraf" name duration shardGroupDuration replicaN default ---- -------- ------------------ -------- ------- autogen 0s 168h0m0s 1 true other 48h0m0s 1h0m0s 1 false ALTER RETENTION POLICY "other" ON "telegraf" DURATION 2d
  • 15. © 2017 InfluxData. All rights reserved.15 Continuous Queries & Retention Policies: A Case Study ¨ Downsample 10-second resolution Telegraf data to 5-minute resolution data ¨ Store the 10-second resolution data for one week ¨ Store the 5-minute resolution data for four weeks ¨ What you need a. A working InfluxDB instance
  • 16. © 2017 InfluxData. All rights reserved.16 Continuous Queries & Retention Policies ¨ Step 1: Create a database A Case Study > SHOW DATABASES name: databases name ---- telegraf _internal > CREATE DATABASE "telegraf" >
  • 17. © 2017 InfluxData. All rights reserved.17 Continuous Queries & Retention Policies ¨ Step 2: Create a one-week retention policy A Case Study > SHOW RETENTION POLICIES ON "telegraf" name duration shardGroupDuration replicaN default ---- -------- ------------------ -------- ------- autogen 0s 168h0m0s 1 false one_week 168h0m0s 1h0m0s 1 true > CREATE RETENTION POLICY "one_week" ON "telegraf" DURATION 1w REPLICATION 1 DEFAULT
  • 18. Continuous Queries & Retention Policies
    A Case Study
    ● Step 3: Create a four-week retention policy
        > CREATE RETENTION POLICY "four_week" ON "telegraf" DURATION 4w REPLICATION 1
        > SHOW RETENTION POLICIES ON "telegraf"
        name      duration shardGroupDuration replicaN default
        ----      -------- ------------------ -------- -------
        autogen   0s       168h0m0s           1        false
        one_week  168h0m0s 24h0m0s            1        true
        four_week 672h0m0s 24h0m0s            1        false
  • 19. Continuous Queries & Retention Policies
    A Case Study
    ● Step 4: Create a continuous query
        > CREATE CONTINUOUS QUERY "ave_usage" ON "telegraf" BEGIN
            SELECT MEAN("usage_idle")
            INTO "telegraf"."four_week"."ave_cpu"
            FROM "telegraf"."one_week"."cpu"
            WHERE "cpu" = 'cpu-total'
            GROUP BY time(5m)
          END
    Fully-qualify a measurement: "<database>"."<retention_policy>"."<measurement>"
  • 20. Continuous Queries & Retention Policies
    A Case Study
    ● Step 5: Install Telegraf
        $ wget https://dl.influxdata.com/telegraf/releases/telegraf_1.2.1_amd64.deb
        $ sudo dpkg -i telegraf_1.2.1_amd64.deb
        $ service telegraf start
  • 21. Continuous Queries & Retention Policies
    A Case Study
    ● Step 6: Wait a bit… (about five minutes)
  • 22. Continuous Queries & Retention Policies
    A Case Study
    ● Step 7: Confirm your downsample/data expiration
        > SELECT "usage_idle" FROM "telegraf"."one_week"."cpu" LIMIT 3
        name: cpu
        time                 usage_idle
        ----                 ----------
        2017-02-08T17:01:20Z 99.49949949949996
        2017-02-08T17:01:20Z 99.49949949949996
        2017-02-08T17:01:30Z 99.59879638916786

        > SELECT "mean" FROM "telegraf"."four_week"."ave_cpu" LIMIT 3
        name: ave_cpu
        time                 mean
        ----                 ----
        2017-02-08T17:30:00Z 99.20285887894339
        2017-02-08T17:35:00Z 99.31273835165075
        2017-02-08T17:40:00Z 99.29954474534229

        > USE "telegraf"
        Using database telegraf
        > SHOW MEASUREMENTS
        name: measurements
        name
        ----
        ave_cpu
        cpu
        disk
        diskio
        kernel
        mem
        processes
        swap
        system
  • 23. Continuous Queries
    Some Common Issues
    ● Issue 1: Working with historical data
        > SELECT MAX("water_level") INTO "maximums" FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:18:00Z' GROUP BY time(12m)
        name: result
        time                 written
        ----                 -------
        1970-01-01T00:00:00Z 2

        > SELECT * FROM "maximums"
        name: maximums
        time                 max
        ----                 ---
        2015-08-18T00:00:00Z 8.12
        2015-08-18T00:12:00Z 7.887

        > SELECT "water_level","location" FROM "h2o_feet" LIMIT 8
        name: h2o_feet
        time                 water_level location
        ----                 ----------- --------
        2015-08-18T00:00:00Z 8.12        coyote_creek
        2015-08-18T00:00:00Z 2.064       santa_monica
        2015-08-18T00:06:00Z 8.005       coyote_creek
        2015-08-18T00:06:00Z 2.116       santa_monica
        2015-08-18T00:12:00Z 7.887       coyote_creek
        2015-08-18T00:12:00Z 2.028       santa_monica
        2015-08-18T00:18:00Z 7.762       coyote_creek
        2015-08-18T00:18:00Z 2.126       santa_monica
  • 24. Continuous Queries
    Some Common Issues
    ● Issue 2: Missing data in CQ results
        > SELECT * FROM "french_bulldogs"
        name: french_bulldogs
        ---------------------
        time                 color name
        2016-06-26T20:17:59Z grey  rumpelstiltskin
        2016-06-26T20:17:59Z peach princess

        > SELECT "name","color" INTO "dogs" FROM "french_bulldogs"
        name: result
        ------------
        time                 written
        1970-01-01T00:00:00Z 2

        > SELECT * FROM "dogs"
        name: dogs
        ----------
        time                 color name
        2016-06-26T20:17:59Z peach princess
  • 25. Continuous Queries
    Some Common Issues
    ● Issue 2: Missing data in CQ results
        > SELECT * FROM "french_bulldogs"
        name: french_bulldogs
        ---------------------
        time                 color name
        2016-06-26T20:17:59Z grey  rumpelstiltskin
        2016-06-26T20:17:59Z peach princess

        > SELECT "name" INTO "dogs" FROM "french_bulldogs" GROUP BY "color"
        name: result
        ------------
        time                 written
        1970-01-01T00:00:00Z 2

        > SELECT * FROM "dogs"
        name: dogs
        ----------
        time                 color name
        2016-06-26T20:17:59Z grey  rumpelstiltskin
        2016-06-26T20:17:59Z peach princess
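The difference between the two Issue 2 results can be modeled in a few lines. InfluxDB identifies a point by its measurement, tag set, and timestamp, and a later write with the same identity replaces the earlier field values. The sketch below imitates that with a dictionary; `write_points` and the tuple layout are hypothetical stand-ins for illustration, not InfluxDB APIs.

```python
def write_points(points):
    """Store points keyed by (measurement, sorted tag set, timestamp).

    A later point with the same key overwrites matching fields of the
    earlier one, so two source points can collapse into one.
    """
    store = {}
    for measurement, tags, fields, timestamp in points:
        key = (measurement, tuple(sorted(tags.items())), timestamp)
        store.setdefault(key, {}).update(fields)
    return store

ts = "2016-06-26T20:17:59Z"

# SELECT "name","color" INTO "dogs": the tag "color" is written as a field,
# so both points share an empty tag set and the same timestamp.
without_group_by = write_points([
    ("dogs", {}, {"name": "rumpelstiltskin", "color": "grey"}, ts),
    ("dogs", {}, {"name": "princess", "color": "peach"}, ts),
])

# SELECT "name" INTO "dogs" ... GROUP BY "color": "color" stays a tag,
# so the two points belong to different series.
with_group_by = write_points([
    ("dogs", {"color": "grey"}, {"name": "rumpelstiltskin"}, ts),
    ("dogs", {"color": "peach"}, {"name": "princess"}, ts),
])

print(len(without_group_by))  # 1
print(len(with_group_by))     # 2
```

The INTO query reports two points written in both cases; only the number of distinct point identities differs.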
  • 26. Continuous Queries
    Some Common Issues
    ● Issue 3: Configuring the CQ schedule
        SELECT MEAN("sunflowers") FROM "flower_orders" WHERE time >= '2016-08-29T18:00:00Z' AND time <= '2016-08-29T19:45:00Z' GROUP BY time(1h)

        Source data:
        name: flower_orders
        -------------------
        time                 sunflowers
        2016-08-29T18:00:00Z 34
        2016-08-29T18:15:00Z 28
        2016-08-29T18:30:00Z 19
        2016-08-29T18:45:00Z 20
        2016-08-29T19:00:00Z 56
        2016-08-29T19:15:00Z 76
        2016-08-29T19:30:00Z 29
        2016-08-29T19:45:00Z 90
        2016-08-29T20:00:00Z 70

        Query result:
        name: flower_orders
        -------------------
        time                 mean
        2016-08-29T18:00:00Z 25.25
        2016-08-29T19:00:00Z 62.75
  • 27. Continuous Queries
    Some Common Issues
    ● Issue 3: Configuring the CQ schedule
        SELECT MEAN("sunflowers") FROM "flower_orders" WHERE time >= '2016-08-29T18:15:00Z' AND time <= '2016-08-29T19:45:00Z' GROUP BY time(1h)

        Source data:
        name: flower_orders
        -------------------
        time                 sunflowers
        2016-08-29T18:00:00Z 34
        2016-08-29T18:15:00Z 28
        2016-08-29T18:30:00Z 19
        2016-08-29T18:45:00Z 20
        2016-08-29T19:00:00Z 56
        2016-08-29T19:15:00Z 76
        2016-08-29T19:30:00Z 29
        2016-08-29T19:45:00Z 90
        2016-08-29T20:00:00Z 70

        Query result:
        name: flower_orders
        -------------------
        time                 mean
        2016-08-29T18:00:00Z 25.25
        2016-08-29T19:00:00Z 62.75
  • 28. Continuous Queries
    Some Common Issues
    ● Issue 3: Configuring the CQ schedule
        SELECT MEAN("sunflowers") FROM "flower_orders" WHERE time >= '2016-08-29T18:15:00Z' AND time <= '2016-08-29T19:45:00Z' GROUP BY time(1h,15m)

        Source data:
        name: flower_orders
        -------------------
        time                 sunflowers
        2016-08-29T18:00:00Z 34
        2016-08-29T18:15:00Z 28
        2016-08-29T18:30:00Z 19
        2016-08-29T18:45:00Z 20
        2016-08-29T19:00:00Z 56
        2016-08-29T19:15:00Z 76
        2016-08-29T19:30:00Z 29
        2016-08-29T19:45:00Z 90
        2016-08-29T20:00:00Z 70

        Query result:
        name: flower_orders
        -------------------
        time                 mean
        2016-08-29T18:15:00Z 30.75
        2016-08-29T19:15:00Z 65
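To see where the offset in `GROUP BY time(1h,15m)` moves the bucket boundaries, the sketch below recomputes the means from the `flower_orders` data in plain Python. It assumes buckets are aligned to the Unix epoch and then shifted by the offset; `bucket_start` and `mean_by_bucket` are illustrative names, not part of InfluxDB.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bucket_start(t, interval, offset=timedelta(0)):
    """Start of the time(interval, offset) bucket that contains t."""
    whole_intervals = (t - EPOCH - offset) // interval
    return EPOCH + offset + whole_intervals * interval

def mean_by_bucket(points, start, end, interval, offset=timedelta(0)):
    """Mean per bucket over points with start <= time <= end."""
    buckets = defaultdict(list)
    for t, value in points:
        if start <= t <= end:
            buckets[bucket_start(t, interval, offset)].append(value)
    return {b.strftime("%H:%M"): sum(v) / len(v) for b, v in sorted(buckets.items())}

def at(hour, minute):
    return datetime(2016, 8, 29, hour, minute, tzinfo=timezone.utc)

# The nine flower_orders points, one every 15 minutes from 18:00 to 20:00.
values = [34, 28, 19, 20, 56, 76, 29, 90, 70]
points = [(at(18, 0) + i * timedelta(minutes=15), v) for i, v in enumerate(values)]

# WHERE time >= 18:00 AND time <= 19:45 GROUP BY time(1h)
hourly = mean_by_bucket(points, at(18, 0), at(19, 45), timedelta(hours=1))
# WHERE time >= 18:15 AND time <= 19:45 GROUP BY time(1h,15m)
shifted = mean_by_bucket(points, at(18, 15), at(19, 45),
                         timedelta(hours=1), timedelta(minutes=15))

print(hourly)   # {'18:00': 25.25, '19:00': 62.75}
print(shifted)  # {'18:15': 30.75, '19:15': 65.0}
```

Changing only the `WHERE` range leaves the boundaries on the hour; it is the offset argument that shifts them to quarter past.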
  • 29. Retention Policies
    Some Common Issues
    ● Issue 1: Writing to Retention Policies
    HTTP API
        $ curl -i -XPOST "http://localhost:8086/write?db=telegraf&rp=four_week" --data-binary 'mymeas,mytag=1 myfield=90'
    CLI Option 1
        > USE "telegraf"."four_week"
        Using database telegraf
        Using retention policy four_week
    CLI Option 2
        > INSERT INTO "four_week" mymeas,mytag=1 myfield=90
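When writes are issued from a script rather than `curl`, the same `db` and `rp` query parameters apply. This is a minimal sketch that only builds the `/write` URL and sends nothing; `write_url` is a hypothetical helper, and the host is the localhost address used in the slide.

```python
from urllib.parse import urlencode

def write_url(host, database, retention_policy=None):
    """Build an InfluxDB 1.x /write URL.

    When retention_policy is omitted, no rp parameter is sent and the
    write goes to the database's DEFAULT retention policy.
    """
    params = {"db": database}
    if retention_policy is not None:
        params["rp"] = retention_policy
    return f"{host}/write?{urlencode(params)}"

print(write_url("http://localhost:8086", "telegraf", "four_week"))
# http://localhost:8086/write?db=telegraf&rp=four_week
print(write_url("http://localhost:8086", "telegraf"))
# http://localhost:8086/write?db=telegraf
```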
  • 30. Retention Policies
    Some Common Issues
    ● Issue 2: Querying Retention Policies
    HTTP API
        $ curl -G "http://localhost:8086/query?db=telegraf" --data-urlencode "q=SELECT * FROM telegraf.four_week.ave_cpu"
    CLI Option 1
        > USE "telegraf"."four_week"
        Using database telegraf
        Using retention policy four_week
    CLI Option 2
        > SELECT * FROM "telegraf"."four_week"."ave_cpu"
  • 31. Using Kapacitor for Downsampling
  • 32. Downsample using Kapacitor
    Offload computation to a separate host
    ● Use Kapacitor if CQs are using too many host resources
    ● Kapacitor accepts either stream or batch tasks
    ● Kapacitor writes data back into InfluxDB
  • 33. Example
        // batch_cpu.tick
        batch
            |query('''
                SELECT mean("usage_user") AS usage_user
                FROM "telegraf"."autogen"."cpu"
            ''')
                .period(5m)
                .every(5m)
            |influxDBOut()
                .database('telegraf')
                .retentionPolicy('5m')
                .tag('source', 'kapacitor')
    ● Downsample the data into 5m windows
    ● Store that data back into the 5m retention policy in the telegraf database
  • 34.
    CQs & RPs
    ● https://docs.influxdata.com/influxdb/latest/query_language/continuous_queries/
    ● https://docs.influxdata.com/influxdb/latest/query_language/database_management/#create-retention-policies-with-create-retention-policy
    Kapacitor & Telegraf
    ● https://docs.influxdata.com/kapacitor/latest/examples/continuous_queries/
    ● https://docs.influxdata.com/telegraf/latest/concepts/aggregator_processor_plugins/
    Other
    ● https://docs.influxdata.com/influxdb/latest/guides/downsampling_and_retention/
    ● community.influxdata.com