Thom Crowe
Community Manager
InfluxData
Why You Should NOT Be Using an RDBMS for Time-Stamped Data
© 2019 InfluxData. All rights reserved.
Why Time Series for Monitoring, Metrics, Real-Time Analytics and IoT/Sensor Data
● What is time series data
● Differences between Time Series Databases (TSDBs)
● InfluxDB Data model
What is time series data?
Time series data is a sequence of data points, typically consisting of successive measurements made from the same source over a time interval.
Plot the points on a graph and one of your axes would always be time.
Time Series data is... / Not Time Series data...
[Slides 5-13 alternate between example charts that are and are not time series; the images are not preserved. The final "Not Time Series" example plots Student Grades against Time spent studying.]
Regular vs Irregular Time Series
Metrics (Regular): measurements gathered at regular time intervals
Events (Irregular): measurements gathered at irregular time intervals
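To make the metric/event distinction concrete, here is a minimal Python sketch that checks whether a list of timestamps is evenly spaced. The function name `is_regular` and the sample timestamps are invented for illustration and do not come from the deck.

```python
from datetime import datetime, timedelta

def is_regular(timestamps, tolerance=timedelta(0)):
    """Return True if consecutive timestamps are evenly spaced (within tolerance)."""
    if len(timestamps) < 3:
        return True  # fewer than two gaps: nothing to compare
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return all(abs(gap - gaps[0]) <= tolerance for gap in gaps)

start = datetime(2019, 1, 1, 12, 0, 0)

# A metric: a value sampled every 10 seconds (hypothetical data).
metric_times = [start + timedelta(seconds=10 * i) for i in range(6)]

# An event: something recorded whenever it happens (hypothetical data).
event_times = [start + timedelta(seconds=s) for s in (0, 3, 4, 19, 41, 42)]

print(is_regular(metric_times))
print(is_regular(event_times))
```

Real collectors rarely sample with perfect spacing, so the `tolerance` argument is there to allow a little jitter before a series is treated as irregular.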
Summarization of Events
Summarizing events over fixed time intervals turns them into a regular series, for example:
• Summarizing the average trade price of Apple stock every 10 minutes over the course of a day
• Summarizing the average response time for requests in an application over 1 minute intervals
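The summarization idea can be sketched in a few lines of Python: bucket each event by the start of its window, then average each bucket. This is only an illustration of the concept, not how InfluxDB implements it; the function name `summarize` and the response-time values are made up.

```python
from collections import defaultdict

def summarize(events, window_seconds):
    """Average (unix_seconds, value) events per fixed window.

    Returns a sorted list of (window_start_seconds, mean_value).
    Windows that received no events are absent from the result.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        window_start = ts - (ts % window_seconds)  # floor to the window boundary
        buckets[window_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

# Hypothetical response times in milliseconds, arriving at irregular moments.
events = [
    (1445299201, 120.0),
    (1445299230, 80.0),
    (1445299275, 100.0),
    (1445299322, 60.0),
    (1445299339, 90.0),
]

per_minute = summarize(events, 60)
for window_start, mean_ms in per_minute:
    print(window_start, mean_ms)
```

The output has one row per minute that saw traffic, which is the "regular" series a dashboard or alert would then consume.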
What is a time series database?
Characteristics of the Data
• All time-stamped data
• Generated in regular (Metric) and irregular (Event) time periods
• Huge volumes of data
• Real-time and time sensitive
Time Series databases are optimized for collecting, storing, retrieving & processing of Time Series data
Compare this to
• Document databases: optimized for storing JSON documents
• Search databases: optimized for full-text searches
• Traditional relational databases: optimized for the tabular storage of related data in rows & columns
Some Databases used for Time Series
[Figure: database logos grouped into "Other DB Types" and "Time Series DBs"; image not preserved]
DB-Engines Results
[Figure: Trend of the Last 24 Months. ~138% increase in DB-Engines score over 24 months. Source: DB-Engines]
Time-series use cases
Primary Use Cases
• IoT: industrial settings (factories, oil & gas, agriculture, smart roads & infrastructure) and consumer (wearables, consumer devices & trackers)
• DevOps: custom monitoring solutions to track servers, VMs, applications, users or events
• Real-Time Analytics: apps that instrument business, social or development metrics in real time
Platform Strategy: Be The Platform of Choice for All Metrics and Event Workloads
Common Metrics and Events Platform
• Application, Custom Logs & Traces
• Business Metrics
• Infrastructure & Application Metrics
• IoT Sensor Events
InfluxDB Platform Features
INSTRUMENT / OBSERVE / AUTOMATE / LEARN
✓ Quickly ingest data from everywhere
✓ Efficiently store (compress) the data at scale
✓ Support real-time query, analysis and visualization of large data sets
✓ Provide time-based functions for “change over time” analysis and control
✓ Provide automation and control functions
✓ Evict and down-sample data
✓ Facilitate machine learning and anomaly detection algorithms
✓ Provide streaming analytics for data in motion
METRICS / EVENTS
InfluxDB Products and Offerings
Why Choose InfluxDB
• Easy to get started with
• Familiar query syntax
• No external dependencies
• Allows for regular and irregular time series
• Horizontally scalable
• Member of a cohesive time series platform
InfluxDB Data Model
A typical time series graph
The Label
We call this the measurement.
The Legend (metadata)
We call these tags. Tags are indexed.
Collection of all tags
We call this the tagset.
ticker=A,market=NASDAQ
ticker=AA,market=NYSE
ticker=AAPL,market=NASDAQ
Y-Axis Values
We call these fields. Note that the values that the field stores can be floats, ints, strings, or bools.
The Collection of Fields
We call this the fieldset. Note that in this case, there's only one field. (There could be many.)
price=177.03
price=32.10
price=35.52
X-Axis Value
We call this the timestamp.
How do we represent points textually?
Points in InfluxDB look like…
stock_price,ticker=A,market=NASDAQ price=177.03 1445299200000000000
stock_price,ticker=AA,market=NYSE price=32.10 1445299200000000000
stock_price,ticker=AAPL,market=NASDAQ price=45 1445299200000000000
The Line protocol
measurement,tagset fieldset timestamp
stock_price,ticker=A,market=NASDAQ price=177.03 1445299200000000000
Concepts: Time Series Database Schema
Data Ingestion Format
• Points are written to InfluxDB using the Line Protocol, which has the following format:
<measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field-key>=<field-value>...] [unix-nano-timestamp]
cpu_load,hostname=server02,az=us_west temp=24.5,volts=7 1234567890000000000
Measurement: cpu_load | Tag Set: hostname=server02,az=us_west | Field Set: temp=24.5,volts=7 | Timestamp: 1234567890000000000
Reference: https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/
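As a rough illustration of the format, the following Python sketch assembles a line protocol string from its four parts. The helper names `format_field` and `to_line` are invented here, and the sketch assumes that measurement names, tag keys and tag values contain no spaces, commas or equals signs (real line protocol requires those to be backslash-escaped).

```python
def format_field(value):
    """Render one field value using line protocol's type markers."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"        # a trailing i marks an integer
    if isinstance(value, float):
        return repr(value)        # plain numbers are stored as floats
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'         # strings are double-quoted

def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build 'measurement,tagset fieldset timestamp' (assumes nothing needs escaping)."""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = measurement
    for key, value in tags.items():
        line += f",{key}={value}"
    line += " " + ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"   # when omitted, the server assigns the time
    return line

line = to_line(
    "stock_price",
    {"ticker": "A", "market": "NASDAQ"},
    {"price": 177.03},
    1445299200000000000,
)
print(line)
```

Tags are optional and so is the timestamp, but the field set is not, which is why the sketch rejects a point with no fields.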
A Series in InfluxDB
measurement + tagset = the series as a whole
measurement + tagset + timestamp = single point
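The two identities above can be sketched with plain Python dictionaries: points are grouped by a key built from measurement and tagset, and within a group each timestamp names one point. The `series_key` helper and the sample points are illustrative only, and merging the fields of two writes that share a key and timestamp is a simplification chosen for this sketch.

```python
from collections import defaultdict

def series_key(measurement, tags):
    """measurement + tagset identifies the series; tag order does not matter."""
    return measurement + "".join(f",{k}={v}" for k, v in sorted(tags.items()))

# (measurement, tags, fields, timestamp in nanoseconds); hypothetical points.
points = [
    ("cpu", {"host": "server1", "location": "us-west"}, {"value": 10.0}, 1445299200000000000),
    ("cpu", {"location": "us-west", "host": "server1"}, {"value": 11.0}, 1445299210000000000),
    ("cpu", {"host": "server2", "location": "us-west"}, {"value": 12.0}, 1445299200000000000),
]

series = defaultdict(dict)
for measurement, tags, fields, ts in points:
    # Within one series the timestamp identifies the point, so a second write
    # with the same key and timestamp lands on the same entry.
    series[series_key(measurement, tags)].setdefault(ts, {}).update(fields)

for key, pts in series.items():
    print(key, len(pts), "point(s)")
```

Three points produce two series here, because the first two differ only in the order their tags were written.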
Examples of points in Line Protocol
cpu,host=server1 value=100 1445299200000000000
temperature,zipcode=94107,country=usa value=75,humidity=10 1445299200000000000
response_time,method=GET,precision=ms value=12i 1445299200000000000
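Reading the examples back into their parts shows how field types are inferred: `value=100` is a float, while `value=12i` is an integer. The parser below is a deliberately small sketch with invented names (`parse_value`, `parse_line`); it assumes no escaped characters and no spaces inside quoted strings, both of which real line protocol permits.

```python
def parse_value(text):
    """Infer a field's type from how it is written."""
    if text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    if text in ("t", "T", "true", "True", "TRUE"):
        return True
    if text in ("f", "F", "false", "False", "FALSE"):
        return False
    if text.endswith("i"):
        return int(text[:-1])   # trailing i: integer
    return float(text)          # anything else numeric: float

def parse_line(line):
    """Split a simple point into (measurement, tags, fields, timestamp)."""
    parts = line.split(" ")
    head, field_part = parts[0], parts[1]
    timestamp = int(parts[2]) if len(parts) > 2 else None
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, raw = pair.split("=", 1)
        fields[key] = parse_value(raw)
    return measurement, tags, fields, timestamp

parsed = parse_line("response_time,method=GET,precision=ms value=12i 1445299200000000000")
print(parsed)
```

Note that tag values are always strings, so `precision=ms` above is just a label on the series and has no effect on how the timestamp is interpreted.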
Use Cases
“We analyze over 70,000 hours of data every night, half a billion data points, to produce alerts for our technicians. Having this real-time data in the cloud makes it possible to identify trends, usage patterns & even detect problems before they exist!”
David McLean, Lead Developer at BBOXX
Wayfair is using InfluxData to monitor system metrics & events across datacenters, and Real User Monitoring (RUM) to understand user experience on their e-commerce site. The goal is to marry these with business process events and provide better business insight and competitive advantage.
Measuring performance of a high-traffic e-commerce site
• InfluxDB is a major component in Wayfair’s Cyber 5 Holiday Weekend monitoring and alerting systems
– Data center metrics: 100s of apps sending metrics from multiple data centers
– Real User Monitoring (client-side monitoring)
• Understand user experience: with 100s of code changes deployed to the app, each change has the potential to impact performance for better or worse
– Daily, 20 million RUM measurements across 8 stores, hundreds of page types, & thousands of device types (phones, tablets, laptops & PCs)
• Wayfair is working strategically with InfluxData
– Ensure the Wayfair implementation is scalable, robust, and in line with the future direction of InfluxDB
– Providing case studies to help drive InfluxDB enterprise feature requests
Architecture
• Multi-layer pipeline with Telegraf
– Receives raw InfluxDB line protocol from applications via UDP and forwards it on to Kafka
– Consumes metrics from the Kafka buffer and writes to InfluxDB
• Connects multiple data centers by mirroring Kafka topics to shuttle metrics (vs through cross-datacenter db replication)
• Ability to use UDP (fast, non-blocking) and TCP (more transactionally robust)
• Ability to inject various processing hooks into the data stream as the business needs evolve
• Easy to write the same data to multiple instances of InfluxDB
• Multi-day tolerance against a severe network connectivity incident
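The first hop of such a pipeline, an application emitting line protocol over UDP, might look roughly like the Python sketch below. It is not Wayfair's code: the `send_lines` helper is invented, and a throwaway local socket stands in for the collector so the example runs without Telegraf or Kafka.

```python
import socket

def send_lines(lines, host, port):
    """Fire-and-forget: one UDP datagram carrying newline-separated points."""
    payload = "\n".join(lines).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))  # no handshake, no acknowledgement
    return len(payload)

# Stand-in listener so the sketch is self-contained; a real deployment would
# point send_lines at the collector's UDP address instead (an assumption here).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(2.0)
host, port = receiver.getsockname()

sent = send_lines(["cpu,host=server1 value=100 1445299200000000000"], host, port)
data, _ = receiver.recvfrom(65535)
receiver.close()
print(sent, data)
```

Because UDP gives no delivery guarantee, the sender never blocks on a slow collector, which is the trade-off the slide contrasts with TCP.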
Why replace Graphite w/ InfluxDB
Graphite limitations
✓ No clustering
✓ Storage infrastructure that is difficult to manage
✓ No out-of-the-box data pipeline solution
✓ Graphite maxed out its performance potential
✓ No shard relocation & backup/restore
InfluxDB Advantages
✓ Rapid performance improvements & hardening
✓ Supports true clustering
✓ Written in Go, a language that fully exploits multi-core environments
✓ Built with an emphasis on efficient & scalable storage
✓ Active community & an ecosystem of related tools for building enterprise-wide installations
✓ Growing customer base from a diverse set of companies
InfluxDB is a major component to providing observability across the board at Playtech. It is better to have simpler and faster monitoring systems with tools for post hoc analysis and avoid “magic” systems that try to automatically detect causality.
Aleksandr Tavgen, Technical Architect, Playtech
Monitoring distributed systems
• 50+ multi-branded sites, distributed worldwide, multiple products & channels
– Handle financial transactions in strictly regulated markets
– There are also different variations in their back end
• InfluxDB is a major component to providing observability across the board
– Production system-level monitoring
• Low-level monitoring: CPU, disk, memory usage, networking, garbage collection cycles, etc. Cannot reflect business logic processes due to the number of interdependencies between various components and services
• High-level monitoring: KPIs (user sessions, payments, transaction amounts, logins, etc.). Possible to indirectly monitor complex system behavior, especially in the case of distributed systems.
– Organizational-level monitoring
• Map of all product portfolios, clients, sites, & brands tied to internal org chart (who is responsible for a feature, who wrote the code, etc.)
Architecture
• Service-oriented architecture where its components have very complex business logic and states
• Natural way of defining the boundaries and responsibilities of each component; easier to add features independently and on-the-fly without the fear of breaking something
• The microservices are executed on their own Python VMs
• System has an event-driven design. All communication goes through the message queue and the system works in an asynchronous way
Why replace HP Service Health Analyzer (SHA) w/ InfluxDB
SHA limitations
✓ Black box: impossible to change its settings or tune it
✓ Yields a huge amount of false positives and false negatives
✓ SHA’s granularity is 15 minutes
✓ Hewlett-Packard’s SHA does not work as expected - trying to be a universal instrument
InfluxDB Advantages
✓ Time Series DB for any metric
✓ Widely adopted by the community
✓ Fast and reliable
✓ Comes with great support
A&S Energie is a Belgian biomass plant that supplies electricity by burning non-recyclable wood. They replaced their existing solution with Factry Historian, powered by open source InfluxDB. This allowed A&S Energie to increase use of the process data to make their staff more efficient as well as find failures faster.
Measuring performance of a high-traffic e-commerce site
• A&S Energie presented a classic Industrial IoT challenge involving high-cost, confined industrial equipment requiring a useful life of many decades.
– Operators wrote down counters on paper (values of certain measurements needed for reporting on the power plant’s status)
– Based on this paper-reliant reporting, management reporting was done with Excel
– The existing Historian was underutilized: its data was not being used to extract meaning, and the historical view only provided 2 weeks’ worth of data
• Factry Data Historian with InfluxDB
– OPC UA Connector, InfluxDB for storage and Grafana for visualization
Architecture
• Datasources
– From several sensor sources (Siemens, Rockwell Automation, Schneider Electric)
– Uses an OPC UA collector
• Stores all time-stamped values in InfluxDB
• All other contextual data from multiple sources
• Grafana dashboards provided to management and operators
Why replace their Data Historian
Existing Data Historian
✓ No easy access to long term data
✓ Still too many disparate databases including Excel and paper
✓ Legacy software – Windows 2000
✓ Not accessible – in a closet – really!
✓ Not easy to expand on the data
InfluxDB Advantages
✓ Real-time and historical insights for everyone, from machine operator to plant manager
✓ Handling high write loads
✓ Platform-independent - Most historians are based on proprietary systems and are Windows-based
✓ Retention policy options - keeps the data at its original resolution for as long as needed
Writing Data into InfluxDB
Creating a database
CREATE DATABASE mydb
Last login: Mon Oct 19 10:50:43 on ttys006
~$ influx
Connected to http://localhost:8086 version 0.9
InfluxDB shell 0.9
> create database mydb
>
Verifying that it was created
SHOW DATABASES
> show databases
name: databases
---------------
name
_internal
mydb
Using the database we just created
USE mydb
> use mydb
Using database mydb
>
Inserting data into the database
insert cpu,host=server1,location=us-west value=10
insert cpu,host=server1,location=london value=11
insert cpu,host=server2,location=us-west value=12
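Outside the `influx` shell, the same points can be sent to the InfluxDB 1.x HTTP `/write` endpoint as a newline-separated body. The Python sketch below only builds the request so that it runs without a server; the helper name `build_write_request` is invented, and the URL assumes the local instance shown in the shell session.

```python
from urllib.request import Request
from urllib.parse import urlencode

def build_write_request(base_url, database, lines):
    """Prepare a POST to /write?db=<database> with line protocol as the body."""
    query = urlencode({"db": database})
    body = "\n".join(lines).encode("utf-8")
    return Request(f"{base_url}/write?{query}", data=body, method="POST")

request = build_write_request(
    "http://localhost:8086",  # assumed address of a local InfluxDB 1.x instance
    "mydb",
    [
        "cpu,host=server1,location=us-west value=10",
        "cpu,host=server1,location=london value=11",
        "cpu,host=server2,location=us-west value=12",
    ],
)
print(request.get_method(), request.full_url)

# To actually send it against a running server:
#   from urllib.request import urlopen
#   urlopen(request)
```

No timestamps are given, so as with the shell's `insert`, the server would stamp each point with its own current time.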
Verifying that the data was written
SELECT * FROM cpu
-- PS. Be careful! On a large measurement this query can be very expensive.
SHOW SERIES
SHOW MEASUREMENTS
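One way to keep the verification query cheap is to bound it by time and row count before sending it to the InfluxDB 1.x HTTP `/query` endpoint. The sketch below only builds and decodes the URL; `build_query_url` is an invented helper, and the one-hour window and `LIMIT 10` are arbitrary choices for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_url(base_url, database, influxql):
    """Build a GET URL for /query with the database and InfluxQL statement encoded."""
    return f"{base_url}/query?" + urlencode({"db": database, "q": influxql})

# Bounded version of SELECT * FROM cpu: recent data only, at most 10 rows.
statement = "SELECT * FROM cpu WHERE time > now() - 1h LIMIT 10"
url = build_query_url("http://localhost:8086", "mydb", statement)

# Decode the query string again to confirm the statement survived encoding.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["q"][0])
```

The same statement can be typed directly into the `influx` shell; the point is the `WHERE time` clause and `LIMIT`, not the transport.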
THANK YOU

More Related Content

PPTX
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
PDF
3 Reasons to Select Time Series Platforms for Cloud Native Applications Monit...
PPTX
Free Servers to Build Big Data System on: Bing’s Approach
PDF
Extending the Reach of R to the Enterprise with TERR and Spotfire
PDF
Ibm big data
PPTX
Lessons learned processing 70 billion data points a day using the hybrid cloud
PPTX
Big data analytics and machine intelligence v5.0
PDF
An Introduction to the MapR Converged Data Platform
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
3 Reasons to Select Time Series Platforms for Cloud Native Applications Monit...
Free Servers to Build Big Data System on: Bing’s Approach
Extending the Reach of R to the Enterprise with TERR and Spotfire
Ibm big data
Lessons learned processing 70 billion data points a day using the hybrid cloud
Big data analytics and machine intelligence v5.0
An Introduction to the MapR Converged Data Platform

What's hot (20)

PPTX
MapR Streams and MapR Converged Data Platform
PPTX
MapR on Azure: Getting Value from Big Data in the Cloud -
PPTX
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
PPTX
Airline reservations and routing: a graph use case
PDF
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
PPTX
Evolving Beyond the Data Lake: A Story of Wind and Rain
PDF
Zero Data Loss Recovery Appliance a good investment! Konrad Häfeli
PPTX
Highly configurable and extensible data processing framework at PubMatic
PPTX
Houston Energy Data Science Meet up_TIBCO Slides
PDF
Stream Scaling in Pravega
PPTX
Intro to MapReduce
PPTX
3 Benefits of Multi-Temperature Data Management for Data Analytics
PDF
From an experiment to a real production environment
PDF
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
PDF
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
PDF
HiTech Manufacturing Use Cases/Examples
PPTX
Smart Meter Data Analytic using Hadoop
PDF
Big Data Heterogeneous Mixture Learning on Spark
PPTX
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
PDF
The case of vehicle networking financial services accomplished by China Mobile
MapR Streams and MapR Converged Data Platform
MapR on Azure: Getting Value from Big Data in the Cloud -
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Airline reservations and routing: a graph use case
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS
Evolving Beyond the Data Lake: A Story of Wind and Rain
Zero Data Loss Recovery Appliance a good investment! Konrad Häfeli
Highly configurable and extensible data processing framework at PubMatic
Houston Energy Data Science Meet up_TIBCO Slides
Stream Scaling in Pravega
Intro to MapReduce
3 Benefits of Multi-Temperature Data Management for Data Analytics
From an experiment to a real production environment
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
HiTech Manufacturing Use Cases/Examples
Smart Meter Data Analytic using Hadoop
Big Data Heterogeneous Mixture Learning on Spark
Real-Time With AI – The Convergence Of Big Data And AI by Colin MacNaughton
The case of vehicle networking financial services accomplished by China Mobile
Ad

Similar to Why You Should NOT Be Using an RDBMS for Time-stamped Data (20)

PDF
How to Gain a Competitive Edge with an Open Source, Purpose-built Time Series...
PDF
3 reasons to pick a time series platform for monitoring dev ops driven contai...
PDF
Why Open Source Works for DevOps Monitoring
PPTX
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
PDF
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
PPTX
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
PPTX
Announcing InfluxDB Clustered
PDF
InfluxDB Presentation for Aerospace 2025 Conference
PDF
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...
PDF
Building Your Data Streams for all the IoT
PPTX
Time Series Databases and Pandas DataFrames
PDF
Best Practices: How to Analyze IoT Sensor Data with InfluxDB
PPTX
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
PDF
Installing your influx enterprise cluster
PDF
Intro to Time Series
PPTX
Informix MQTT Streaming
PPTX
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
PPTX
How to get Real-Time Value from your IoT Data - Datastax
PDF
Intro to InfluxDB
PDF
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
How to Gain a Competitive Edge with an Open Source, Purpose-built Time Series...
3 reasons to pick a time series platform for monitoring dev ops driven contai...
Why Open Source Works for DevOps Monitoring
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
Announcing InfluxDB Clustered
InfluxDB Presentation for Aerospace 2025 Conference
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...
Building Your Data Streams for all the IoT
Time Series Databases and Pandas DataFrames
Best Practices: How to Analyze IoT Sensor Data with InfluxDB
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Installing your influx enterprise cluster
Intro to Time Series
Informix MQTT Streaming
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
How to get Real-Time Value from your IoT Data - Datastax
Intro to InfluxDB
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Ad

More from DevOps.com (20)

PDF
Modernizing on IBM Z Made Easier With Open Source Software
PPTX
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
PPTX
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
PDF
Next Generation Vulnerability Assessment Using Datadog and Snyk
PPTX
Vulnerability Discovery in the Cloud
PDF
2021 Open Source Governance: Top Ten Trends and Predictions
PDF
A New Year’s Ransomware Resolution
PPTX
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
PDF
Don't Panic! Effective Incident Response
PDF
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
PDF
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
PDF
Monitoring Serverless Applications with Datadog
PDF
Deliver your App Anywhere … Publicly or Privately
PPTX
Securing medical apps in the age of covid final
PDF
How to Build a Healthy On-Call Culture
PPTX
The Evolving Role of the Developer in 2021
PDF
Service Mesh: Two Big Words But Do You Need It?
PPTX
Secure Data Sharing in OpenShift Environments
PPTX
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
PDF
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...
Modernizing on IBM Z Made Easier With Open Source Software
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Next Generation Vulnerability Assessment Using Datadog and Snyk
Vulnerability Discovery in the Cloud
2021 Open Source Governance: Top Ten Trends and Predictions
A New Year’s Ransomware Resolution
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Don't Panic! Effective Incident Response
Creating a Culture of Chaos: Chaos Engineering Is Not Just Tools, It's Culture
Role Based Access Controls (RBAC) for SSH and Kubernetes Access with Teleport
Monitoring Serverless Applications with Datadog
Deliver your App Anywhere … Publicly or Privately
Securing medical apps in the age of covid final
How to Build a Healthy On-Call Culture
The Evolving Role of the Developer in 2021
Service Mesh: Two Big Words But Do You Need It?
Secure Data Sharing in OpenShift Environments
How to Govern Identities and Access in Cloud Infrastructure: AppsFlyer Case S...
Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda T...

Recently uploaded (20)

PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPT
Teaching material agriculture food technology
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Machine learning based COVID-19 study performance prediction
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Teaching material agriculture food technology
Review of recent advances in non-invasive hemoglobin estimation
Reach Out and Touch Someone: Haptics and Empathic Computing
Spectral efficient network and resource selection model in 5G networks
Mobile App Security Testing_ A Comprehensive Guide.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Machine learning based COVID-19 study performance prediction
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
MYSQL Presentation for SQL database connectivity
Per capita expenditure prediction using model stacking based on satellite ima...
Digital-Transformation-Roadmap-for-Companies.pptx
Unlocking AI with Model Context Protocol (MCP)
Understanding_Digital_Forensics_Presentation.pptx
“AI and Expert System Decision Support & Business Intelligence Systems”
Chapter 3 Spatial Domain Image Processing.pdf
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...

Why You Should NOT Be Using an RDBMS for Time-stamped Data

  • 1. Thom Crowe Community Manager InfluxData Why You Should NOT Be Using an RDBMS for Time- Stamped Data
  • 2. © 2019 InfluxData. All rights reserved.2 Why Time- Series for Monitoring, Metrics, Real- Time Analytics and IoT/Sensor Data ● What is time series data ● Differences between Time Series Databases (TSDBs) ● InfluxDB Data model
  • 3. © 2019 InfluxData. All rights reserved.3 What is time series data?
  • 4. © 2019 InfluxData. All rights reserved.4 Time series data is made from the same source over a time interval. Plot the points on a graph and one of your axes would always be time. a sequence of data points, typically consisting of successive measurements
  • 5. © 2019 InfluxData. All rights reserved.5 Time Series data is...
  • 6. © 2019 InfluxData. All rights reserved.6 Not Time Series data...
  • 7. © 2019 InfluxData. All rights reserved.7 Time Series data...
  • 8. © 2019 InfluxData. All rights reserved.8 Not Time Series data...
  • 9. © 2019 InfluxData. All rights reserved.9 Time Series data...
  • 10. © 2019 InfluxData. All rights reserved.10 Not Time Series data...
  • 11. © 2019 InfluxData. All rights reserved.11 Time Series data...
  • 12. © 2019 InfluxData. All rights reserved.12 Time Series data...
  • 13. © 2019 InfluxData. All rights reserved.13 Not Time Series data... Studen t Grades Time spent studying
  • 14. © 2019 InfluxData. All rights reserved.14 © 2019 InfluxData. All rights reserved.14 Regular vs Irregular Time Series Metrics (Regular) Events (Irregular) Measurements gathered at regular time intervals Measurements gathered at irregular time intervals
  • 15. © 2019 InfluxData. All rights reserved.15 © 2019 InfluxData. All rights reserved.15 Regular Time Series Irregular Time Series Measurements gathered at regular time intervals Measurements gathered at irregular time intervals Metrics Events
  • 16. © 2019 InfluxData. All rights reserved.16 © 2019 InfluxData. All rights reserved.16 Summarization of Events Events become regular time intervals, for example Summarizing the average trade price of Apple stock every 10 minutes over the course of a day Summarizing the average response time for requests in an application over 1 minute intervals
  • 17. © 2019 InfluxData. All rights reserved.17 What is a time series database?
  • 18. © 2019 InfluxData. All rights reserved.18 Characteristics of the Data • All Time-stamped data • Generated in regular (Metric) and irregular (Event) time periods • Huge volumes of data • Real-time and time sensitive
  • 19. © 2019 InfluxData. All rights reserved.19 Time Series databases are optimized for collecting, storing, retrieving & processing of Time Series data Compare this to • Document databases Optimized for storing JSON documents • Search databases Optimized for full-text searches • Traditional relational Databases optimized for the tabular storage of related data in rows & columns
  • 20. © 2019 InfluxData. All rights reserved.20 © 2019 InfluxData. All rights reserved.20 Some Databases used for Time Series Other DB Types Time Series DBs
  • 21. © 2019 InfluxData. All rights reserved.21 DB-Engines Results Source: DB-Engines Source: DB-Engines ~138% increase in DB-Engines score over 24 months Trend of the Last 24 Months CATEGORY/BRAND Q1
  • 22. © 2019 InfluxData. All rights reserved.22 Time-series use cases
  • 23. © 2019 InfluxData. All rights reserved.23 © 2019 InfluxData. All rights reserved.23 Primary Use Cases Custom monitoring solutions to track servers, VMs, applications, users or events Industrial settings: factories, oil & gas, agriculture, smart roads & infrastructure Consumer: wearables, consumer devices & trackers Apps that instrument business, social or development metrics in real-time IoT DevOps Real-Time Analytics
  • 24. © 2019 InfluxData. All rights reserved.24 © 2019 InfluxData. All rights reserved.24 Platform Strategy: Be The Platform of Choice for All Metrics and Event Workloads Common Metrics and Events Platform Application, Custom Logs & Traces Business Metrics Infrastructure & Application Metrics IoT Sensor Events
  • 25. © 2019 InfluxData. All rights reserved.25 © 2019 InfluxData. All rights reserved.25 InfluxDB Platform Features INSTRUMENT OBSERVE AUTOMATE LEARN ✓ Quickly ingest data from everywhere ✓ Efficiently store (Compress) the data at scale ✓ Support real-time query, analysis and visualization of large data sets ✓ Provide time-based functions for “change over time” analysis and control ✓ Provide automation and control functions ✓ Evict and down-sample data ✓ Facilitate machine learning and anomaly detection algorithms ✓ Provide streaming analytics for data in motion METRICS EVENTS
  • 26. © 2019 InfluxData. All rights reserved.26 © 2019 InfluxData. All rights reserved.26 InfluxDB Products and Offerings
  • 27. © 2019 InfluxData. All rights reserved.27 © 2019 InfluxData. All rights reserved.27
  • 28. Why Choose InfluxDB • Easy to get started with • Familiar query syntax • No external dependencies • Allows for regular and irregular time series • Horizontally scalable • Member of a cohesive time series platform
  • 29. InfluxDB Data Model
  • 30. A typical time series graph
  • 31. The Label: We call this the measurement.
  • 32. The Legend (metadata): We call these tags. Tags are indexed.
  • 33. Collection of all tags: We call this the tagset.
      ticker=A,market=NASDAQ
      ticker=AA,market=NYSE
      ticker=AAPL,market=NASDAQ
  • 34. Y-Axis Values: We call these fields. Field values can be floats, ints, strings, or bools.
  • 35. The Collection of Fields: We call this the fieldset. In this case there is only one field (there could be many).
      price=177.03
      price=32.10
      price=35.52
  • 36. X-Axis Value: We call this the timestamp.
  • 37. How do we represent points textually?
  • 38. Points in InfluxDB look like…
      stock_price,ticker=A,market=NASDAQ price=177.03 1445299200000000000
      stock_price,ticker=AA,market=NYSE price=32.10 1445299200000000000
      stock_price,ticker=AAPL,market=NASDAQ price=45 1445299200000000000
  • 39. The Line protocol: measurement,tagset fieldset timestamp
      stock_price,ticker=A,market=NASDAQ price=177.03 1445299200000000000
  • 40. Concepts: Time Series Database Schema. Data Ingestion Format • Points are written to InfluxDB using the Line Protocol, which has four parts (Measurement, Tag Set, Field Set, Timestamp). Tags and the timestamp are optional; at least one field is required:
      <measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field-key>=<field-value>...] [unix-nano-timestamp]
      cpu_load,hostname=server02,az=us_west temp=24.5,volts=7 1234567890000000
      Reference: https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/
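The format above can be sketched as a small formatter. This is an illustrative helper, not part of any InfluxDB client library: the name `to_line_protocol` is hypothetical, and it assumes keys and tag values contain no spaces, commas, or equals signs (which would otherwise need escaping).

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point as: measurement[,tag=value...] field=value[,...] [timestamp]."""

    def format_field(value):
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integer field values carry an "i" suffix
        if isinstance(value, float):
            return repr(value)
        # Anything else is written as a double-quoted string field.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    if not fields:
        raise ValueError("a point needs at least one field")

    # Tags are appended to the measurement, separated by commas.
    head = measurement + "".join(f",{k}={v}" for k, v in tags.items())
    body = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())

    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanoseconds since the Unix epoch
    return line


print(to_line_protocol(
    "stock_price",
    {"ticker": "A", "market": "NASDAQ"},
    {"price": 177.03},
    1445299200000000000,
))
```

Leaving out the timestamp is allowed by the format; the server then assigns its own time when the point is written.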
  • 41. A Series in InfluxDB: measurement + tagset = the series as a whole; measurement + tagset + timestamp = a single point
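To make the series/point distinction concrete, here is a minimal sketch that groups line-protocol points by their series (measurement plus tagset). The function name `series_key` is hypothetical, the second NASDAQ point is invented for illustration, and the parsing assumes no escaped spaces or commas.

```python
from collections import defaultdict


def series_key(line):
    """Return 'measurement,tag=value,...' with tags sorted, ignoring fields and time."""
    head = line.split(" ", 1)[0]           # everything before the fieldset
    measurement, *tags = head.split(",")
    # Sorting makes the key independent of the order tags were written in.
    return ",".join([measurement] + sorted(tags))


points = [
    "stock_price,ticker=A,market=NASDAQ price=177.03 1445299200000000000",
    "stock_price,market=NASDAQ,ticker=A price=177.10 1445299260000000000",  # illustrative
    "stock_price,ticker=AA,market=NYSE price=32.10 1445299200000000000",
]

series = defaultdict(list)
for point in points:
    series[series_key(point)].append(point)

for key, members in series.items():
    print(key, "->", len(members), "point(s)")
```

The first two points differ only in field value and timestamp, so they land in the same series; the third has a different tagset and starts a new one.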
  • 42. Examples of points in Line Protocol
      cpu,host=server1 value=100 1445299200000000000
      temperature,zipcode=94107,country=usa value=75,humidity=10 1445299200000000000
      response_time,method=GET,precision=ms value=12i 1445299200000000000
  • 43. Use Cases
  • 44. “We analyze over 70,000 hours of data every night, half a billion data points, to produce alerts for our technicians. Having this real-time data in the cloud makes it possible to identify trends, usage patterns & even detect problems before they exist!” (David McLean, Lead Developer at BBOXX)
  • 47. Wayfair is using InfluxData to monitor system metrics & events across datacenters, and Real User Monitoring (RUM) to understand user experience on their e-commerce site. The goal is to marry these with business process events and provide better business insight and competitive advantage.
  • 48. Measuring performance of a high-traffic e-commerce site • InfluxDB is a major component in Wayfair’s Cyber 5 Holiday Weekend monitoring and alerting systems – Data center metrics: 100s of apps sending metrics from multiple data centers – Real User Monitoring (client-side monitoring) • Understand user experience: 100s of code changes are deployed to the app, and each change has the potential to impact performance for better or worse – Daily, 20 million RUM measurements across 8 stores, hundreds of page types, & thousands of device types (phones, tablets, laptops & PCs) • Wayfair is working strategically with InfluxData – Ensure the Wayfair implementation is scalable, robust, and in line with the future direction of InfluxDB – Provide case studies to help drive InfluxDB enterprise feature requests
  • 49. Architecture • Multi-layer pipeline with Telegraf – Receives raw InfluxDB line protocol from applications via UDP and forwards it on to Kafka – Consumes metrics from the Kafka buffer and writes them to InfluxDB • Connects multiple data centers by mirroring Kafka topics to shuttle metrics (vs. through cross-datacenter DB replication) • Ability to use UDP (fast, non-blocking) and TCP (more transactionally robust) • Ability to inject various processing hooks into the data stream as the business needs evolve • Easy to write the same data to multiple instances of InfluxDB • Multi-day tolerance against a severe network connectivity incident
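The first hop of such a pipeline, an application emitting line protocol over UDP, could look roughly like the sketch below. It uses only the Python standard library; the function name `send_line_udp` and the listener address 127.0.0.1:8094 are assumptions for illustration, not details of Wayfair's setup.

```python
import socket


def send_line_udp(line, host="127.0.0.1", port=8094):
    """Fire-and-forget one line-protocol point over UDP; returns bytes sent.

    host/port are placeholders for wherever a UDP listener (for example a
    Telegraf instance configured to accept line protocol) is assumed to run.
    """
    payload = (line + "\n").encode("utf-8")  # one point per newline-terminated line
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP does not wait for an acknowledgement, so the caller is never
        # blocked by a slow or unavailable collector; delivery is not guaranteed.
        return sock.sendto(payload, (host, port))


# Example call (needs a listener on the assumed address to be of any use):
# send_line_udp("cpu,host=server1 value=100")
```

This illustrates the trade-off named on the slide: UDP is fast and non-blocking but can silently drop points, which is one reason a buffering layer such as Kafka sits behind the receiver.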
  • 50. Why replace Graphite w/ InfluxDB. Graphite limitations: ✓ No clustering ✓ Storage infrastructure that is difficult to manage ✓ No out-of-the-box data pipeline solution ✓ Graphite maxed out its performance potential ✓ No shard relocation & backup/restore. InfluxDB Advantages: ✓ Rapid performance improvements & hardening ✓ Supports true clustering ✓ Written in Go, a language that fully exploits multi-core environments ✓ Built with an emphasis on efficient & scalable storage ✓ Active community & an ecosystem of related tools for building enterprise-wide installations ✓ Growing customer base from a diverse set of companies
  • 51. InfluxDB is a major component to providing observability across the board at Playtech. It is better to have simpler and faster monitoring systems with tools for post hoc analysis and avoid “magic” systems that try to automatically detect causality. (Aleksandr Tavgen, Technical Architect, Playtech)
  • 52. Monitoring distributed systems • 50+ multi-branded sites, distributed worldwide, multiple products & channels – Handle financial transactions in strictly regulated markets – There are also different variations in their back end • InfluxDB is a major component to providing observability across the board – Production system-level monitoring • Low-level monitoring: CPU, disk, memory usage, networking, garbage collection cycles, etc. Cannot reflect business logic processes due to the number of interdependencies between various components and services • High-level monitoring: KPIs (user sessions, payments, transaction amounts, logins, etc.). Possible to indirectly monitor complex system behavior, especially in the case of distributed systems – Organizational-level monitoring • Map of all product portfolios, clients, sites, & brands tied to the internal org chart (who is responsible for a feature, who wrote the code, etc.)
  • 53. Architecture • Service-oriented architecture whose components have very complex business logic and states • Natural way of defining the boundaries and responsibilities of each component; easier to add features independently and on the fly without the fear of breaking something • The microservices are executed on their own Python VMs • System has an event-driven design: all communication goes through the message queue and the system works asynchronously
  • 54. Why replace HP Service Health Analyzer (SHA) w/ InfluxDB. SHA limitations: ✓ Black box – impossible to change its settings or tune it ✓ Yields a huge number of false positives and false negatives ✓ SHA’s granularity is 15 minutes ✓ Hewlett-Packard’s SHA does not work as expected – trying to be a universal instrument. InfluxDB Advantages: ✓ Time Series DB for any metric ✓ Widely adopted by the community ✓ Fast and reliable ✓ Comes with great support
  • 55. A&S Energie is a Belgian biomass plant that supplies electricity by burning non-recyclable wood. They replaced their existing solution with Factry Historian, powered by open source InfluxDB. This allowed A&S Energie to increase use of the process data to make their staff more efficient as well as find failures faster.
  • 56. • A&S Energie presented a classic Industrial IoT challenge involving high-cost, confined industrial equipment requiring a useful life of many decades. – Operators wrote down counters on paper (values of certain measurements needed for reporting on the power plant’s status) – Based on this paper-reliant reporting, management reporting was done with Excel – The existing Historian was underutilized: its data was not being used to extract meaning – Historical view only provided 2 weeks’ worth of data • Factry Data Historian with InfluxDB – OPC UA Connector, InfluxDB for storage and Grafana for visualization
  • 57. Architecture • Datasources – From several sensor sources (Siemens, Rockwell Automation, Schneider Electric) – Uses an OPC UA collector • Stores all time-stamped values in InfluxDB • All other contextual data from multiple sources • Grafana dashboards provided to management and operators
  • 58. Why replace their Data Historian. Existing Data Historian: ✓ No easy access to long term data ✓ Still too many disparate databases including Excel and paper ✓ Legacy software – Windows 2000 ✓ Not accessible – in a closet – really! ✓ Not easy to expand on the data. InfluxDB Advantages: ✓ Real-time and historical insights for everyone, from machine operator to plant manager ✓ Handling high write loads ✓ Platform-independent – most historians are based on proprietary systems and are Windows-based ✓ Retention policy options – keeps the data at its original resolution for as long as needed
  • 59. Writing Data into InfluxDB
  • 60. Creating a database: CREATE DATABASE mydb
      Last login: Mon Oct 19 10:50:43 on ttys006
      ~$ influx
      Connected to http://localhost:8086 version 0.9
      InfluxDB shell 0.9
      > create database mydb
      >
  • 61. Verifying that it was created: SHOW DATABASES
      > show databases
      name: databases
      ---------------
      name
      _internal
      mydb
  • 62. Using the database we just created: USE mydb
      > use mydb
      Using database mydb
      >
  • 63. Inserting data into the database
      insert cpu,host=server1,location=us-west value=10
      insert cpu,host=server1,location=london value=11
      insert cpu,host=server2,location=us-west value=12
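Outside the `influx` shell, the same three points can be sent in one batch to the InfluxDB 1.x HTTP `/write` endpoint. The sketch below only builds the request with the standard library; `build_write_request` is a hypothetical helper name, and the server address is taken from the shell session shown earlier (http://localhost:8086).

```python
import urllib.parse
import urllib.request


def build_write_request(lines, db="mydb", base_url="http://localhost:8086"):
    """Build a POST to /write?db=<db> whose body is newline-separated line protocol."""
    query = urllib.parse.urlencode({"db": db})
    body = "\n".join(lines).encode("utf-8")  # one point per line
    return urllib.request.Request(f"{base_url}/write?{query}", data=body, method="POST")


request = build_write_request([
    "cpu,host=server1,location=us-west value=10",
    "cpu,host=server1,location=london value=11",
    "cpu,host=server2,location=us-west value=12",
])
print(request.get_method(), request.full_url)

# Sending requires a running server, so it is left commented out here:
# urllib.request.urlopen(request)
```

Batching several points into one request body is generally preferable to one request per point, since it amortizes the HTTP overhead.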
  • 64. Verifying that the data was written
      SELECT * FROM cpu     (PS. Be careful! This query can be very expensive.)
      SHOW SERIES
      SHOW MEASUREMENTS
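One way to keep a verification query cheap is to bound it by time and row count before sending it to the InfluxDB 1.x HTTP `/query` endpoint. The sketch below only constructs the URL; `build_query_url` is a hypothetical helper, and the one-hour window and `LIMIT 10` are arbitrary illustrative choices.

```python
import urllib.parse


def build_query_url(influxql, db="mydb", base_url="http://localhost:8086"):
    """Build a GET URL for /query with the InfluxQL statement URL-encoded in 'q'."""
    params = urllib.parse.urlencode({"db": db, "q": influxql})
    return f"{base_url}/query?{params}"


# Restrict the scan to recent data and cap the number of returned rows.
bounded_query = "SELECT * FROM cpu WHERE time > now() - 1h LIMIT 10"
url = build_query_url(bounded_query)
print(url)
```

An unbounded `SELECT *` has to read every point in the measurement, whereas the time predicate and `LIMIT` let the server stop early.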