Build Modern Monitoring with InfluxDB and AWS
Shashi Raina, Partner Solution Architect, AWS
Al Sargent, Sr. Director, Product Marketing, InfluxData
Monitoring AWS: Best Practices
Monitoring AWS with InfluxDB
Configure Telegraf to monitor AWS
Demo
Best Practices for Monitoring AWS
© 2020 InfluxData. All rights reserved. 4
So Why Monitor in the First Place?
To Gain Insights!
• Customer Experience
• Performance & Cost
• Trends
• Troubleshooting & Remediation
• Learning & Improvement
What Goes Into a Monitoring Plan?
Alerts
System
Knowledge
People
Actions
Tools
Alerting Best Practices
• Break alert crafting into batches, highest priority first.
• Refine alerts quickly.
• Make every alert prompt an action.
• Write descriptive alerts to speed up resolution.
• Don’t rely on email alone.
Summary
- Check your monitoring approach:
  - Is it user-centric?
  - Are you measuring the right things?
- Write a monitoring plan
- Start monitoring, test, and iterate
Operations exists to support the needs of the business.
Monitoring AWS with InfluxDB
© 2019 InfluxData. All rights reserved.11
[Architecture diagram: Accumulate, Analyze, Act]
Accumulate: Telegraf inputs (CloudWatch plugin for 93 AWS services, ECS plugin, System plugin, Docker plugin, Kubernetes plugins, Kinesis plugin, MQTT & Modbus plugins) and Flux (RDS joins), collecting from AWS ECS & Fargate, AWS EC2, AWS EKS, AWS Kinesis, IoT devices & sensors, and AWS RDS (MariaDB, MySQL, Postgres).
Analyze: InfluxDB (purpose-built time series database, real-time data stream processing, visualization & dashboarding, data analysis & anomaly detection, alerting & notifications), available as InfluxDB Cloud or InfluxDB Enterprise on AWS Global Infrastructure, with AWS Marketplace billing.
Act: alerting systems (PagerDuty, Slack, Webhooks), Telegraf outputs (AWS Kinesis, AWS CloudWatch), Grafana, and client libraries.
Telegraf input plugins on GitHub
Telegraf Inputs
CloudWatch plugin
ECS plugin
System plugin
Docker plugin
Kubernetes plugins
Kinesis plugin
MQTT & Modbus plugins
github.com/influxdata/telegraf/tree/master/plugins/inputs
Set up telegraf.conf for CloudWatch
First, the overall agent config
[agent]
# Run telegraf with debug log messages?
debug = false
# How often to collect metrics
interval = "30s"
# Default flushing interval for all outputs
flush_interval = "10s"
# Maximum number of unwritten metrics to buffer per output
metric_buffer_limit = 50000
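The four keys above are only part of the [agent] table. As a sketch, a fuller block might also set batching and timing jitter. The extra keys below use the defaults from Telegraf's sample configuration as we understand them; confirm them against the sample config shipped with your Telegraf version.

```toml
[agent]
  # Default collection interval for all inputs (a plugin-level interval overrides it)
  interval = "30s"
  # Align collection to multiples of interval (e.g. :00 and :30 for 30s)
  round_interval = true
  # Maximum number of metrics sent to an output in a single write
  metric_batch_size = 1000
  # Maximum number of unwritten metrics held per output; once full,
  # further metrics are dropped while the output is unreachable
  metric_buffer_limit = 50000
  # Random sleep before each collection, so many agents don't poll at once
  collection_jitter = "0s"
  # Default flushing interval for all outputs
  flush_interval = "10s"
  # Random extra delay added to each flush
  flush_jitter = "0s"
  # Run Telegraf without debug log messages
  debug = false
```

With these values each output can hold 50 full batches (50,000 / 1,000) while it is unreachable. The CloudWatch plugin's own interval setting, configured later, overrides the agent-wide 30s.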
Specify your cloud region
[[inputs.cloudwatch]]
# Specify your AWS Region
# region = "eu-central-1" # Frankfurt
# region = "eu-north-1" # Stockholm
# region = "eu-west-1" # Dublin
# region = "eu-west-2" # London
# region = "eu-west-3" # Paris
# region = "eu-south-1" # Milan
# region = "us-east-1" # Virginia
# region = "us-east-2" # Ohio
# region = "us-west-1" # Northern California
region = "us-west-2" # Oregon
# https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html
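The region setting takes a single region, so one [[inputs.cloudwatch]] block polls one region. A minimal sketch for covering both Oregon and Frankfurt, assuming the same EC2 namespace and timing used in the rest of this walkthrough, declares the plugin twice:

```toml
# One plugin instance per region; each instance polls CloudWatch independently
[[inputs.cloudwatch]]
  region = "us-west-2"  # Oregon
  period = "1m"
  delay = "5m"
  interval = "1m"
  namespace = "AWS/EC2"

[[inputs.cloudwatch]]
  region = "eu-central-1"  # Frankfurt
  period = "1m"
  delay = "5m"
  interval = "1m"
  namespace = "AWS/EC2"
```

The plugin tags each metric with its region, so the two streams stay distinguishable even when they land in the same bucket.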
Specify your AWS credentials
# AWS credentials
# Credentials are loaded in the following order
# 1) Assumed credentials via STS if role_arn is specified
# 2) explicit credentials from 'access_key' and 'secret_key'
# 3) shared profile from 'profile'
# 4) environment variables
# 5) shared credentials file
# 6) EC2 Instance Profile
# access_key = ""
# secret_key = ""
# token = ""
# role_arn = ""
# profile = ""
shared_credential_file = "./credentials"

Contents of the ./credentials file:
[default]
aws_access_key_id = AKIAI53FASDFP7J3KQ
aws_secret_access_key = 4EZ7As/Lmr2d1JgUaIdakr+58hpBJ
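Instead of keeping keys in a file, you can rely on an earlier or later entry in the lookup order. The sketch below assumes a pre-created IAM role whose ARN is a placeholder; the role needs the `cloudwatch:ListMetrics` and `cloudwatch:GetMetricData` permissions that the plugin's documentation lists as required. The profile name `telegraf` is likewise hypothetical.

```toml
[[inputs.cloudwatch]]
  region = "us-west-2"
  # Option 1: assume a role via STS (placeholder ARN; substitute your own)
  role_arn = "arn:aws:iam::123456789012:role/telegraf-cloudwatch"

  # Option 2: use a named profile from the shared credentials file,
  # which would then contain a [telegraf] section instead of [default]
  # profile = "telegraf"
  # shared_credential_file = "./credentials"

  # Option 3: on EC2, leave every credential setting unset so the
  # instance profile (step 6 in the lookup order) is used
```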
Specify your collection timing
# Requested CloudWatch aggregation Period (required - must be a multiple of 60s)
period = "1m"
# Collection Delay (required - must account for metrics availability via CloudWatch API)
delay = "5m"
# Recommended: use metric 'interval' that is a multiple of 'period' to avoid
# gaps or overlap in pulled data
interval = "1m"
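The period should match how often the service publishes data. EC2 basic monitoring, for example, publishes datapoints at five-minute granularity; one-minute data such as the 1m period above requires detailed monitoring to be enabled on the instance. A sketch for an instance on basic monitoring, keeping interval a multiple of period as the comment recommends:

```toml
[[inputs.cloudwatch]]
  region = "us-west-2"
  namespace = "AWS/EC2"
  # Basic monitoring: EC2 publishes datapoints every 5 minutes
  period = "5m"
  # Give CloudWatch time to make the latest datapoints available
  delay = "5m"
  # Poll once per period, so consecutive requests don't overlap
  interval = "5m"
```

Each poll then asks CloudWatch for a window that ends roughly `delay` before the poll time, which is why a delay that is too short can return empty results.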
Configure your metric namespaces
# Metric Statistic Namespace (required)
# List of namespaces:
# https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/aws-services-cloudwatch-metrics.html
namespace = "AWS/EC2"
# Maximum requests per second. Note that the global default AWS rate limit is
# 50 reqs/sec, so if you define multiple namespaces, these should add up to a
# maximum of 50.
ratelimit = 25
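Following the comment above, a sketch for monitoring both EC2 and RDS splits the 50 requests per second between two plugin instances. The `AWS/RDS` namespace, the three metric names, and the `DBInstanceIdentifier` dimension come from the AWS documentation; the database identifier is a placeholder. Recent Telegraf releases also accept a `namespaces` list within a single block; check the plugin README for your version.

```toml
[[inputs.cloudwatch]]
  region = "us-west-2"
  namespace = "AWS/EC2"
  # 25 + 25 = 50 requests/second across both instances
  ratelimit = 25

[[inputs.cloudwatch]]
  region = "us-west-2"
  namespace = "AWS/RDS"
  ratelimit = 25
  [[inputs.cloudwatch.metrics]]
    names = ["CPUUtilization", "FreeStorageSpace", "DatabaseConnections"]
    [[inputs.cloudwatch.metrics.dimensions]]
      name = "DBInstanceIdentifier"
      # Placeholder identifier; replace with your RDS instance name
      value = "my-database"
```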
docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/aws-services-cloudwatch-metrics.html
Add tags to find your metrics more easily
# Optional tags that you can add
[inputs.cloudwatch.tags]
plugin = 'cloudwatch'
aws_service = 'ec2'
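Because `[inputs.cloudwatch.tags]` opens a new TOML table, every key that follows it, up to the next table header, is read as a tag. A sketch of a safe ordering puts the plugin's own settings first, the tags table next, and the metric definitions after it:

```toml
[[inputs.cloudwatch]]
  # Plugin settings must come before any sub-table
  region = "us-west-2"
  namespace = "AWS/EC2"
  period = "1m"
  delay = "5m"
  interval = "1m"

  # Every key from here to the next header becomes a tag
  [inputs.cloudwatch.tags]
    plugin = 'cloudwatch'
    aws_service = 'ec2'

  # A new header closes the tags table
  [[inputs.cloudwatch.metrics]]
    names = ["CPUUtilization"]
```

If `namespace` were placed under the tags header instead, it would be stored as a tag rather than read as a plugin setting.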
Specify the metrics you want Telegraf to pull
[[inputs.cloudwatch.metrics]]
# List of EC2 metrics available:
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html
names = ["StatusCheckFailed","EBSReadBytes","CPUSurplusCreditsCharged","EBSByteBalance%",
"StatusCheckFailed_System","EBSWriteBytes","NetworkIn","CPUCreditUsage",
"EBSIOBalance%","EBSReadOps","CPUCreditBalance","StatusCheckFailed_Instance",
"CPUUtilization","NetworkOut"]
List of metrics varies by AWS service
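By default the plugin requests all five statistics (average, sum, minimum, maximum, and sample count) for each name and stores each as its own field, for example `cpu_utilization_average` in the `cloudwatch_aws_ec2` measurement. If you need only some of them, a sketch using the per-metric `statistic_include` filter (present in recent plugin versions; check the README for yours) trims the request. This fragment belongs inside an [[inputs.cloudwatch]] block:

```toml
[[inputs.cloudwatch.metrics]]
  names = ["CPUUtilization", "NetworkIn", "NetworkOut"]
  # Request only these statistics instead of all five
  statistic_include = ["average", "maximum"]
  [[inputs.cloudwatch.metrics.dimensions]]
    name = "InstanceId"
    value = "i-06025f2c26acfbf47"
```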
Specify your instance
[[inputs.cloudwatch.metrics.dimensions]]
name = "InstanceId"
# This will be unique for each AWS instance
value = "i-06025f2c26acfbf47"
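Listing instance IDs one by one does not scale. The dimension `value` accepts wildcard (glob) patterns, so a sketch that collects the same metrics for every instance reporting to the namespace might look like this fragment:

```toml
[[inputs.cloudwatch.metrics]]
  names = ["CPUUtilization", "StatusCheckFailed"]
  [[inputs.cloudwatch.metrics.dimensions]]
    name = "InstanceId"
    # Match every instance ID; a narrower glob such as "i-06*" also works
    value = "*"
```

With a wildcard the plugin discovers matching instances through the ListMetrics API, so instances launched later should be picked up on its periodic metric-list refresh rather than requiring a config change.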
Best practice: Have Telegraf monitor itself
# Collect metrics on Telegraf itself
[[inputs.internal]]
collect_memstats = true
# Tag Telegraf's own stats with plugin = 'internal' for easier retrieval
[inputs.internal.tags]
plugin = 'internal'
Send data to your InfluxDB instance
[[outputs.influxdb_v2]]
# Location of your InfluxDB Cloud instance
# Cloud URLs: https://v2.docs.influxdata.com/v2.0/reference/urls/
urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
# Store token in an environment variable called US_WEST_2_1
token = "$US_WEST_2_1"
# Your org is the email you signed up with
organization = "asargent@influxdata.com"
bucket = "aws"
# Compress writes with gzip (about 5x faster)
content_encoding = "gzip"
You can dual-write to multiple instances
[[outputs.influxdb_v2]]
# Cloud URLs: https://v2.docs.influxdata.com/v2.0/reference/urls/
urls = ["https://eu-central-1-1.aws.cloud2.influxdata.com"]
# Store token in an environment variable called EU_CENTRAL_1_1
token = "$EU_CENTRAL_1_1"
organization = "asargent+aws-eu-central-1@influxdata.com"
bucket = "aws"
content_encoding = "gzip"
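Each [[outputs.influxdb_v2]] block receives every metric and keeps its own buffer. To send only part of the data to the second instance, one option is Telegraf's standard `tagpass` filter keyed on the `plugin` tags added earlier. A sketch, assuming the `plugin = 'cloudwatch'` tag from the tags slide; the choice of filter is purely illustrative:

```toml
[[outputs.influxdb_v2]]
  urls = ["https://eu-central-1-1.aws.cloud2.influxdata.com"]
  token = "$EU_CENTRAL_1_1"
  organization = "asargent+aws-eu-central-1@influxdata.com"
  bucket = "aws"
  content_encoding = "gzip"
  # Forward only metrics tagged plugin=cloudwatch; Telegraf's
  # self-monitoring metrics (plugin=internal) are not sent here
  [outputs.influxdb_v2.tagpass]
    plugin = ["cloudwatch"]
```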
To troubleshoot, write line protocol to stdout
[[outputs.file]]
files = ["stdout"]
data_format = "influx"
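Stdout gets noisy once several inputs are enabled. A sketch that prints only the CloudWatch measurements, assuming their names start with `cloudwatch_` as in the plugin's documented naming (for example `cloudwatch_aws_ec2`), adds the `namepass` glob filter:

```toml
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
  # Print only measurements whose names match this glob
  namepass = ["cloudwatch_*"]
```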
Demo
github.com/alsargent/telegraf-cloudwatch
InfluxDB on AWS

Shashi Raina [AWS] & Al Sargent [InfluxData] | Build Modern Monitoring with InfluxDB and AWS | InfluxDays Virtual Experience London 2020
