Order from Chaos:
Automating Monitoring Configuration
Molly Duggan
Harvard – FAS Research Computing
https://rc.fas.harvard.edu
A Little Context
● 100,000 CPU cores on 3,000 nodes running 29 million jobs/year
● 40PB of storage on a variety of different systems
● 2 data centers
● 500+ lab groups with over 5,500 users
● Cloud and VM infrastructure
○ Connected VMs for applications showing research data
○ Assorted DBs (researchers, museums)
○ Internal services (Puppet, GitLab, etc.)
Everything that’s not compute is a snowflake!
What We Monitor
Options
● Manual configuration through a dashboard, with backups
○ Too much of a free-for-all
○ Changes are hard to see
○ Updates are hard to roll back
● Configuration management (Puppet)
○ We were on an old Puppet version
○ Too much in one place
● Scripts against sensuctl
○ Not everything we needed was implemented when we began this project (during the beta)
○ Didn't cover our asset packaging needs
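Seeing changes before they land is the main gap in the dashboard approach. A minimal sketch of a dry-run diff between definitions kept in a repository and definitions exported from a live cluster might look like the following. The function name `diff_definitions`, the name-keyed dictionary shape, and the example check names and commands are all hypothetical, not Hinoki's actual code.

```python
from typing import Any, Dict, List

# Hypothetical shape: a definition set maps a resource name (e.g. a check
# name) to its definition body.
Definitions = Dict[str, Dict[str, Any]]


def diff_definitions(desired: Definitions, live: Definitions) -> Dict[str, List[str]]:
    """Classify resource names as create / update / delete without touching the cluster."""
    to_create = sorted(name for name in desired if name not in live)
    to_delete = sorted(name for name in live if name not in desired)
    # Dict equality ignores key order, so only real value changes count as updates.
    to_update = sorted(
        name for name in desired
        if name in live and desired[name] != live[name]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


if __name__ == "__main__":
    desired = {
        "check-cpu": {"command": "check-cpu.sh -w 80", "interval": 60},
        "check-disk": {"command": "check-disk.sh -w 90", "interval": 300},
    }
    live = {
        "check-cpu": {"interval": 60, "command": "check-cpu.sh -w 90"},
        "check-legacy": {"command": "old-check.sh", "interval": 60},
    }
    # → {'create': ['check-disk'], 'update': ['check-cpu'], 'delete': ['check-legacy']}
    print(diff_definitions(desired, live))
```

In practice a live export usually carries fields that a repository copy omits (server-side defaults, metadata), so both sides would need normalizing before comparison; that step is left out here.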
Hinoki
● A tiny command-line tool to manage Sensu configuration
● Advantages for our shop:
○ Discrete repo with audit trail
○ Easy contribution for everyone on the team: just a git push
○ Flexibility moving forward
● Equal parts code and convention, with CI/CD integration
● Import definitions via the Sensu API
● Quick-start provisioning of an empty cluster
● Package and ship assets, adding each asset's hash to its config
● Install with: pip install hinoki
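Sensu Go asset definitions carry a `sha512` checksum that agents use to verify the downloaded archive, so the hash has to be computed from the exact tarball that gets shipped. A minimal sketch of the "package, then stamp the hash into the config" step, assuming a local directory laid out like an asset (`bin/`, `lib/`) and an asset definition held as a plain dictionary; the helper names `package_asset` and `stamp_asset` are hypothetical, not Hinoki's API.

```python
import hashlib
import tarfile
from pathlib import Path
from typing import Any, Dict


def package_asset(src_dir: Path, out_path: Path) -> Path:
    """Bundle the contents of src_dir (e.g. bin/, lib/) into a gzipped tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        for item in sorted(src_dir.iterdir()):
            # arcname keeps paths relative, so bin/ sits at the archive root
            tar.add(item, arcname=item.name)
    return out_path


def sha512_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large tarballs don't need to fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stamp_asset(definition: Dict[str, Any], tarball: Path) -> Dict[str, Any]:
    """Return a copy of the asset definition with sha512 set from the shipped tarball."""
    stamped = dict(definition)  # leave the caller's definition untouched
    stamped["sha512"] = sha512_of_file(tarball)
    return stamped
```

Gzip headers embed a timestamp, so repackaging the same files can change the hash; stamping after packaging, rather than caching an old value, keeps the definition and the artifact in sync.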
Demo #1: Updating Settings
Demo #2: Initialize New Cluster
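Provisioning an empty cluster from a repository means creating resources in an order where anything referenced by name already exists (namespaces first, for example). A minimal sketch using a topological sort over resource types; the dependency edges in `TYPE_DEPENDENCIES` and the `{"type", "name"}` resource shape are assumptions for illustration, not a statement of which references Sensu itself enforces at creation time.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Iterable, List, Set

# Hypothetical dependency map: each resource type lists the types it may
# refer to by name, so those are created first.
TYPE_DEPENDENCIES: Dict[str, Set[str]] = {
    "Namespace": set(),
    "Asset": {"Namespace"},
    "EventFilter": {"Namespace"},
    "Mutator": {"Namespace", "Asset"},
    "Handler": {"Namespace", "Asset", "EventFilter", "Mutator"},
    "CheckConfig": {"Namespace", "Asset", "Handler"},
}


def provisioning_order(resources: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort resources so every type comes after the types it depends on."""
    type_order = list(TopologicalSorter(TYPE_DEPENDENCIES).static_order())
    rank = {type_name: i for i, type_name in enumerate(type_order)}
    items = list(resources)  # allow a single-pass iterable
    unknown = sorted({r["type"] for r in items} - set(rank))
    if unknown:
        raise ValueError(f"no dependency rule for resource types: {unknown}")
    # Within a type, sort by name so repeated runs apply changes in the same order.
    return sorted(items, key=lambda r: (rank[r["type"]], r["name"]))
```

Failing loudly on an unknown type is a deliberate choice: silently appending it at the end could hide a missing dependency rule.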
Outstanding Issues
● Add more features!
● We still repeat ourselves
● It can still be hard to tell what you might be affecting when you alter a check
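In Sensu Go a check is scheduled on the entities that share at least one of its subscriptions, so one way to answer "what am I touching?" is to compare the matching entities before and after an edit. A minimal sketch, assuming simplified dictionary shapes (`{"name", "subscriptions"}` for entities, `{"subscriptions"}` for checks) and hypothetical helper names; it ignores proxy checks and round-robin scheduling.

```python
from typing import Any, Dict, List

# Simplified shapes (assumption): entities look like
# {"name": ..., "subscriptions": [...]}; checks like {"subscriptions": [...]}.


def affected_entities(check: Dict[str, Any], entities: List[Dict[str, Any]]) -> List[str]:
    """Names of entities sharing at least one subscription with the check."""
    check_subs = set(check.get("subscriptions", []))
    return sorted(
        e["name"] for e in entities
        if check_subs & set(e.get("subscriptions", []))
    )


def subscription_change_impact(
    old_check: Dict[str, Any],
    new_check: Dict[str, Any],
    entities: List[Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Which entities gain, lose, or keep the check after an edit."""
    before = set(affected_entities(old_check, entities))
    after = set(affected_entities(new_check, entities))
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
        "unchanged": sorted(before & after),
    }
```

For example, widening a check's subscriptions from `["postgres"]` to `["linux"]` on a fleet where `compute01` and `db01` both subscribe to `linux` would report `compute01` as newly added and `db01` as unchanged.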
Thank You!
Molly Duggan
Harvard – FAS Research Computing
https://rc.fas.harvard.edu
GitHub: @exitquote
https://github.com/fasrc/hinoki