Monitor all the thingz
February 2018
@yaronidan
@SolutoEng
Let’s think about it...
● Does monitoring = production?
● Can alerts suck less?
● Who carries the pager?
Our old monitoring system:
https://www.ks-soft.net/hostmon.eng/
And then we grew…
● Ops became a bottleneck
● No monitoring for new tech
● SLA is now a thing
We realized we needed a better solution
● Self-service
● Scalable
● Open source with a wide community
Where do we keep our monitoring data?
● TSDB – Graphite/Prometheus
● ELK - powered by logz.io
● Any other data source
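As a rough illustration of reading from a TSDB, the sketch below pulls the most recent non-null value out of a Graphite render response (format=json), which is a list of series each carrying [value, timestamp] datapoints. The target name and the sample payload are made up for the example; nothing here reflects Soluto's actual metrics.

```python
import json
from typing import Optional


def latest_value(render_json: str, target: str) -> Optional[float]:
    """Return the newest non-null value for `target`, or None if absent."""
    for series in json.loads(render_json):
        if series.get("target") != target:
            continue
        # Graphite pads gaps with null values, so walk backwards past them.
        for value, _timestamp in reversed(series.get("datapoints", [])):
            if value is not None:
                return float(value)
    return None


# Hypothetical payload in the shape Graphite returns for format=json.
SAMPLE = json.dumps([
    {"target": "app.checkout.errors",
     "datapoints": [[3.0, 1518000000], [7.0, 1518000060], [None, 1518000120]]}
])

print(latest_value(SAMPLE, "app.checkout.errors"))
```

The same function shape would work for any other data source that can be reduced to "give me the latest number for this name".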
Let’s take a look at the engine
● Monitoring-as-code
● Scalable
● Modularity
● We can monitor anything we can think of
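One way to read "we can monitor anything we can think of", assuming the engine is Icinga (the closing slides mention it): Icinga runs Nagios-compatible check plugins, and a plugin is just a program that prints a status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is a minimal sketch of that convention; the metric name and thresholds are hypothetical and not taken from the talk.

```python
from typing import Optional, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(name: str, value: Optional[float], warn: float, crit: float) -> Tuple[int, str]:
    """Map a metric value to a plugin exit code and a status line."""
    if value is None:
        return UNKNOWN, f"UNKNOWN - no data for {name}"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Text after the pipe is performance data: label=value;warn;crit
    return code, f"{LABELS[code]} - {name} is {value} | {name}={value};{warn};{crit}"


# Hypothetical metric and thresholds, for illustration only.
code, line = evaluate("queue_depth", 120.0, warn=100.0, crit=500.0)
print(line)
# A real plugin would finish with sys.exit(code) so the engine sees the state.
```

Because the check is an ordinary script, it can live in version control next to the service it watches, which is the practical meaning of monitoring-as-code here.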
Questions so far?
Who do we alert?
One person
Rotating on-call duty
Rotating on-call duty per team
Escalation
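The progression above, from one person to a per-team rotation with escalation, can be sketched roughly as follows. The names, the weekly round-robin, and the 15-minute escalation step are all assumptions made for the example, not details from the talk.

```python
from typing import List


def on_call(rotation: List[str], week: int) -> str:
    """Pick the primary on-call for a given week number (round-robin)."""
    return rotation[week % len(rotation)]


def who_to_page(chain: List[str], minutes_unacked: int, step: int = 15) -> str:
    """Move one level up the chain for every `step` minutes without an ack."""
    level = min(minutes_unacked // step, len(chain) - 1)
    return chain[level]


# Hypothetical team rotation and escalation chain.
backend_rotation = ["dana", "omer", "noa"]
primary = on_call(backend_rotation, week=7)
chain = [primary, "backend-team-lead", "ops-on-call"]

print(who_to_page(chain, minutes_unacked=0))
print(who_to_page(chain, minutes_unacked=20))
print(who_to_page(chain, minutes_unacked=90))
```

Keeping one rotation per team means the first page goes to someone who knows the service, and the chain only reaches a wider audience when an alert sits unacknowledged.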
Did we fulfill what we set out to achieve?
● Self-service
● Scalable
● Open source with a wide community
Everybody contributes (Commit statistics for production, Feb 20 - Jan 29)
2000 commits in 343 days
5.8 commits per day
Contributed by 73 authors
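The per-day figure follows directly from the other two numbers. A quick check, assuming the range runs from Feb 20, 2017 to Jan 29, 2018 (the slide gives no years; the talk is dated February 2018):

```python
from datetime import date

# Assumed years: the slide only says "Feb 20 - Jan 29".
start, end = date(2017, 2, 20), date(2018, 1, 29)
days = (end - start).days
commits = 2000

print(days)
print(round(commits / days, 1))
```

Under that assumption the range is 343 days, and 2000 / 343 rounds to 5.8 commits per day.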
Monitoring business
● We’re also using Mixpanel to monitor business metrics
● Different from traditional monitoring
● Gives us a broad view of the product’s condition
Back to where we started...
● Does monitoring = production?
● Can alerts suck less?
● Who carries the pager?
What’s next?
● More Prometheus
● Distribute Icinga
● Tighter coupling with code
Questions?
@yaronidan
@SolutoEng
#MeetupAtSoluto
Thank You!
Obviously, we’re hiring.
Sources
Monitoring blog post at Soluto’s engineering blog
https://github.com/Soluto/nagios-plugins
https://github.com/Soluto/check-redis-plugin
https://github.com/Icinga/puppet-icinga2
Tips and Tricks
Don’t let alert fatigue get you! Adjust alerts to fire only when disaster strikes
Learn from outages: check that alerts fired, and if they didn’t, make ’em!
Choose a solution that allows versioning, preferably using monitoring-as-code
Use templating for apps that share the same monitoring patterns
Share the joy of holding a pager with your fellow developers
Focus your monitoring efforts on metrics that can harm your business
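The templating tip can be illustrated with a small sketch: one shared pattern is expanded per app, with optional per-app overrides. This is plain Python rather than any particular engine's configuration syntax, and the check names, thresholds, and app names are hypothetical.

```python
from typing import Dict, List, Optional

# One shared pattern: every web app gets the same three checks.
# Check names and thresholds are hypothetical.
WEB_APP_TEMPLATE: List[Dict[str, object]] = [
    {"check": "http_status", "path": "/health"},
    {"check": "error_rate", "warn": 0.01, "crit": 0.05},
    {"check": "latency_p95_ms", "warn": 500, "crit": 2000},
]


def render_checks(
    app: str, overrides: Optional[Dict[str, Dict[str, object]]] = None
) -> List[Dict[str, object]]:
    """Expand the template for one app, applying per-check overrides."""
    overrides = overrides or {}
    checks = []
    for base in WEB_APP_TEMPLATE:
        # Build a new dict each time so the template itself is never mutated.
        checks.append({**base, "app": app, **overrides.get(str(base["check"]), {})})
    return checks


# "checkout" tolerates slower responses; "catalog" uses the defaults.
for check in render_checks("checkout", {"latency_p95_ms": {"crit": 3000}}):
    print(check)
for check in render_checks("catalog"):
    print(check)
```

Adding a new app then costs one line, and a threshold change in the template reaches every app that shares the pattern.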
