Principles of Observability
Jānis Orlovs
Riga DevOPS Meetup
22nd November, 2017
About Me
• I come from the ops side
• For me, the most interesting part is
making operations boring
• I have mostly worked in
Financial Services
• 4finance
• Swedbank
• KPMG
• T
• Have been playing basketball at an amateur
level for ages
Complexity of Systems
[Figure: Complexity plotted against Level of System Distribution, from simple monolith through modular monolith to complex modular monolith or microservices]
Failures of Complex Systems
• Complex systems are
intrinsically hazardous systems
• Catastrophe is always just
around the corner
• All practitioner actions are
gambles.
Paper URL: https://goo.gl/sTvJw8
Failures as Mysteries
https://twitter.com/honest_update
Monitoring System
Your monitoring system should address two questions:
what’s broken, and why? ...
“What” versus “why” is one of the most important distinctions in
writing good monitoring with maximum signal and minimum
noise.
Source: Site Reliability Engineering book
General Monitoring House Rules
• Metrics and checks that catch real incidents most often should be as
simple, predictable, and reliable as possible.
• Data collection, aggregation, and alerting configuration that is rarely
exercised should be up for removal.
• Signals that are collected, but not exposed in any prebaked dashboard
nor used by any alert, are candidates for removal.
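The last rule can be checked mechanically once you have an inventory of what is collected and what is referenced. A minimal sketch, assuming hypothetical signal, dashboard, and alert names (none of them come from the talk):

```python
def removal_candidates(collected, dashboards, alerts):
    """Return collected signals that no dashboard or alert references."""
    used = set()
    for signals in dashboards.values():
        used.update(signals)
    for signals in alerts.values():
        used.update(signals)
    return sorted(set(collected) - used)


# Hypothetical inventory, for illustration only.
collected = [
    "http_requests_total",
    "disk_free_bytes",
    "jvm_gc_pause_seconds",
    "loans_issued_total",
]
dashboards = {"overview": ["http_requests_total", "loans_issued_total"]}
alerts = {"disk_almost_full": ["disk_free_bytes"]}

print(removal_candidates(collected, dashboards, alerts))
```

In practice the two dictionaries would be exported from whatever dashboarding and alerting tools are in use; the set difference is the whole idea.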
Monitoring Approaches
Principles of System Observability
Blackbox Approach: Checks Monitoring
• Checks, not metrics.
• Simple, yes/no questions.
• First generation of monitoring systems
• Not suitable for understanding what’s actually happening under the
hood without guessing
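A check in this sense can be as small as one function that returns yes or no. A minimal sketch, assuming a TCP service on a hypothetical host and port; it shows that the check says nothing about why a service is down:

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Blackbox check: does this port accept connections? Yes or no."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical target; replace with a service you actually run.
print("OK" if tcp_check("127.0.0.1", 8080) else "CRITICAL")
```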
Whitebox Monitoring
Whitebox Approach: Metrics Monitoring
• Addresses known failure vectors.
• Instrumentation has to be developed to expose
data to monitoring
• Proper monitoring is a mixture of technical data and business data
• Too much monitoring is noise
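Instrumentation that mixes technical and business data might look like the following. This is a minimal sketch using a tiny in-process counter registry as a stand-in for a real metrics client; the handler and metric names are hypothetical:

```python
from collections import defaultdict


class Counters:
    """Tiny in-process counter registry; a stand-in for a real metrics client."""

    def __init__(self):
        self._values = defaultdict(float)

    def inc(self, name, amount=1.0):
        self._values[name] += amount

    def render(self):
        # One "name value" line per counter, sorted for stable output.
        return "\n".join(
            f"{name} {value:g}" for name, value in sorted(self._values.items())
        )


metrics = Counters()


def handle_loan_application(amount_eur, approved):
    """Hypothetical request handler instrumented with both kinds of data."""
    metrics.inc("http_requests_total")  # technical signal
    if approved:
        metrics.inc("loans_approved_total")  # business signal
        metrics.inc("loans_approved_eur_total", amount_eur)


handle_loan_application(500, approved=True)
handle_loan_application(1200, approved=False)
print(metrics.render())
```

Keeping the set of counters small and deliberate is one way to respect the last bullet: every extra series is something a person has to ignore later.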
Whitebox Approach: Logging
• Valuable insight: the place where investigations start
• View of a request
• View of the system
• Data is easy to collect from data points, in several formats:
• Plain text
• Structured
• Binary
• Log everything vs. log only actionable data
• Data sets bloat; large-scale data ingestion is tricky
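To make the "structured" option concrete, here is a minimal sketch that emits one JSON object per log line using only Python's standard `logging` module. The logger name and the `request_id` and `duration_ms` fields are assumptions chosen for illustration:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` land as attributes on the record.
        for key in ("request_id", "duration_ms"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload, sort_keys=True)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("payments")  # hypothetical logger name
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("payment accepted", extra={"request_id": "req-42", "duration_ms": 87})
```

A shared `request_id` is what gives the "view of a request": filtering on it pulls together every line that one request produced.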
Tracing
• Historically the most challenging part to
implement
• Tracing captures the lifetime of requests as they flow through
the various components of a distributed system
• Recent developments in tracing tools give a bright outlook for the future:
• DTrace and the BPF framework
• OpenTracing: http://opentracing.io/
Observability
In control theory, observability is a measure of how well internal states
of a system can be inferred from knowledge of its external outputs. The
observability and controllability of a system are mathematical duals.
Source: Wikipedia
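For a linear time-invariant system this definition has a standard concrete test. The block below is the textbook Kalman rank condition, added as an illustration and not taken from the slides; A, B, C and n are the usual state-space symbols:

```latex
\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t), \qquad x(t) \in \mathbb{R}^{n}

\text{observable} \iff \operatorname{rank}
\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} = n

(A, C)\ \text{observable} \iff (A^{\mathsf{T}}, C^{\mathsf{T}})\ \text{controllable}
```

The last line is the duality the quote refers to: the observability test for one system is the controllability test for its transpose.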
Choosing the Right Observability Tools
Privacy and Observability
• Starting 25 May 2018, the EU General Data Protection Regulation
(GDPR) will be fully in force.
• Drastic accountability measures:
• Up to 10m EUR or 2% of global annual turnover, whichever is higher, for lower-tier infringements
• Up to 20m EUR or 4% of global annual turnover, whichever is higher, for the most serious infringements
• Observability tools are silent, large-scale collectors of personal data
• Include them in your company’s data protection scope or anonymize
the data
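One way to keep personal data out of observability pipelines is to scrub log lines before they leave the service. A minimal sketch, assuming e-mail and IPv4 addresses are the fields of concern; the patterns, the key, and the token format are illustrative assumptions, not a compliance recipe:

```python
import hashlib
import hmac
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize(value, key):
    """Keyed hash: the same input maps to the same token, so events stay correlatable."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def scrub(line, key):
    """Replace e-mail addresses with keyed tokens and mask IPv4 addresses."""
    line = EMAIL_RE.sub(lambda m: "user:" + pseudonymize(m.group(0), key), line)
    return IPV4_RE.sub("[ip]", line)


SECRET = b"rotate-me"  # hypothetical key; keep real keys out of source code
print(scrub("login ok for janis@example.com from 10.1.2.3", SECRET))
```

Note that a keyed hash is pseudonymization, which GDPR still treats as personal data; only dropping or irreversibly masking a field, as done here for the IP address, moves toward anonymization.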
Conclusions
• Reliability of systems makes money (by not losing it)
• In distributed systems, all teams involved in systems
development have to commit to making systems observable
• For each type of task, choose one tool
• Review what data you collect, visualize your data
• Pick your own Observability target based on the requirements
of your service.
