Observability
In Real Time at Scale
GLOBAL SOFTWARE CONSULTANCY
- By Balvinder Khurana & Sarang Shinde
© 2020 ThoughtWorks
Agenda
1. What is Observability?
2. How is it different from Monitoring?
3. Why do we need Observability?
4. Why out-of-the-box tools do not solve all the problems
5. Our approach to Observability
6. What is real-time Observability?
7. How to handle Scale
8. Design principles and Gotchas
9. Questions
Observability: discovering information about system and user behaviour,
by looking at the system's inputs and outputs,
that translates into customer and business impact
Monitoring vs Observability
● The KNOWN Zone (Monitoring): anticipated
● The WEIRD Zone (Observability): unanticipated, or anticipated but unexpected
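To make the distinction concrete, here is a minimal Python sketch over hypothetical data (the events and their field names are assumptions). Monitoring asks a question fixed in advance; observability lets you slice the same raw events by a dimension you only thought of during the investigation.

```python
from collections import Counter

# Hypothetical raw request events; the field names are assumptions for illustration.
EVENTS = [
    {"service": "checkout", "status": 500, "app_version": "2.3.1", "region": "eu"},
    {"service": "checkout", "status": 200, "app_version": "2.3.0", "region": "eu"},
    {"service": "checkout", "status": 500, "app_version": "2.3.1", "region": "us"},
    {"service": "search", "status": 200, "app_version": "2.3.1", "region": "us"},
    {"service": "checkout", "status": 200, "app_version": "2.3.0", "region": "us"},
]


def error_rate_alert(events, threshold=0.3):
    """Monitoring: a check defined up front for an anticipated failure mode."""
    errors = sum(1 for e in events if e["status"] >= 500)
    rate = errors / len(events)
    return rate > threshold, rate


def break_down_errors(events, dimension):
    """Observability: slice raw error events by a dimension chosen after the fact."""
    return dict(Counter(e[dimension] for e in events if e["status"] >= 500))


fired, rate = error_rate_alert(EVENTS)
print(fired, rate)                               # True 0.4
print(break_down_errors(EVENTS, "app_version"))  # {'2.3.1': 2}
print(break_down_errors(EVENTS, "region"))       # {'eu': 1, 'us': 1}
```

The alert only says that something is wrong; keeping the raw events is what makes the follow-up question ("is it one release?") cheap to ask.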
The Evolution
Explosion of Data Sources
● System Level Metrics
● Logs
● Customer Support System
● User Click Stream
● Mobile Application Events
● Market Data
● Social Media
● Other Sources
Explosion of Personas & Explosion of Requirements
● Personas: Developers, Security Team, Business, Data Scientists
● Requirements: deployment status, load on a service, downtime for a service, service availability, service traceability, routing, revenue generated, price sensitivity, number of users
Observability (as we know it) is not a Silver Bullet
● A unified architecture for the consuming platform is not possible
● Useful data is siloed and locked into various tools
● Teams have to learn each tool individually
● It is difficult to correlate data quickly, whether for monitoring or for finding business relevance
● There is limited scope for exploration
● Who is focused on customer happiness?
● Do we want 99.99% availability and still pissed-off customers?
We need to RETHINK Observability
A platform that enables people to use their skills,
extend their senses and support their intuitions.
Make it quick and easy to explore a hypothesis
(business or technical), accept or disprove it, and move
on to find the root cause.
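As one illustration of "accept or disprove it" quickly, the Python sketch below runs a two-proportion z-test on error counts from two groups, such as two app releases. The test choice and the numbers are assumptions for illustration, not the authors' method; any statistical check the platform makes easy would serve the same purpose.

```python
import math


def two_proportion_z_test(errors_a, total_a, errors_b, total_b):
    """Two-sided z-test for H0: groups A and B share the same error rate."""
    p_a = errors_a / total_a
    p_b = errors_b / total_b
    # Pooled rate under H0 gives the standard error of the difference.
    pooled = (errors_a + errors_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Standard normal CDF via erf: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothesis (illustrative numbers): the new release fails more often than the old one.
z, p = two_proportion_z_test(errors_a=120, total_a=2000, errors_b=60, total_b=2000)
print(f"z={z:.2f}, p={p:.2g}")
```

A small p-value lets you reject "no difference" and move on to the root cause; a large one lets you discard the hypothesis and try the next.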
Make it Self-Service
Questions to ask
● Data/Access Patterns
● ETL/EL(T)
● Lambda/Kappa
● Raw layer
● Compaction
● Partitioning
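For the partitioning and compaction questions, a common answer is a Hive-style, time-partitioned layout plus a job that merges many small files into fewer large ones. The Python sketch below assumes a hypothetical `s3://lake/raw` base path, hourly UTC partitions and a simple in-order greedy grouping; real file sizes and targets depend on your storage and query engine.

```python
from datetime import datetime, timezone


def partition_path(base, source, event_time):
    """Hive-style layout: <base>/source=<source>/dt=YYYY-MM-DD/hour=HH in UTC."""
    t = event_time.astimezone(timezone.utc)
    return f"{base}/source={source}/dt={t:%Y-%m-%d}/hour={t:%H}"


def plan_compaction(files, target_bytes):
    """Greedily group small files, in order, into batches of at most ~target_bytes.

    A single file larger than target_bytes ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical bucket, source name and file sizes.
print(partition_path("s3://lake/raw", "clickstream",
                     datetime(2020, 5, 17, 9, 30, tzinfo=timezone.utc)))
print(plan_compaction([("a", 40), ("b", 50), ("c", 30), ("d", 90), ("e", 10)], 100))
```

Partitioning by source and time lets queries prune whole directories, and it turns a retention policy into "drop partitions older than N days".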
Technical Architecture
Data Sources
● Application Data (Kafka)
● Infrastructure (Prometheus)
● Each Customer Journey (Jaeger)
● Request Routing (Istio)
● Retries (DLQ)
● Logs (EFK)
● Clickstream (e.g. GTM)
Stream data producer (Real Time events)
Observability Platform
● Data Lake: Raw Layer, CDM/Staging Layer
● Historical Data Processing
● Real Time/Interactive Query Engine
● Real Time (Historical Dashboard)
● Schema Registry, Data Encryption, Configurations
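One way a stream data producer can standardize heterogeneous sources before they reach the raw layer is to wrap every record in a common envelope and divert records that break the contract to a dead-letter queue (DLQ). The Python sketch below is an assumption-laden stand-in: the required fields and envelope layout are hypothetical, and a plain list plays the role of the DLQ topic.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical minimum contract every source must satisfy before onboarding.
REQUIRED_FIELDS = {"source", "event_type", "payload"}


def to_envelope(raw, schema_version=1):
    """Wrap a source-specific record in a common envelope, or raise ValueError."""
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return {
        "event_id": str(uuid.uuid4()),
        "schema_version": schema_version,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "source": raw["source"],
        "event_type": raw["event_type"],
        "payload": raw["payload"],
    }


def route(records):
    """Serialize valid records; send invalid ones to a dead-letter list with the reason."""
    events, dead_letters = [], []
    for raw in records:
        try:
            events.append(json.dumps(to_envelope(raw)))
        except ValueError as err:
            dead_letters.append({"record": raw, "error": str(err)})
    return events, dead_letters


events, dlq = route([
    {"source": "app", "event_type": "order_placed", "payload": {"order_id": 1}},
    {"source": "infra", "payload": {"cpu": 0.93}},
])
print(len(events), dlq[0]["error"])  # 1 missing fields: ['event_type']
```

Keeping the failure reason next to the rejected record makes it possible to fix and replay DLQ entries later instead of silently losing them.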
Points to consider while building
● Designing messaging systems
● Schema registry for versioning
● Contract testing for consumers
● Reprocessing of data after failures
● Track all the metadata
● Enforce the retention policy
● Do not make exceptions
● Build it incrementally
● CI/CD
● Alerting
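A schema registry typically rejects a new schema version that would break existing readers. The Python sketch below checks a simplified, assumed notion of backward compatibility, loosely modelled on Avro schema resolution: fields may be removed, new fields need a default, and existing fields may not change type. It is stricter than Avro (it ignores allowed type promotions) and is not any registry's actual API; the same check could also run as a contract test in CI.

```python
def backward_compatible(old_fields, new_fields):
    """Return problems that stop a reader on new_fields from reading old data.

    Simplified rules (an assumption, loosely modelled on Avro BACKWARD mode):
    removing a field is fine, a newly added field needs a default, and an
    existing field may not change type.
    """
    problems = []
    for name, spec in new_fields.items():
        if name not in old_fields:
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old_fields[name]["type"] != spec["type"]:
            problems.append(
                f"field '{name}' changed type {old_fields[name]['type']} -> {spec['type']}"
            )
    return problems


# Hypothetical order-event schemas.
V1 = {"order_id": {"type": "string"}, "amount": {"type": "double"}}
V2 = {**V1, "currency": {"type": "string", "default": "EUR"}}
V3 = {**V1, "channel": {"type": "string"}}
V4 = {"order_id": {"type": "string"}, "amount": {"type": "string"}}

print(backward_compatible(V1, V2))  # []
print(backward_compatible(V1, V3))  # ["new field 'channel' has no default"]
```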
Observability in Real Time
● < 1 sec, Database: End Users (OLTP systems)
● < 5 min, Stream Processing: Dev-Ops (monitor production systems, debug errors); Analysts (micro trends, anomalies); End Users (fraud detection)
● < 1 hour, Incremental Mini-batch Processing: Business Users (run ad hoc reports, KPI monitoring, functionality validation); Analysts / Data Scientists (feature extraction, model validations, quick hypothesis validation)
● Batch Processing: CXO (weekly/monthly reports, baseline measures); ML Engineers (experimentation, training models)
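For the "< 5 min" stream-processing tier, a typical job counts events in tumbling windows and flags windows that deviate from recent history. The Python sketch below is an in-memory stand-in for what a stream processor would compute continuously. The window size, the five-window history and the 3-sigma threshold with a floor of 1.0 are assumptions, not recommendations.

```python
import statistics
from collections import Counter


def tumbling_counts(timestamps, window_seconds=60):
    """Count events per fixed, non-overlapping window; empty windows count as 0."""
    counts = Counter((int(ts) // window_seconds) * window_seconds for ts in timestamps)
    if not counts:
        return []
    start, end = min(counts), max(counts)
    return [(w, counts[w]) for w in range(start, end + 1, window_seconds)]


def flag_anomalies(window_counts, history=5, k=3.0):
    """Flag windows whose count exceeds mean + k * stdev of the preceding windows."""
    flagged = []
    for i in range(history, len(window_counts)):
        past = [count for _, count in window_counts[i - history:i]]
        mean = statistics.mean(past)
        # Floor the spread at 1.0 so a perfectly flat history does not flag tiny bumps.
        spread = max(statistics.pstdev(past), 1.0)
        window_start, count = window_counts[i]
        if count > mean + k * spread:
            flagged.append(window_start)
    return flagged


# Hypothetical error timestamps (seconds): a steady trickle, then a burst at t=300.
error_times = [w * 60 + j for w, n in enumerate([2, 3, 2, 3, 2]) for j in range(n)]
error_times += [300 + j for j in range(20)]
windows = tumbling_counts(error_times)
print(windows)
print(flag_anomalies(windows))  # [300]
```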
Observability at Scale
● Ingestion: How to ingest data at scale?
● Data Lake Storage and Processing: How to store and process data at scale?
● Visualization: How to query and visualise data at scale?
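On the ingestion side, messaging systems such as Kafka scale by splitting a topic into partitions and routing each keyed record by a hash of its key, so one key always lands in one partition and keeps its order there. The Python sketch below illustrates the idea with `zlib.crc32` as a stand-in hash (Kafka's Java client uses murmur2 for keyed records); the user keys and the partition count of 8 are assumptions.

```python
import zlib


def partition_for(key, num_partitions):
    """Same key -> same partition, so per-key order is kept within that partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(records, num_partitions):
    """Append (key, value) records to partitions in arrival order."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical traffic: 1000 events spread over 50 user keys.
records = [(f"user-{i % 50}", i) for i in range(1000)]
parts = assign(records, 8)
print([len(p) for p in parts])  # load per partition; heavy skew points at hot keys
```

Choosing the key is the main design decision: a key with few distinct values, or one dominant value, concentrates load on a single partition and caps throughput.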
Messaging Technologies
Streaming Technologies
Querying Tools
Reporting Tools
Making Observability a success
● Upfront capacity planning
● Standardization using events
● Configurable datasource onboarding
● Clarity on KPIs and metrics
● Not fixating on visualization
● Metadata management: business, technical and operational
● Collaboration / Promote your work
● Teach Basic Skills
● Measure Usage
● Track Need
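Configurable datasource onboarding and metadata management can go together: each new source is a validated configuration entry that carries its own business and operational metadata, rather than new code. The Python sketch below assumes a hypothetical JSON format, field set and list of supported kinds.

```python
import json
from dataclasses import dataclass

# Hypothetical set of source kinds the platform knows how to ingest.
SUPPORTED_KINDS = {"kafka", "prometheus", "logs", "clickstream"}


@dataclass(frozen=True)
class DataSource:
    name: str
    kind: str
    location: str        # e.g. a topic, scrape job or index name
    owner: str           # business metadata: who to ask about this data
    retention_days: int  # operational metadata: drives the retention policy


def load_sources(config_text):
    """Parse and validate source definitions; unknown keys raise TypeError."""
    sources = []
    for entry in json.loads(config_text)["sources"]:
        source = DataSource(**entry)
        if source.kind not in SUPPORTED_KINDS:
            raise ValueError(f"{source.name}: unsupported kind {source.kind!r}")
        if source.retention_days <= 0:
            raise ValueError(f"{source.name}: retention_days must be positive")
        sources.append(source)
    return sources


CONFIG = """
{"sources": [
  {"name": "orders", "kind": "kafka", "location": "orders.v1",
   "owner": "checkout-team", "retention_days": 30},
  {"name": "node-metrics", "kind": "prometheus", "location": "node-exporter",
   "owner": "platform-team", "retention_days": 14}
]}
"""
for s in load_sources(CONFIG):
    print(s.name, s.kind, s.owner, s.retention_days)
```

Because onboarding is data, the same configuration can feed a metadata catalogue and usage tracking without a separate inventory.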
Thanks!
Questions?