>>> Realtime Monitoring
Storing 2TB of logs a day in
Elasticsearch
@aliostad
Ali Kheyrollahi, ASOS
The joy of hitting F5
The joy of a single process
The joy of having a
production-size database
locally
The joy of having a
dev machine build
running all services
/// What if your system
is “Microservices”?
- 40+ platform teams
- 100s of microservices
- some services >10k rps
> stackoverflow
> £1.5 bln
global fashion destination
> 35% year-on-year
/// elements of
observability
/// observability
>>> Control Theory
“a measure for how well internal states of a
system can be inferred by knowledge of its
external outputs”
Logging
Telemetry
Tracing
/// mixed concerns
Tracing
Logging
Credit: Peter Bourgon
Events
Aggregations
Request Scope
Telemetry
Alerting
/// Scope
Logging Telemetry Tracing Alerting
Log4Net ✓
Time-series DBs ✓ ✓ ✓
Zipkin ✓ ✓ ✓ ✓
Prometheus ✓ ✓ ✓
Elasticsearch ✓ ✓ ✓ ✓
New Relic* ✓ ✓ ✓
Circonus* ✓ ✓ ✓
* paid services
/// comparison
/// aggregations
1 At source (perf counters)
2 At the storage (Circonus)
3 In the visualisation tool (Kibana)
4 In the pipeline (Riemann)
/// data
sources
/// use cases
• Metrics (Visualisation)
• CPU, number of errors
• Response time percentiles
• Full-text search capability (logs and errors)
• Correlating across services
• Alerting when there is an SLO breach
/// azure logs
• Azure Diagnostics (WADLogs table)
• IIS logs
• VM Windows Event Logs
• Performance Counters (standard + custom)
/// application logs
[Diagram: Microservice, ETW, SLAB Azure Table Sink, ETW Application Logs]
EC, e.g. CRIT_ORD_API_DatabaseDown
/// instrumentation logs
[Diagram: Microservice, PerfIt, ETW, SLAB Azure Table Sink, Instrumentation Logs, Perf Counters, Azure Agent, Azure Performance Counter Logs]
/// ingest
process
/// pull vs push
/// logstash
[Diagram: VM (collectd, syslog, app logs, nginx), Logstash, QUEUE, Logstash, To Elasticsearch; transports: UDP, File-tailing]
/// ConveyorBelt
[Diagram: Azure sources (Performance Counters, WAD logs, ETW Logs, Instrumentation Logs, IIS Logs, Woodpecker Outputs), Sources Config, ConveyorBelt (Pull Logs), Elasticsearch]
Up to 2TB/day
/// ConveyorBelt
[Diagram: Sources, Source Config, Scheduler, units of work, Actors, Parser, To Elasticsearch]
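The labels on this slide suggest a scheduler that turns each configured source into units of work for actors to pick up. The sketch below is not ConveyorBelt's code; it only illustrates one plausible way to cut the unread time range of a source into fixed-size units. The function name, the dictionary keys and the slice size are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def schedule(source_name, last_offset, now, slice_size):
    """Split the unread range [last_offset, now) into units of work.

    Hypothetical helper: each unit names the source and the time slice
    an actor would read, parse and push to Elasticsearch.
    """
    units = []
    start = last_offset
    while start < now:
        end = min(start + slice_size, now)
        units.append({"source": source_name, "from": start, "to": end})
        start = end
    return units


units = schedule(
    "iis-logs",                       # hypothetical source name
    datetime(2017, 1, 1, 0, 0),       # last offset that was ingested
    datetime(2017, 1, 1, 0, 25),      # current time
    timedelta(minutes=10),            # assumed slice size
)
for unit in units:
    print(unit["source"], unit["from"], unit["to"])
```

Small slices keep each unit cheap to retry if an actor fails part-way through.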
/// Woodpecker
[Diagram: Sources, Source Config, Pull (Regular Intervals), Telemetry record, Azure Table]
/// elasticsearch
intro
/// elasticsearch
• Linearly-scalable and HA* search (and visualisation)
• ELK Stack
• Open Source (enterprise features require license)
• Speaks JSON
• REST API and very developer-friendly
/// cluster
• Cluster: No ZK
• Gossip / discovery
• Node type:
  • master - leader election
  • data
  • client
/// data hierarchy
• Index
• Shard
• Replica
• Type/Mapping
• Document: JSON, immutable, versioned
[Diagram: an INDEX containing MAPPINGs, each holding Documents]
/// data types
• JSON data types: bool, long, float, string*, datetime
• Array?
• String tokenisation/analysers
• Best of both worlds?
• Object
• nested
{
  "a": {
    "b": {
      "c": 42
    }
  }
}
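Fields inside an object are addressed by their dotted path, so the value in the example above is reached as a.b.c. A minimal sketch of that flattening, written as a plain Python helper (the function name is an assumption, and arrays are left untouched for brevity):

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted field paths."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into inner objects, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


nested = {"a": {"b": {"c": 42}}}
flat_fields = flatten(nested)
print(flat_fields)  # {'a.b.c': 42}
```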
/// doc operations
• Upsert
• Delete
• Partial Update
• Search (JSON-based query DSL)
/// DEMO 1
/// more
advanced
/// index shard/replica
Write Read Master
Shard Shard
Replica Replica
Index Index
/// index
• Daily indices
• Hot/Cold with index alias
• Creation => templates
• Settings:
  • refresh_interval
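As a rough illustration of daily indices behind an alias, the sketch below derives one index name per day and builds a request body for the aliases API (POST /_aliases). The prefix "logs", the alias name "logs-hot", the date format and the three-day window are assumptions, not values from the talk.

```python
from datetime import date, timedelta


def daily_index(prefix, day):
    """Name of the index holding one day of data, e.g. logs-2017.06.01."""
    return f"{prefix}-{day:%Y.%m.%d}"


def recent_indices(prefix, today, days):
    """Index names for the last `days` days, newest first."""
    return [daily_index(prefix, today - timedelta(days=n)) for n in range(days)]


def alias_actions(alias, add, remove):
    """Body for POST /_aliases that repoints an alias in one request."""
    actions = [{"add": {"index": name, "alias": alias}} for name in add]
    actions += [{"remove": {"index": name, "alias": alias}} for name in remove]
    return {"actions": actions}


today = date(2017, 6, 4)
hot = recent_indices("logs", today, 3)
expired = daily_index("logs", today - timedelta(days=3))
body = alias_actions("logs-hot", add=[hot[0]], remove=[expired])
print(hot)
print(body)
```

Queries that target the alias keep working while the set of indices behind it changes each day.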
/// mapping/type
• Schema
• How many mappings per index?
• Dynamic mapping
• Operations
  • Upsert
  • Delete
/// templates
PUT https://es_cluster:9200/_template/my_template
{
  "template": "my_index_*",
  "settings": {
  },
  "mappings": {
    "mapping_1": {
    },
    "mapping_2": {
    }
  }
}
/// bulk api
• Always use the Bulk API to index documents
• Batches of 1K-5K documents
• Watch out for error 429 and use a back-off pattern
• Check bulk rejects [change bulk queue length]
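The points above can be sketched as follows: documents are cut into batches, each batch becomes a newline-delimited body for the _bulk endpoint, and a 429 reply triggers an exponential back-off. The `send` callable is a stand-in for whatever HTTP client is in use and is an assumption, as are the delay values. Older Elasticsearch versions also expect a "_type" in the action line, which is omitted here.

```python
import json
import time


def chunks(docs, size):
    """Yield successive batches of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_body(index, docs):
    """Build the newline-delimited body for POST /_bulk."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # source line
    return "\n".join(lines) + "\n"  # the body must end with a newline


def send_with_backoff(send, body, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(body) and retry with exponential back-off while it returns 429.

    `send` is a hypothetical callable returning the HTTP status code.
    """
    for attempt in range(max_attempts):
        status = send(body)
        if status != 429:
            return status
        sleep(base_delay * 2 ** attempt)
    return 429


docs = [{"message": f"event {n}"} for n in range(2500)]
for batch in chunks(docs, 1000):
    body = bulk_body("logs-2017.06.01", batch)
    print(len(batch), "documents,", len(body), "bytes")
```

A bulk request can also come back with status 200 while individual items were rejected, so a real client would inspect the per-item results as well; that check is left out of this sketch.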
/// DEMO 2
/// physical
architecture
/// resources
node type
• Data: Disk, RAM, CPU, Network
• Master: CPU, Network, (RAM)
• Client: Network, CPU, (RAM)
• Kibana: CPU, Network, RAM
/// simple
data/master/kibana
• CPU
• RAM
• Disk
• Network
/// next level
data/master
kibana
client
kibana
master
data
traffic
traffic
3x client
2x kibana
20x data (hot)
traffic
traffic
10x data (warm)
/// our setup
3x master
ARM Template
Desired State Configuration
/// hot/warm
• Hot => CPU, Warm => Memory
• Index Allocation/Routing
• At the index:
  "index.routing.allocation.require.box_type" : "warm"
• At the node (elasticsearch.yml)
  box_type: warm
/// security
• x-pack: SSL + username/password security (basic, Kerberos)
• No Federated Authentication
• Proxy (nginx, apache, etc)
• IP-whitelisting
/// administration
• Like all: logs, slow query logs, etc
• top, htop, iostat
• collectd + local logstash
• two clusters, each watching the other
• curator for hot/cold and deleting old indices
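The housekeeping that Curator automates can be pictured as a decision per daily index: leave it hot, move it to warm nodes, or delete it. The sketch below is not Curator; it is a plain Python illustration that assumes index names ending in YYYY.MM.DD and made-up age thresholds. The settings body reuses the box_type allocation key from the hot/warm slide and would be sent with PUT /<index>/_settings.

```python
from datetime import date, datetime

# Settings body that asks Elasticsearch to relocate an index to warm nodes.
WARM_SETTINGS = {"index.routing.allocation.require.box_type": "warm"}


def index_day(name):
    """Parse the trailing YYYY.MM.DD of a daily index name."""
    return datetime.strptime(name.rsplit("-", 1)[1], "%Y.%m.%d").date()


def plan(indices, today, warm_after_days, delete_after_days):
    """Return (indices to move to warm, indices to delete) based on age."""
    to_warm, to_delete = [], []
    for name in indices:
        age = (today - index_day(name)).days
        if age >= delete_after_days:
            to_delete.append(name)
        elif age >= warm_after_days:
            to_warm.append(name)
    return to_warm, to_delete


indices = ["logs-2017.06.10", "logs-2017.06.05", "logs-2017.05.01"]
to_warm, to_delete = plan(indices, date(2017, 6, 10),
                          warm_after_days=3, delete_after_days=30)
print("move to warm:", to_warm)
print("delete:", to_delete)
```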
/// tracing
/// ActivityId
[Diagram: Microservice; the Id is held in Thread Local Storage, passed To Other APIs and attached to each Event]
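A minimal sketch of the idea on this slide, using Python's thread-local storage: the id is taken from the incoming request (or created), kept for the duration of the request, and then copied onto outgoing calls and onto every logged event. The header name X-Activity-Id and the event field activityId are assumptions chosen for illustration.

```python
import threading
import uuid

HEADER = "X-Activity-Id"      # hypothetical header name
_context = threading.local()  # one slot per thread handling a request


def begin_request(incoming_headers):
    """Adopt the caller's activity id, or start a new one at the edge."""
    activity_id = incoming_headers.get(HEADER) or str(uuid.uuid4())
    _context.activity_id = activity_id
    return activity_id


def current_activity_id():
    return getattr(_context, "activity_id", None)


def outgoing_headers():
    """Headers to attach when calling other APIs."""
    return {HEADER: current_activity_id()}


def log_event(message):
    """Event record carrying the id so it can be correlated across services."""
    return {"activityId": current_activity_id(), "message": message}


begin_request({HEADER: "3f2a"})
print(outgoing_headers())
print(log_event("order accepted"))
```

Searching Elasticsearch for one activity id then returns the events from every service that handled the request.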
/// alerting
/// watcher
• Trigger
• Input
• Condition
• Action
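The four parts map onto the four top-level keys of a watch definition. The sketch below only builds such a definition as a Python dictionary; the field names eventCode and @timestamp, the intervals and the e-mail address are assumptions, and the exact endpoint for registering a watch depends on the Elasticsearch version.

```python
import json


def build_watch(index_pattern, event_code, threshold, email_to):
    """Assemble a watch: trigger, input, condition and actions."""
    return {
        # Trigger: run on a schedule.
        "trigger": {"schedule": {"interval": "1m"}},
        # Input: count matching events from the last five minutes.
        "input": {
            "search": {
                "request": {
                    "indices": [index_pattern],
                    "body": {
                        "size": 0,
                        "query": {
                            "bool": {
                                "filter": [
                                    {"term": {"eventCode": event_code}},
                                    {"range": {"@timestamp": {"gte": "now-5m"}}},
                                ]
                            }
                        },
                    },
                }
            }
        },
        # Condition: fire only above the threshold.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": threshold}}},
        # Action: e-mail, throttled so repeated breaches do not flood inboxes.
        "actions": {
            "notify_team": {
                "throttle_period": "15m",
                "email": {
                    "to": email_to,
                    "subject": f"{event_code} above threshold",
                    "body": "{{ctx.payload.hits.total}} events in the last 5 minutes",
                },
            }
        },
    }


watch = build_watch("logs-*", "CRIT_ORD_API_DatabaseDown", 0, "oncall@example.com")
print(json.dumps(watch, indent=2))
```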
/// watcher notes
• All watches get executed on the active master
• Use Action Throttling to limit alerts
• Use watch templates when you see common patterns
• Use transforms and metadata to include context in actions/emails
/// lessons
learnt
/// Do you speak CAP?
• Consistency? Treat all data as dispensable. Back up data that is mastered in Elasticsearch. It is not a document DB.
• Highly-Available? For >99.9% availability, use redundancy.
• Partition-Intolerance? Inter-node communication is highly chatty; ideally keep nodes in the same data centre and even in the same VPC (AWS)/VNet (Azure).
/// Beware
• Split brain is common
• Data corruption is possible
• Back up data that is mastered in ES (Kibana indices)
• The safest route to high availability seems to be redundancy (expensive)
Thank you :)
Questions?
Credits
• Picture: Embroidery thread macro - https://www.flickr.com/photos/39908901@N06/
• Picture: Calculate Red - https://www.flickr.com/photos/93277085@N08/10398245145
• Picture: 1950's wristwatch workings - https://www.flickr.com/photos/134832191@N08/27301612554/
• Picture: Tokyo Tower_58 - https://www.flickr.com/photos/ajari/2756645901
• Picture: 1Bamboo and Rust - https://www.flickr.com/photos/hammershaug/5816522126/
• Picture: IMG_1899 - https://www.flickr.com/photos/johnas/9650255412/
• Picture: fan2 - https://www.flickr.com/photos/sidelong/444054290/
• Picture: Glass jar filled with pasta - https://www.flickr.com/photos/76588981@N02/16766079567/
• Picture: Rusty cogs - https://www.flickr.com/photos/paperpariah/25375888671/
• Picture: Do you see the world in different colours? - https://www.flickr.com/photos/luopl/6012467435/
• Picture: danger - https://www.flickr.com/photos/armydre2008/9650951334/
• Link: ETW equivalent for Linux - http://blogs.microsoft.co.il/sasha/2017/04/02/tracing-net-core-on-linux-with-usdt-and-bcc/

Real time monitoring-alerting: storing 2Tb of logs a day in Elasticsearch