Ines Sombra
Director of Engineering
The burden of a successful feature: Scaling our real time logging platform
presents
Today’s Agenda
A delightful
demo &
context
A deep dive
into logging
Challenges
& future
But first… A bit of context
Observability
tl;dr
https://vimeo.com/267641392
Fresh from Altitude NYC
"Observability is an umbrella term. There are different
techniques to achieve observability in a system."
(Peter Bourgon, Altitude NYC)
Peter’s classification of Observability
TECHNIQUES SYSTEMS
* Lovingly stolen from Peter Bourgon
TODAY 👉
STOP
Demo
Time!
But Why?
This pipeline is one of the oldest systems at Fastly
Born out of our dissatisfaction with the status quo
We wanted something that would send you logs
extremely fast (stream them near realtime) to
anywhere you want (many endpoints)
Log Streaming at Fastly
Logging @ Fastly
Caches Aggregators Endpoints
s3
syslog
gcs
sumologic
bigquery
ftp
papertrail
…
Logging pipeline is Stateless
We don’t batch your logs
We don’t store your logs
We stream your logs in
near real-time to your
defined endpoints
We really don’t want your
logs on disk
Logging @ Fastly
Caches + Senders Aggregators
Varnish
Logging pipeline is Best Effort
We try our best to send logs to
your defined endpoint
Your endpoint must be up &
healthy in order for us to be
able to send data to it
We have minimal buffering
Pipeline optimized for log
streaming speed
Logging Endpoints
We don’t limit the number
of endpoints or log lines
per request
~8.6K active endpoints
Ecosystem of endpoints in
different stages of
evolution
Logging Streams data
File-based endpoints (time ranged)
Streaming endpoints (protocol or http-requests)
s3 gcs ftp sftp
syslog
sumologic
bigquery logentries papertrail
splunk scalyr honeycomb
Logging Growth (2014-2015)
😳
~430K LPS ~1.2K endpoints ~2GBps
Logging Growth (2017-2018)
~3M LPS ~8.6K endpoints ~4GBps
Logging Growth (8X!!)
~3M LPS ~8.6K endpoints ~4GBps
Logging Endpoints
We send a lot of data continuously to
our supported endpoints
Syslog continues to be our most
popular endpoint but S3 & GCS have
the highest volume
The 70's are still alive with a very
respectable 13 MBps to ftp and 74
kBps to sftp*
* for the non-millennials
Challenges & Lessons learned
Volume Challenges
No hard limits to what you
can log, this can be
challenging
System is multi-tenant. Noisy
neighbors can affect delivery
Consider sampling for high
volume logging
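A minimal sketch of the sampling idea above, written in Python rather than in Fastly's own configuration language. The function name should_log, the request_id parameter, and the 10% rate are assumptions made for illustration and do not come from the talk. Hashing the ID instead of drawing a random number keeps the decision stable, so all log lines for one request are kept or dropped together.

```python
import hashlib


def should_log(request_id: str, sample_rate: float) -> bool:
    """Return True for roughly `sample_rate` of all request IDs.

    `request_id` and `sample_rate` are hypothetical names for this sketch.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    # Map the ID to a uniformly distributed 64-bit integer.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    # Keep the line when the value falls below the sampled share of the range.
    threshold = int(sample_rate * 2**64)
    return value < threshold


if __name__ == "__main__":
    # Count how many of 10,000 made-up request IDs survive 10% sampling.
    kept = sum(should_log(f"req-{i}", 0.1) for i in range(10_000))
    print(f"kept {kept} of 10000 lines")
```

Sampling trades completeness for volume: aggregate trends survive, but any single request may be absent from the logs.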
Burden of many
endpoints
Classic integration
challenges (each endpoint is
a downstream dependency)
Standard endpoint clients
often don’t meet our needs
Having our own clients
affords us extra optimizations
Endpoints & Health
Some endpoints have known
limitations (infamous
examples: S3, BigQuery, GCS)
Difficult to infer if an
endpoint is working or not
(Hard to test setup too)
Structured logging (JSON via
VCL) is challenging
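One reason hand-assembled JSON is fragile, sketched in Python rather than VCL: a value containing a quote breaks a line built by string concatenation, while a JSON encoder escapes it. The field names url and status and both function names are assumptions for this example, not Fastly's log format.

```python
import json


def naive_log_line(url: str, status: int) -> str:
    # Builds JSON by concatenation, with no escaping of the values.
    return '{"url":"' + url + '","status":' + str(status) + "}"


def safe_log_line(url: str, status: int) -> str:
    # Lets the encoder handle quotes, backslashes, and control characters.
    return json.dumps({"url": url, "status": status}, separators=(",", ":"))


def is_valid_json(line: str) -> bool:
    # True when the line parses as JSON, False otherwise.
    try:
        json.loads(line)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    tricky_url = '/search?q="fastly"'
    print(is_valid_json(naive_log_line(tricky_url, 200)))
    print(is_valid_json(safe_log_line(tricky_url, 200)))
```

A single unescaped value is enough for a downstream endpoint to reject or misparse the whole line, which is also part of why endpoint setups are hard to test.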
Service Isolation
Prioritize delivery of content over
log retention
An aggregator discards the oldest
logs it has when it can’t deliver
them fast enough
In a cache node we are our own
customers so senders do the
same when they can’t reach
aggregators fast enough
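The discard-oldest behavior described above can be sketched as a bounded buffer. This is an illustration in Python, not Fastly's implementation: the class name DropOldestBuffer, its capacity parameter, and the dropped counter are all assumptions.

```python
from collections import deque
from typing import Deque, List


class DropOldestBuffer:
    """Bounded buffer that evicts the oldest log line when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen drops items from the left when appended to while full.
        self._lines: Deque[str] = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, line: str) -> None:
        # Count the eviction that the append below is about to cause.
        if len(self._lines) == self._lines.maxlen:
            self.dropped += 1
        self._lines.append(line)

    def drain(self, max_lines: int) -> List[str]:
        # Hand the oldest surviving lines to the sender, oldest first.
        out: List[str] = []
        while self._lines and len(out) < max_lines:
            out.append(self._lines.popleft())
        return out


if __name__ == "__main__":
    buf = DropOldestBuffer(capacity=3)
    for line in ["a", "b", "c", "d", "e"]:
        buf.push(line)
    print(buf.drain(10), buf.dropped)
```

Dropping from the front keeps memory bounded and favors fresh data, which matches the stated priority of content delivery over log retention.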
Expectation Mismatch
Burden of a system that works so well is that it
makes you believe you have strong guarantees
Design constraints determine the SLA of the
pipeline
General advice: Understand the design choices of
the systems you use because they limit what is
possible to guarantee *
The Future of
Logging
The team have been busy bees
H2
H1
Platform performance
& addressing the
challenges of
individual endpoints
We are getting fancy!
Platform Performance
Reducing lock contention & CPU usage
Smarter memory allocation &
management
Overhauling all endpoints
Halving the time it takes for a log line to
be processed (from sender read to
aggregator line preparation)
Getting fancy
BigQuery improvements
New endpoints: Kafka
More integrations with
cloud services
Make endpoints easier to
debug
Want More?
Want more endpoints?
Want metrics?
Want easier structured logging?
Want VCL counters + secondly
aggregation + a higher SLA?
Dom Fee
tl;dr LOGGING
Fastly lets you extend the
visibility of your system to the
edge & gain meaningful insights
in near real-time
Is a pipeline with very specific
constraints & guarantees
Exciting things are coming!
(l,d)ogs of Fastly
https://github.com/Randommood/Altitude2018
