Jitney, Kafka at Airbnb
ALEXIS MIDON & KRISHNA PUTTASWAMY / 2016-02-23 / KAFKA MEETUP
Jitney?!
a bus carrying passengers for a low fare
Some Kafka Facts
• 1 Production cluster
• v0.8.2
• 90 “small” brokers, d2.2xlarge
• 70 topics
• Replication Factor of 3
• 5 billion events / day
• IN: 80 MB / second
• OUT: 1.5 GB / second
• Network bound
• Super stable
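To put these figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the quoted rates are sustained averages and that 1 MB means 10^6 bytes; neither is stated on the slide, so treat the results as orders of magnitude only.

```python
# Back-of-the-envelope figures derived from the numbers on the slide.
# Assumptions (not from the talk): the rates are sustained averages
# and 1 MB = 10**6 bytes.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 5_000_000_000
bytes_in_per_second = 80 * 10**6
bytes_out_per_second = 1.5 * 10**9

# Average event rate, average event size, and ratio of bytes read to bytes written.
events_per_second = events_per_day / SECONDS_PER_DAY
avg_event_bytes = bytes_in_per_second / events_per_second
read_fan_out = bytes_out_per_second / bytes_in_per_second

print(f"{events_per_second:,.0f} events per second")
print(f"{avg_event_bytes:,.0f} bytes per event on average")
print(f"{read_fan_out:.2f}x more bytes out than in")
```

Under those assumptions, each byte written is read back many times over, which is consistent with the cluster being network bound.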
Why Jitney?
Pick any metric
Standardization!
What Use Cases?
Classic Message Bus
[Diagram labels: Jitney, MySQL, Monorail, MySQL, Elasticsearch, Jitney Client & Schemas]
• Decouple Services
• Standard Events
• At-least-once delivery
• Standard clients, for Java and Ruby
• Easy to use
• Conventions over configuration
Message Bus
• Site and image load times, OOM events
• Searches, requests, bookings, etc.
• Experiment assignments
• Event data is critical for building data products
• Data ingestion should be reliable: timely and complete
User Activity Logging
• JSON events without schemas
• Easy to break events during evolution/code changes
• One topic overall for 800+ event types
• Improper producer configs
• Lack of monitoring
• Led to:
• Too many data outages and data loss incidents
• Lack of trust in data systems
Challenges
Data Stability
CEO dashboard and Magical booking dashboards were regularly broken.
A Year Ago
Data Stability
ERF was unstable and experimentation culture was weak
Hi team,
This is partly a PSA to let you know ERF dashboard data hasn't been up to date/accurate for several weeks now. Do not rely on the ERF dashboard for information about your experiment.
A Year Ago
Join Forces!
Data Infrastructure & Production Infrastructure
Jitney Components
Schema Repository
Thrift Schema Repository
Why Thrift?
• Easy syntax
• Good performance in Ruby
• Ubiquitous
Advantages of schema repo?
• Great catalyst for communication, documentation, etc.
• It ships JARs and gems
• Will developers hate you for this? No
• Standard Field in the event schema
• Managed Explicitly
• Use Semantic Versioning:
1.0.0 = MODEL . REVISION . ADDITION
MODEL is a change which breaks the rules of backward compatibility.
Example: changing the type of a field.
REVISION is a change which is backward compatible but not forward compatible.
Example: adding a new field to a union type.
ADDITION is a change which is both backward compatible and forward compatible.
Example: adding a new optional field.
Schema Evolution
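The MODEL . REVISION . ADDITION rules above can be turned into a small compatibility check. The sketch below is one reading of those rules, not Jitney's actual implementation: the names `SchemaVersion`, `classify_change`, and `can_read` are hypothetical. It assumes "backward compatible" means a newer reader can read older data, and "forward compatible" means an older reader can read newer data.

```python
from typing import NamedTuple


class SchemaVersion(NamedTuple):
    """A MODEL.REVISION.ADDITION version, e.g. "1.0.0" (hypothetical helper)."""
    model: int
    revision: int
    addition: int

    @classmethod
    def parse(cls, text: str) -> "SchemaVersion":
        model, revision, addition = (int(part) for part in text.split("."))
        return cls(model, revision, addition)


def classify_change(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    a, b = SchemaVersion.parse(old), SchemaVersion.parse(new)
    if a.model != b.model:
        return "MODEL"
    if a.revision != b.revision:
        return "REVISION"
    if a.addition != b.addition:
        return "ADDITION"
    return "NONE"


def can_read(reader: str, writer: str) -> bool:
    """Can a consumer built against `reader` decode events written with `writer`?"""
    r, w = SchemaVersion.parse(reader), SchemaVersion.parse(writer)
    if r.model != w.model:
        return False  # MODEL changes break backward compatibility
    # REVISION changes are not forward compatible: an older reader
    # cannot be trusted with data written by a newer revision.
    # ADDITION changes are compatible both ways, so they are ignored.
    return w.revision <= r.revision
```

For example, `can_read("1.2.0", "1.1.5")` holds because the reader is on a newer revision of the same model, while `can_read("1.1.0", "1.2.0")` does not.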
Example of Thrift Event
because the event is your API
Jitney Components
Schema Repository, Topic Repository
Topic Repository
• Declare all Jitney topics
• Aggregate all characteristics of a topic:
name
ordering (partitioning function)
white list of accepted schemas
• Great for documentation purposes
• DRY
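The three characteristics listed above (name, partitioning function, white list of schemas) could be modeled roughly as follows. This is an illustrative sketch only: `TopicDeclaration`, the `bookings` topic, the `user_id` field, and the schema names are all made up for the example, and the CRC32-based partitioner is a stand-in rather than the partitioner Kafka or Jitney uses.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, FrozenSet, Mapping


@dataclass(frozen=True)
class TopicDeclaration:
    """Hypothetical declaration gathering everything known about one topic."""
    name: str
    partition_key: Callable[[Mapping[str, object]], str]  # defines ordering
    accepted_schemas: FrozenSet[str]                       # white list

    def accepts(self, schema_name: str) -> bool:
        return schema_name in self.accepted_schemas

    def partition_for(self, event: Mapping[str, object], num_partitions: int) -> int:
        # Events sharing a key land on the same partition, so they stay ordered.
        key = self.partition_key(event).encode("utf-8")
        return zlib.crc32(key) % num_partitions


# Illustrative topic: names and fields are assumptions, not from the talk.
BOOKINGS = TopicDeclaration(
    name="bookings",
    partition_key=lambda event: str(event["user_id"]),
    accepted_schemas=frozenset({"BookingCreated", "BookingCancelled"}),
)
```

Keeping the white list next to the partitioning function means producers and consumers read a single declaration instead of repeating these choices in each service.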
Example of a Topic
Jitney Components
Schema Repository, Topic Repository, Clients
Jitney Clients
• Kafka clients are hard to use correctly
• it’s better with 0.9
• Committing offsets is tricky, someone will get it wrong
• even with 0.9
• Configuration is a mess
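To illustrate why offset commits are easy to get wrong, here is a minimal in-memory simulation with no Kafka client involved. `InMemoryPartition`, `consume`, and `FlakyHandler` are hypothetical names. The sketch commits an offset only after the handler succeeds, which gives at-least-once delivery: a crash between the side effect and the commit causes a redelivery, never a loss.

```python
class InMemoryPartition:
    """Stand-in for one Kafka partition plus its committed offset."""

    def __init__(self, records):
        self.records = list(records)
        self.committed_offset = 0


def consume(partition, handler):
    """Process records from the last committed offset, committing after each success."""
    offset = partition.committed_offset
    while offset < len(partition.records):
        handler(partition.records[offset])   # may raise; nothing is committed then
        offset += 1
        partition.committed_offset = offset  # commit only after processing


class FlakyHandler:
    """Records what it sees and crashes once on record "b", after the side effect."""

    def __init__(self):
        self.seen = []
        self.crashed = False

    def __call__(self, record):
        self.seen.append(record)
        if record == "b" and not self.crashed:
            self.crashed = True
            raise RuntimeError("simulated crash before the offset commit")


partition = InMemoryPartition(["a", "b", "c"])
handler = FlakyHandler()
try:
    consume(partition, handler)
except RuntimeError:
    pass  # the consumer "restarts" and resumes from the committed offset
consume(partition, handler)
```

After the restart the handler has seen record "b" twice. Committing before calling the handler would avoid the duplicate but would silently drop "b" on a crash, so handlers in an at-least-once setup need to tolerate duplicates.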
Jitney Clients
it provides:
• metrics reporting: github.com/airbnb/kafka-statsd-metrics2
• configuration for default clusters
• built-in support for Schema Repository and Topic Repository
Consumer:
• offset management to implement at-least-once delivery
• polymorphic dispatching to event handler
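Polymorphic dispatching can be pictured as routing each decoded event to a handler chosen by its type. The sketch below shows the general idea; it is not the Jitney client API, and `EventDispatcher`, `SearchPerformed`, and `BookingCreated` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SearchPerformed:
    query: str


@dataclass
class BookingCreated:
    booking_id: int


class EventDispatcher:
    """Routes an event to the handler registered for its exact type."""

    def __init__(self) -> None:
        self._handlers: Dict[type, Callable] = {}

    def on(self, event_type: type):
        """Decorator that registers a handler for one event type."""
        def register(handler: Callable) -> Callable:
            self._handlers[event_type] = handler
            return handler
        return register

    def dispatch(self, event):
        handler = self._handlers.get(type(event))
        if handler is None:
            raise LookupError(f"no handler for {type(event).__name__}")
        return handler(event)


dispatcher = EventDispatcher()


@dispatcher.on(SearchPerformed)
def handle_search(event: SearchPerformed) -> str:
    return f"search:{event.query}"


@dispatcher.on(BookingCreated)
def handle_booking(event: BookingCreated) -> str:
    return f"booking:{event.booking_id}"
```

A consumer loop would then call `dispatcher.dispatch(event)` for each deserialized event, so one topic can carry several schemas without `if`/`else` chains in application code.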
Example of a Java Producer
Example of a Java Consumer
Jitney Components
Schema Repository, Topic Repository, Clients, HTTP Proxy
Jitney Components
Schema Repository, Topic Repository, Clients, HTTP Proxy, Warehouse Integration
Data Ingestion Pipeline
• Stack: Jitney, Spark Streaming, HBase, HDFS
• Spark Streaming 1.5 with Kafka “direct” connect
• Process 1 minute batches
• Write to HBase after deserializing with the right schema
• Dump data to HDFS every hour (with dedup) and add a Hive partition
• But live data can be queried via the “current” partition
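The hourly dump with dedup can be sketched in plain Python, without Spark, HBase, or Hive. Everything specific here is an assumption made for illustration: the `uuid` and `timestamp` field names, the `ds=.../hr=...` partition naming, and deduplicating on the event id. Since the bus delivers at least once, duplicates are expected and have to be dropped somewhere before the data lands in the warehouse.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_partitions(events):
    """Drop duplicate events by id, then group the rest by UTC hour."""
    seen_ids = set()
    partitions = defaultdict(list)
    for event in events:
        if event["uuid"] in seen_ids:
            continue  # redelivered event, already counted
        seen_ids.add(event["uuid"])
        ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        partitions[ts.strftime("ds=%Y-%m-%d/hr=%H")].append(event)
    return dict(partitions)


def at(hour, minute):
    """Epoch seconds for a time on 2016-02-23 UTC (sample data helper)."""
    return datetime(2016, 2, 23, hour, minute, tzinfo=timezone.utc).timestamp()


sample = [
    {"uuid": "e1", "timestamp": at(10, 5)},
    {"uuid": "e2", "timestamp": at(10, 40)},
    {"uuid": "e1", "timestamp": at(10, 5)},   # duplicate delivery of e1
    {"uuid": "e3", "timestamp": at(11, 1)},
]
result = hourly_partitions(sample)
```

With this sample, `result` holds two partitions: two events for hour 10 and one for hour 11, with the repeated `e1` counted once.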
Data Ingestion Pipeline
end to end
Audit
Event Schema for Audit
Metadata
How is Jitney used in the org?
• DB change ingestion
• Payment processing via pub/sub
• Experimentation
• User activity ingestion
• Cache invalidation
Use cases currently powered
Key takeaways
1 Standardization!
2 Auditing Pipeline
3 Huge Advantage for the organization
Thank You!
