Bravo Six, Going Realtime.
Transitioning Activision Data Pipeline to Streaming
© 2020 Activision Publishing, Inc.
Hello!
I am Yaroslav Tkachenko
Software Architect at Activision Data.
You can find me at @sap1ens (pretty much everywhere).
Activision Data Pipeline
● Ingesting, processing and storing game telemetry data
● Providing tabular, API and streaming access to data
[Diagram: pipeline overview with HTTP API, Schema Registry and "Magic" components]
● Ingestion rate: 200k+ msg/s
● Age of the oldest game: 9 years
● Data lake size (AWS S3): 5+ PB
Challenges
● Complex client-side & server-side game telemetry
● Long-living titles, hard to update or deprecate
● Various data formats, message schemas and envelopes
● Development data == production data
● Scalability, elasticity & cost
Established standards
● Kafka topic name conventions must be followed
● Payload schema must be uploaded to the Schema Registry
● Message envelope has a schema too (Protobuf), with a set of required fields
Old pipeline
Quick overview
[Diagram: old pipeline. Labels: devdata, proddata, aggregate, transform, transform, batch job (MR, Hive, Spark; runs every X hours), ETL API, transformed data, ETL’ed data, prod data]
Old pipeline
Architecture Flaws
● Scalability solution as a workaround
● Painful to switch between dev & prod
● No streaming capabilities
● Ad hoc integration
Bottlenecks
● Latency limitations
● MR glob length, memory is not infinite (ETL API), etc.
● Lots of manual configuration
● Lots of manual ETL
New pipeline
It gets better from here
Apache Kafka
● The Streams API allows an application to act as a stream
processor, consuming an input stream from one or more topics
and producing an output stream to one or more output topics,
effectively transforming the input streams to output streams.
● The Connector API allows building and running reusable
producers or consumers that connect Kafka topics to existing
applications or data systems. For example, a connector to a
relational database might capture every change to a table.
● End-to-end streaming latency: ~10 seconds
● Cost: 90% cheaper per user/byte
● Tabular data available for querying: 6-24 hours → 5-10 mins
Guiding principles
Kafka Streams
● One transformation step = one service*
○ Not entirely true anymore, we’ve combined some steps to optimize cost and reduce unnecessary IO
● Stateless if possible
● Rich routing
● Auto-scaling & self-healing
● LOTS of tooling
Kafka Connect
● Handle integration: AWS S3, Cassandra, Elasticsearch, etc.
● Only sink connectors
● Invest in configuration, deployments, monitoring
[Diagram: transform → transform → Connect]
Why Kafka Streams?
● Simple Java library
● Industry standard features
● Separation of concerns that makes sense
● Kafka first
Our internal protocol
● Kafka Message Value: serialized Avro
● Kafka Message Key: null (99%)
● Kafka Message Headers: schema guid; other metadata, mostly for routing
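The layout above can be sketched without any Kafka dependency. This is a minimal illustration only: the class `PipelineMessage` and the header name `schema_guid` are hypothetical, and the Avro payload is treated as opaque bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, library-free model of one pipeline message (not the real client API). */
final class PipelineMessage {
    final byte[] key;                  // null for the vast majority of messages
    final byte[] value;                // Avro-serialized payload, opaque here
    final Map<String, byte[]> headers; // schema guid plus routing metadata

    PipelineMessage(byte[] key, byte[] value, Map<String, byte[]> headers) {
        this.key = key;
        this.value = value;
        this.headers = Collections.unmodifiableMap(new LinkedHashMap<>(headers));
    }

    /** Reads a header as UTF-8 text, or returns null when the header is absent. */
    String header(String name) {
        byte[] raw = headers.get(name);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    /** Builds a message with a null key; the header names are assumptions. */
    static PipelineMessage of(byte[] avroPayload, String schemaGuid, Map<String, String> routing) {
        Map<String, byte[]> headers = new LinkedHashMap<>();
        headers.put("schema_guid", schemaGuid.getBytes(StandardCharsets.UTF_8));
        for (Map.Entry<String, String> entry : routing.entrySet()) {
            headers.put(entry.getKey(), entry.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return new PipelineMessage(null, avroPayload, headers);
    }

    public static void main(String[] args) {
        Map<String, String> routing = new LinkedHashMap<>();
        routing.put("env", "prod");
        routing.put("game", "mw");
        PipelineMessage message = of(new byte[] {1, 2, 3}, "abc123", routing);
        System.out.println("schema guid: " + message.header("schema_guid"));
    }
}
```

Keeping the schema guid and routing metadata in headers means a service can decide where a record goes without deserializing the Avro value.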
Schema management
● Schemas are generated & uploaded automatically if needed.
Schema hash is used as id
● Make schemas immutable and cache them aggressively. You
have to use them for every single record!
[Diagram: Schema Registry API, Distributed Cache, In-memory Cache]
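A rough sketch of the two ideas on this slide: a schema id derived from a hash of the schema, and a lookup that tries an in-memory cache, then a distributed cache, then the registry. Everything here is an assumption for illustration: SHA-256 as the hash, a plain `Map` standing in for the distributed cache, and a `Function` standing in for the Schema Registry API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical tiered schema lookup; safe to cache forever because schemas are immutable. */
final class SchemaCache {
    private final Map<String, String> inMemory = new ConcurrentHashMap<>();
    private final Map<String, String> distributed;   // stand-in for a shared cache
    private final Function<String, String> registry; // stand-in for the Schema Registry API

    SchemaCache(Map<String, String> distributed, Function<String, String> registry) {
        this.distributed = distributed;
        this.registry = registry;
    }

    /** Derives an id from the schema text (SHA-256 is an assumption, not the source's choice). */
    static String schemaId(String schemaText) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(schemaText.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
    }

    /** Returns the schema for an id, or null if no tier knows it. */
    String lookup(String id) {
        return inMemory.computeIfAbsent(id, key -> {
            String cached = distributed.get(key);
            if (cached != null) {
                return cached;
            }
            String fetched = registry.apply(key);
            if (fetched != null) {
                distributed.put(key, fetched); // share the result with other instances
            }
            return fetched; // a null result is not cached, so unknown ids are retried
        });
    }

    public static void main(String[] args) {
        String schema = "{\"type\":\"string\"}";
        String id = schemaId(schema);
        SchemaCache cache = new SchemaCache(new ConcurrentHashMap<>(), key -> key.equals(id) ? schema : null);
        System.out.println(id + " -> " + cache.lookup(id));
    }
}
```

Because the id is a function of the schema content, a cached entry can never go stale, which is what makes aggressive caching on the per-record path safe.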
Typical Kafka Streams service topology
[Diagram: consume, enrich, process and produce stages, plus a DLQ]
import java.util.regex.Pattern;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;

// Consume every topic matching the configured pattern, then enrich, extract and measure.
KStream[] streams = builder
    .stream(Pattern.compile(applicationConfig.getTopics()))
    .transform(MetadataEnricher::new)
    .transform(() -> new InputMetricsHandler(applicationMetrics))
    .transform(ResultExtractor::new)
    .transform(() -> new OutputMetricsHandler(applicationMetrics))
    // Split by result type; each record goes to the first branch whose predicate matches.
    .branch(
        (key, value) -> value instanceof RecordSucceeded,
        (key, value) -> value instanceof RecordFailed,
        (key, value) -> value instanceof RecordSkipped
    );

// RecordSucceeded: unwrap the Avro record and route it to the sink topic
streams[0].map((key, value) -> KeyValue.pair(key, ((RecordSucceeded) value).getGenericRecord()))
    .transform(SchemaGuidEnricher<String, GenericRecord>::new)
    .to(new SinkTopicNameExtractor());

// RecordFailed: hand over to the DLQ handler
streams[1].process(dlqFailureResultHandlerSupplier);
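The `branch` call in the topology sends each record to the first predicate that matches and drops records that match none. A library-free sketch of that rule, with a hypothetical `Brancher` helper operating on lists instead of streams:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in that mimics first-match branching on an in-memory list. */
final class Brancher {
    /** Returns one output list per predicate; each item lands in the first matching one only. */
    @SafeVarargs
    static <T> List<List<T>> branch(List<T> input, Predicate<? super T>... predicates) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < predicates.length; i++) {
            outputs.add(new ArrayList<>());
        }
        for (T item : input) {
            for (int i = 0; i < predicates.length; i++) {
                if (predicates[i].test(item)) {
                    outputs.get(i).add(item);
                    break; // first matching predicate wins
                }
            }
            // items that match no predicate are dropped
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Object> records = Arrays.asList((Object) 1, "a", 2.5, 2, "b");
        List<List<Object>> branches = branch(records,
                v -> v instanceof Integer,
                v -> v instanceof Number,
                v -> v instanceof String);
        System.out.println(branches);
    }
}
```

Predicate order matters: in the sketch the integers never reach the broader `Number` branch, in the same way a record cannot be both succeeded and failed in the topology.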
Routing & configuration
Before:
<env>.<producer>.<title>.<category>-<protocol>
e.g.
prod.service-a.1234.match_summary-v1
“raw” data, no transformations
Routing & configuration
Now:
<env>.rdp.<game>.<stage1>
↓ microservice
<env>.rdp.<game>.<stage2>
↓ microservice
<env>.rdp.<game>.<stageN>
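The `<env>.rdp.<game>.<stage>` convention is easy to parse and build mechanically. A small sketch, assuming each segment is limited to lowercase letters, digits, `_` and `-` (the allowed character set is a guess, and `StageTopic` is a hypothetical name):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value type for topics named <env>.rdp.<game>.<stage>. */
final class StageTopic {
    // The allowed characters per segment are an assumption, not taken from the talk.
    private static final Pattern FORMAT =
            Pattern.compile("([a-z0-9_-]+)\\.rdp\\.([a-z0-9_-]+)\\.([a-z0-9_-]+)");

    final String env;
    final String game;
    final String stage;

    private StageTopic(String env, String game, String stage) {
        this.env = env;
        this.game = game;
        this.stage = stage;
    }

    /** Parses a topic name, rejecting anything that does not follow the convention. */
    static StageTopic parse(String topic) {
        Matcher matcher = FORMAT.matcher(topic);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a stage topic: " + topic);
        }
        return new StageTopic(matcher.group(1), matcher.group(2), matcher.group(3));
    }

    /** Same env and game, different stage: the topic a microservice would write to. */
    StageTopic next(String nextStage) {
        return new StageTopic(env, game, nextStage);
    }

    String name() {
        return env + ".rdp." + game + "." + stage;
    }

    public static void main(String[] args) {
        StageTopic ingested = parse("prod.rdp.mw.ingested");
        System.out.println(ingested.name() + " -> " + ingested.next("parsed").name());
    }
}
```

An old-style name such as `prod.service-a.1234.match_summary-v1` is rejected by `parse`, since its second segment is not `rdp`.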
Routing & configuration
prod.rdp.mw.ingested
↓ microservice
prod.rdp.mw.parsed
prodMwServiceA:
  stream:
    headers:
      env: prod
      game: mw
      source: service-a
    exclude: <thingX>
  action:
    type: parse
    protocol: proto2
Streams can be skipped, split, merged, sampled, etc.
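One way to read the `prodMwServiceA` rule: a message is selected when all listed headers match, and the rule's action is then applied. The sketch below models only that header match. `RoutingRule` is a hypothetical class, and the `exclude` key is left out because its semantics are not described on the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory form of one declarative routing rule. */
final class RoutingRule {
    final String name;
    final Map<String, String> requiredHeaders;
    final String actionType;
    final String protocol;

    RoutingRule(String name, Map<String, String> requiredHeaders, String actionType, String protocol) {
        this.name = name;
        this.requiredHeaders = new LinkedHashMap<>(requiredHeaders);
        this.actionType = actionType;
        this.protocol = protocol;
    }

    /** True when every required header is present with the expected value; extra headers are ignored. */
    boolean matches(Map<String, String> messageHeaders) {
        for (Map.Entry<String, String> required : requiredHeaders.entrySet()) {
            if (!required.getValue().equals(messageHeaders.get(required.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** The rule from the slide, minus the exclude key. */
    static RoutingRule prodMwServiceA() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("env", "prod");
        headers.put("game", "mw");
        headers.put("source", "service-a");
        return new RoutingRule("prodMwServiceA", headers, "parse", "proto2");
    }

    public static void main(String[] args) {
        Map<String, String> message = new LinkedHashMap<>();
        message.put("env", "prod");
        message.put("game", "mw");
        message.put("source", "service-a");
        RoutingRule rule = prodMwServiceA();
        System.out.println(rule.name + " matches: " + rule.matches(message));
    }
}
```

Since the rule is plain data, it can be reloaded at runtime without redeploying the service that evaluates it.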
Dynamic Routing*
● Centralized, declarative configuration
● Self-serve APIs and UIs
● Every change is automatically applied to all running services
within seconds
Infra & Tools
● One-click Kafka deployment (Jenkins, Ansible)
● Kafka broker EBS auto-scaling
● Versioned & deployable Kafka topic configuration
● Built tooling for:
○ Data reprocessing and DLQ resubmission
○ Offset migration between consumer groups
○ Message inspection
○ ...
Auto-scaling & self-healing
Scaling
● Every application submits <app_name>.lag metric in milliseconds
● ECS Step Scaling: add/remove X more instances every Y minutes
● Add an extra policy for rapid scaling
Healing
● Heartbeat endpoint monitors streams.state() result
● ECS healthcheck replaces unhealthy instances
● Stateful applications need more time to bootstrap
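A sketch of how the two mechanisms might look as pure functions. The lag thresholds and step sizes are invented for illustration. `StreamsState` is a local stand-in that mirrors the names of `KafkaStreams.State`, and treating `CREATED` and `REBALANCING` as healthy is an assumption made so that a bootstrapping instance is not replaced too early.

```java
/** Hypothetical step-scaling decision driven by the <app_name>.lag metric. */
final class ScalingPolicy {
    /** Returns how many instances to add (positive) or remove (negative); thresholds are made up. */
    static int instanceDelta(long lagMillis) {
        if (lagMillis >= 300_000) {
            return 4;  // extra policy for rapid scaling
        }
        if (lagMillis >= 60_000) {
            return 1;  // regular step up
        }
        if (lagMillis < 5_000) {
            return -1; // nearly caught up, step down
        }
        return 0;
    }
}

/** Local stand-in mirroring the state names of KafkaStreams.State. */
enum StreamsState { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

/** Hypothetical heartbeat mapping from streams state to an HTTP status code. */
final class Heartbeat {
    static int httpStatus(StreamsState state) {
        switch (state) {
            case CREATED:
            case REBALANCING:
            case RUNNING:
                return 200; // starting up or working: keep the instance
            default:
                return 503; // shutting down or failed: let the healthcheck replace it
        }
    }

    public static void main(String[] args) {
        System.out.println("delta at 90s lag: " + ScalingPolicy.instanceDelta(90_000));
        System.out.println("status when ERROR: " + httpStatus(StreamsState.ERROR));
    }
}
```

For stateful applications the healthcheck grace period would also need to cover state restoration, which this sketch does not model.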
Why Kafka Connect?
● Powerful framework
● Built-in connectors
● Separation of concerns that makes sense
● Kafka first
Kafka Connect
● Multiple smaller clusters > one big cluster
● Connector configuration lives in Git and uses Jsonnet; the deployment script leverages the REST API
● Custom Converter, thanks to KIP-440
● ❤ lensesio/kafka-connect-ui
● Collecting & using tons of metrics available over JMX
C* Connector
● Implemented from scratch, inspired by JDBC connector
● Started with porting over existing C* integration code
● Took us a few days (!) to wrap it up
● Generalizing is hard
● Very performant, usually just a few tasks are running
ES Connector
● Using open-source kafka-connect-elasticsearch
● Leveraging SMTs to:
○ Partition single topic into multiple indexes
○ Enrich with a timestamp
● Currently very low-volume
S3 Connector
● Started with forking open-source kafka-connect-s3
● Added custom Avro and Parquet formats
● Added a new flexible partitioner
● Optimized connector for at-least-once delivery
○ Generate fewer files on S3, reduce TPS
○ Avoid file overwrites with non-deterministic upload triggers
● Running hundreds of tasks
Dev data is prod data
● Scale is different, but the pipeline is the same
● Running as a separate set of services to reduce latency; low latency is a requirement
● Different approach to alerting
Otherwise, it’s the same!
Use Case: RADS
Flatten my data!
{
  "headers": {
    "field1": "value1"
  },
  "data": {
    "match": {
      "field2": "value2"
    },
    "players": [
      {"field3": "value3", "field4": "value4"},
      {"field3": "value3", "field4": "value4"}
    ]
  }
}
fact_data
message_id | context_headers_field1_s | data_match_field2_s
...        | ...                      | ...
...        | ...                      | ...

fact_data_players
message_id | index | context_headers_field1_s | data_players_field3_i | ...
...        | ...   | ...                      | ...                   | ...
...        | ...   | ...                      | ...                   | ...
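The flattening shown above can be approximated in a few lines. This is a simplified sketch, not the RADS implementation: nested objects become `_`-joined column names with a type suffix (`_s` for strings, `_i` for integers, inferred from the example columns), and each array of objects becomes a set of child rows with an `index` column. The `context_` prefix and the copying of `message_id` and header columns into child rows are omitted, and `Flattener` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical flattener: one parent row plus child rows for every array of objects. */
final class Flattener {
    static final class Result {
        final Map<String, Object> row = new LinkedHashMap<>();
        final Map<String, List<Map<String, Object>>> children = new LinkedHashMap<>();
    }

    static Result flatten(Map<String, Object> message) {
        Result result = new Result();
        walk("", message, result.row, result);
        return result;
    }

    @SuppressWarnings("unchecked")
    private static void walk(String prefix, Map<String, Object> node,
                             Map<String, Object> row, Result result) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "_" + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                walk(path, (Map<String, Object>) value, row, result); // nested object: same row
            } else if (value instanceof List) {
                // Assumes arrays contain objects only; each element becomes one child row.
                List<Map<String, Object>> rows = new ArrayList<>();
                int index = 0;
                for (Object element : (List<Object>) value) {
                    Map<String, Object> child = new LinkedHashMap<>();
                    child.put("index", index++);
                    walk(path, (Map<String, Object>) element, child, result);
                    rows.add(child);
                }
                result.children.put(path, rows);
            } else {
                row.put(path + suffix(value), value);
            }
        }
    }

    /** Type suffix per column; only two types are handled in this sketch. */
    private static String suffix(Object value) {
        return (value instanceof Integer || value instanceof Long) ? "_i" : "_s";
    }

    public static void main(String[] args) {
        Map<String, Object> match = new LinkedHashMap<>();
        match.put("field2", "value2");
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("match", match);
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("data", data);
        System.out.println(flatten(message).row);
    }
}
```

The one-to-many step is visible in the result shape: one input message yields one parent row and any number of child rows per array field.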
RADS
[Diagram: RADS pipeline. Stages: ingest, transform, flatten (1:1, 1:1, 1:M). Components: table-generator (DDL), consolidator, S3 connector (shown twice), Avro and Parquet outputs, Schema Registry API, Project API, Metastore DB]
Why is RADS rad?
● Has enough automation and generic configuration to automatically create Hive databases and tables, and to add new columns and partitions for a brand new game with no* human intervention.
● As a data producer you just need to start sending data in the right format to the right Kafka topic, that’s it!
● We get realtime (“hot”) and historical (“cold”) data in the same place!
Thanks!
Any questions?
@sap1ens

Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming (Yaroslav Tkachenko, Activision) Kafka Summit 2020