This document summarizes Activision Data's transition from a batch data pipeline to a real-time streaming data pipeline using Apache Kafka and Kafka Streams. Some key points:
- The new pipeline ingests, processes, and stores game telemetry data from over 200k messages per second and over 5PB of data across 9 years of games.
- Kafka Streams is used to transform the raw streaming data through multiple microservices with low 10-second end-to-end latency, compared to 6-24 hours previously.
- Kafka Connect integrates the streaming data with data stores like AWS S3, Cassandra, and Elasticsearch.
- The new pipeline provides real-time and historical access to structured
Related topics: