1
Hadoop Made Fast
Why Virtual Reality Needed Stream Processing to Survive
Greg Fodor, Co-founder, AltspaceVR
Gehrig Kunz, Technical Product Marketing, Confluent
2
Streaming in Action Series
You are here!
August 16th
Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
Watch on Confluent.io
3
A look at today
A Streaming Platform is Hadoop Made Fast
● Hadoop was a good idea, but it has its flaws
● How a streaming platform can look like Hadoop
● Companies are using a streaming platform
Stream Processing with Kafka for Virtual Reality
● An example of Kafka with VR
● Challenges VR has that require stream processing
● Examples where it helps
● Why stream processing with Kafka makes sense
4
Interest in Hadoop
5
Good idea, Hadoop is
● Get all the datas
● Perform analysis, explore data
● Perfect for understanding your business
6
But today is different
Star Wars is good, again.
And the apps we build require constant data.
7
Bringing it to today
With Hadoop you wanted to
Get all the datas
Explore historical data
Understand your business
git commit -m “Today you want to”
Get all the datas
Process data as it arrives
Power your business
8
What this looks like in practice
9
What this looks like in practice
1. Ingest a stream of data.
2. Process and act on it as it arrives.
3. Power your business.
10
Kafka’s Streams API
● Kafka’s Streams API: a lightweight library for performing stream processing
• Aggregations, sessions, windowing, joins, et al.
● Build apps, not clusters
Client
Server
Runs outside Kafka brokers!
11
Build scalable, fault-tolerant apps
Client
Server
12
Build today’s apps quicker
13
Kafka, stream processing for developers
Deploy apps – not clusters – that are:
● Real-time
● Elastic
● Fault-tolerant
● Teams can be more efficient
● Provide a better, new experience to users
14
Virtual reality, anyone?
Psst, Greg.
15
The best shared VR platform
https://altvr.com/kafka
16
Use cases
17
VR Mirroring + Capture
18
“Real” Reggie
“VIP” Room
19
“Real” Reggie
“VIP” Room
“Mirrored” Reggies
Room 1 Room 2 Room 3 Room 4
20
Use cases for capture/replay
21
22
23
24
25
Kafka’s Streams API
26
Kafka’s Streams API
Stream processing: it’s not just for analytics!
27
Kafka’s Streams API
• Independent capacity
• Arbitrary transformations
• Flexible and simple ops
28
Kafka’s Streams API
• Build cohesive, re-usable topologies
• Design for extensibility
• Apply patterns + avoid pitfalls
29
Job #1: Game Streams
30
Game Streams
Create a logical stream across Photon servers
• Real-time netdata transformation
• Routing between Photon servers
• Stateful, due to Photon protocol
31
“Mirror User A to room R2”
32
6 months later: “Capture User A”
33
Job #2: Playbacks
34
Playbacks
Replays captured data
• Load capture data (Kafka/S3)
• Timed emission
• Checkpointing, looping, filtering
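A minimal sketch of timed emission with looping and a resume checkpoint, in plain Python rather than the Kafka Streams API. The frame shape, the function name, and every parameter here are hypothetical and only illustrate the bullets above; this is not the actual Playbacks job.

```python
from typing import Iterator, List, Tuple

Frame = Tuple[int, str]  # (offset from capture start in ms, payload)


def playback_schedule(frames: List[Frame], duration_ms: int, start_ms: int,
                      loops: int = 1, resume_offset_ms: int = 0
                      ) -> Iterator[Tuple[int, str]]:
    """Yield (emit_at_ms, payload) pairs, repeating the capture `loops` times.

    `start_ms` is the time at which offset 0 of the first loop plays (or would
    have played). `resume_offset_ms` acts as a checkpoint: frames before it
    are skipped in the first loop only.
    """
    for loop in range(loops):
        loop_start = start_ms + loop * duration_ms
        for offset_ms, payload in frames:
            if loop == 0 and offset_ms < resume_offset_ms:
                continue  # already emitted before the checkpoint
            yield loop_start + offset_ms, payload


frames = [(0, "a"), (500, "b")]
full = list(playback_schedule(frames, duration_ms=1000, start_ms=10_000, loops=2))
resumed = list(playback_schedule(frames, duration_ms=1000, start_ms=10_000,
                                 loops=2, resume_offset_ms=500))
```

A real job would compare each `emit_at_ms` against the clock before forwarding the payload; the sketch only computes the schedule.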
35
“Playback capture to room R2”
36
“Mirror User A to room R2”
37
Kafka’s Streams API
• Build cohesive, re-usable topologies
• Design for extensibility
• Apply patterns + avoid pitfalls
GameStreams job allows:
• User capture/mirroring
• Interactable object capture/mirroring
• VoIP, avatar transform, and VR emoji payloads
• Entire room capture/mirroring
38
Kafka’s Streams API
• Build cohesive, re-usable topologies
• Design for extensibility
• Apply patterns + avoid pitfalls
GameStreams job allows:
• Design names, record types generically
• Build in mechanisms for parameterization + control
• Use Avro and Schema Registry
• Job code is not throwaway! Build accordingly
39
Patterns + Pitfalls
40
Patterns + Pitfalls
41
Config KTables
• Drive job behavior via OLTP state
• In our case, users interact with Rails API to control mirroring + captures
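A rough sketch of the config-table idea in plain Python (not the Kafka Streams API): a changelog of config records is folded into a latest-value-per-key table, and each incoming event is routed according to the table's current contents. The record shapes and the `mirror_targets` field are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# A changelog entry is (key, value); a value of None means "delete this key".
ConfigRecord = Tuple[str, Optional[dict]]


def materialize(changelog: Iterable[ConfigRecord]) -> Dict[str, dict]:
    """Fold a changelog into a table holding the latest value per key."""
    table: Dict[str, dict] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone removes the row
        else:
            table[key] = value    # later records overwrite earlier ones
    return table


def route(event: dict, config: Dict[str, dict]) -> List[Tuple[str, dict]]:
    """Return (room, event) pairs for every room the user is mirrored to."""
    row = config.get(event["user_id"])
    if row is None:
        return []
    return [(room, event) for room in row["mirror_targets"]]


changelog = [
    ("user_a", {"mirror_targets": ["R2"]}),
    ("user_b", {"mirror_targets": ["R3"]}),
    ("user_a", {"mirror_targets": ["R2", "R4"]}),  # updated through the API
    ("user_b", None),                              # mirroring switched off
]
config = materialize(changelog)
routed = route({"user_id": "user_a", "payload": "wave"}, config)
```

The job's behavior changes as soon as a new config record arrives; no redeploy is involved.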
42
KIP-99 Global Tables
https://cwiki.apache.org/confluence/display/KAFKA/KIP-99%3A+Add+Global+Tables+to+Kafka+Streams
43
Prefer declarative OLTP table state
Database table state should describe “how the world should be,” not “steps to perform.”
The job’s duty is to make the world look like the desired one.
“A stream should exist from playback A to room B,” not
“Right now, create a stream from playback A to room B.”
Straightforward to test + verify: does the desired world match reality?
Easier to reason about in failure cases.
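One way to picture this, as a sketch under assumptions: the table holds the desired set of streams, and the job repeatedly computes the difference from what actually exists. The `(source, room)` tuple shape and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Stream = Tuple[str, str]  # (source, room), e.g. ("playback_A", "room_B")
Action = Tuple[str, Stream]


def reconcile(desired: Set[Stream], actual: Set[Stream]) -> List[Action]:
    """Compute the actions that would make `actual` match `desired`."""
    actions: List[Action] = [("create", s) for s in sorted(desired - actual)]
    actions += [("remove", s) for s in sorted(actual - desired)]
    return actions


def apply_actions(actions: List[Action], actual: Set[Stream]) -> Set[Stream]:
    """Return the new actual state after performing the actions."""
    result = set(actual)
    for op, stream in actions:
        if op == "create":
            result.add(stream)
        else:
            result.discard(stream)
    return result


desired = {("playback_A", "room_B")}
actual = {("playback_A", "room_C")}
actions = reconcile(desired, actual)
after = apply_actions(actions, actual)
```

Running `reconcile` again after a crash or a retry is harmless: once reality matches the desired state it returns no actions, which is what makes failure cases easier to reason about.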
44
Keep consistent topic naming
Kafka Streams jobs involve a lot of source + intermediate topics.
We prefer:
[<data source>|<job application id>]-<avro record type>[_<specifier>]-<partition key>
Ex:
oltp_db-user-user_id
job_playbacks-photon_instantiations-game_stream_id
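A small helper can keep names consistent with the convention above. This is only a sketch: the function names and the rule that components may not contain hyphens are assumptions, not part of the original convention.

```python
from typing import NamedTuple, Optional


class TopicName(NamedTuple):
    origin: str         # data source or job application id
    record: str         # Avro record type, optionally with _<specifier>
    partition_key: str


def build_topic(origin: str, record_type: str, partition_key: str,
                specifier: Optional[str] = None) -> str:
    """Build <origin>-<record type>[_<specifier>]-<partition key>."""
    components = [origin, record_type, partition_key]
    if specifier is not None:
        components.append(specifier)
    for part in components:
        # Hyphens separate the three fields, so none may appear inside one.
        if not part or "-" in part:
            raise ValueError(f"invalid topic component: {part!r}")
    record = record_type if specifier is None else f"{record_type}_{specifier}"
    return f"{origin}-{record}-{partition_key}"


def parse_topic(name: str) -> TopicName:
    """Split a topic name back into its three hyphen-separated fields."""
    fields = name.split("-")
    if len(fields) != 3:
        raise ValueError(f"topic does not follow the convention: {name!r}")
    return TopicName(*fields)


user_topic = build_topic("oltp_db", "user", "user_id")
parsed = parse_topic("job_playbacks-photon_instantiations-game_stream_id")
```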
45
RocksDB range scans
Did you know that RocksDB stores keys lexicographically sorted?
Kafka Streams exposes range() queries on persistent state stores!
46
Example: Scheduled tasks
Keys in “tasks” topic are a composite key of <timestamp, id>
Allows range queries for upcoming tasks (local to partition, obviously)
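To illustrate why a <timestamp, id> key works, here is a sketch in plain Python. `SortedStore` is a hypothetical stand-in for a store that keeps keys in byte order, not the Kafka Streams or RocksDB API, and the 8-byte big-endian timestamp encoding is an assumption.

```python
import bisect
import struct
from typing import Dict, List, Tuple


def encode_key(timestamp_ms: int, task_id: str) -> bytes:
    """Big-endian timestamp first, so byte order matches time order."""
    return struct.pack(">Q", timestamp_ms) + task_id.encode("utf-8")


class SortedStore:
    """Stand-in for an ordered key-value store (keys kept in byte order)."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: Dict[bytes, str] = {}

    def put(self, key: bytes, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, start: bytes, end: bytes) -> List[Tuple[bytes, str]]:
        """Return entries with start <= key <= end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def due_tasks(store: SortedStore, now_ms: int) -> List[str]:
    """All tasks scheduled at or before now_ms, earliest first."""
    start = struct.pack(">Q", 0)
    # 0xff never occurs in UTF-8, so this bound sorts after every id at now_ms.
    end = struct.pack(">Q", now_ms) + b"\xff"
    return [value for _, value in store.range(start, end)]


store = SortedStore()
store.put(encode_key(3000, "c"), "late")
store.put(encode_key(1000, "a"), "first")
store.put(encode_key(2000, "b"), "third")
store.put(encode_key(2000, "a"), "second")
due = due_tasks(store, 2000)
```

The encoding matters: a decimal string timestamp would sort "10" before "9", while fixed-width big-endian bytes sort in numeric order.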
47
Dark staging jobs
Eventually you will need to deploy a staging version of a job into prod for integration testing while the known-good version is serving users.
Ensure you bake in the necessary degrees of freedom! (Duplicate topics, application ids, etc.)
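As a sketch of one such degree of freedom: derive the application id and every job-owned topic name from a single variant label, so a dark copy never writes to the live job's topics. `application.id` is the Kafka Streams config key; the `job_settings` function, the `variant` parameter, and the `output_topics` entry are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def job_settings(base_app_id: str, outputs: List[Tuple[str, str]],
                 variant: Optional[str] = None) -> Dict[str, object]:
    """Derive per-deployment settings from one variant label.

    `outputs` lists (record type, partition key) pairs for topics the job
    owns. Source topics from other systems stay shared and are not renamed.
    """
    app_id = f"{base_app_id}_{variant}" if variant else base_app_id
    return {
        "application.id": app_id,
        "output_topics": [f"{app_id}-{record}-{key}" for record, key in outputs],
    }


outputs = [("photon_instantiations", "game_stream_id")]
prod = job_settings("job_playbacks", outputs)
dark = job_settings("job_playbacks", outputs, variant="dark")
```

Because the topic names embed the application id, the dark job gets its own topics and its own consumer offsets without any further configuration.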
48
Patterns + Pitfalls
49
KTable rematerialization
Cold nodes read the *entire* KTable transaction log (xlog) for each KTable on startup. (Of course!)
This is not something you’re likely to experience except during a failure, so you could be in for a surprise!
It is easy to force a rematerialization to test: stop the job, remove the state dir from the job work directory, and restart.
(But you should probably check your xlog topic sizes first.)
In our case, AWS EBS I/O throttling left us unable to bring a fresh node up!
Ensure the xlog topic doesn’t grow unbounded:
- Delete dead keys explicitly and set proper compaction policies on xlog topics
- Or, set up topic retention policies if data can be purged after a time duration
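The effect of explicit deletes can be seen in a toy model of compaction. This sketch keeps only the latest record per key and drops keys whose latest record is a tombstone (`None`); it assumes the tombstone retention period has already passed and is not how the broker implements compaction. The session keys are made up.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); None is a tombstone


def compact(log: List[Record]) -> List[Record]:
    """Keep the latest record per key; drop keys that end in a tombstone."""
    latest_offset: Dict[str, int] = {}
    for offset, (key, _) in enumerate(log):
        latest_offset[key] = offset
    return [
        record
        for offset, record in enumerate(log)
        if latest_offset[record[0]] == offset and record[1] is not None
    ]


log = [
    ("s1", "open"), ("s1", "closed"),
    ("s2", "open"), ("s2", "closed"),
    ("s3", "open"),
]
without_deletes = compact(log)
with_deletes = compact(log + [("s1", None), ("s2", None)])
```

Without the tombstones, every finished session stays in the compacted log forever, and a cold node has to read all of them on startup.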
50
Reset switches + flushing
Sometimes KTable topics or entries need to be forcibly rematerialized, flushed, or read from the beginning.
For example: the KTable topic exists before the first job run. Or, something broke.
It is handy to build in mechanisms to:
- Reset consumer offsets to zero
- For OLTP/Connect-backed KTable data, force a no-op update to database record(s) to flush
  - In Rails, ActiveRecord#flush
This may be less necessary in newer versions of Kafka Streams (e.g. due to KAFKA-4114 + bug fixes).
Handy topic consumer group offset resetter routine (pass in the job Properties):
https://gist.github.com/gfodor/a4f5e4721e959766e75e4c901bf42890
51
Streaming for VR
Kafka Streams has been amazing for us.
Shown so far, we have jobs for:
• VR Mirror/Capture/Playback
• Presence
• Scheduled tasks
We are also using it for:
• Real-time game telemetry ETL
• VR Capture archival to S3
• Real-time push messaging
52
From batch to real-time
● Provides similar concepts to Hadoop
● Streaming platform is right for today’s applications
○ Distributed storage, Stream processing, Publish/Subscribe model
53
A streaming platform can be ‘Hadoop Made Fast’
● Use Kafka as a ‘source of truth’
● Process data as it arrives
● Power real-time experiences (like VR)
55
Download Confluent Open Source
Join the Confluent Slack community
Check out Kafka Summit!
August 28th in San Francisco
Thanks!
