1
Processing IoT Data with
Apache Kafka
Matt Howlett
Confluent Inc.
2
Pub Sub
Messaging Protocol
Pub Sub
Messaging System
(rethought as a distributed commit log)
Distributed Streaming Platform
● Pub Sub Messaging
● Event Storage
● Processing Framework
3
OBD-II Adapters
4
Problem Statement
Let’s build a system to:
• Transport OBD-II data over unreliable links from cars to the data center
• Capable of handling millions of devices*
• Extract information from, and respond to, this data in (near) real time (at scale)
• Handle surges in usage
• Potential for ad-hoc historical processing
* or fewer
Architecture / technology / methods applicable to many scenarios.
5
MQTT Introduction
Publish / subscribe messaging protocol:
• Built on top of TCP/IP
• Features that make it well suited to poor-connectivity / high-latency scenarios
• Lightweight
• Efficient client implementations, low network overhead
• MQTT-SN for non-IP networks (’virtual connections’)
• Many (open source) broker implementations
• Mosquitto, RabbitMQ, HiveMQ, VerneMQ
• Many client libraries
• C, C++, Java, C#, Python, JavaScript, WebSockets, Arduino …
• Widely used (incl. phone apps!)
• Oil pipeline sensors via satellite link
• Facebook Messenger
• AWS IoT
6
MQTT Features
• Simple API
• Hierarchical topics
• myhome/kitchen/door/front/battery/level
• Wildcard subscriptions: myhome/+/door/+/battery/level ('+' matches a single level, '#' matches all remaining levels)
• 3 qualities of service (on both produce and consume)
• At most once (QoS 0)
• At least once (QoS 1)
• Exactly once (QoS 2) [not universally supported]
• Persistent consumer sessions
• Important for QoS 1, QoS 2
• Last will and testament
• Last known good value (retained messages)
• Authorization, SSL/TLS
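As a rough illustration of the API, publishing and subscribing with the Paho MQTT Python client (1.x API) might look like the sketch below; the broker address, client id and topic names are assumptions for illustration.

import json
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # msg.topic identifies the source, e.g. "myhome/kitchen/door/front/battery/level"
    print(msg.topic, msg.payload)

# clean_session=False requests a persistent session (needed for reliable QoS 1/2
# delivery across reconnects).
client = mqtt.Client(client_id="battery-monitor", clean_session=False)
client.on_message = on_message
client.connect("localhost", 1883)

# '+' matches a single topic level, '#' matches all remaining levels.
client.subscribe("myhome/+/door/+/battery/level", qos=1)   # at-least-once delivery

# Publish a reading with QoS 1.
client.publish("myhome/kitchen/door/front/battery/level", json.dumps({"level": 87}), qos=1)

client.loop_forever()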
7
OBD-II Data
• Device Id
• GPS location [lon, lat]
• Ignition on / off
• Speedometer reading
• Timestamp
• …plus a lot more
Assume: data sent via a 3G wireless connection at ~30 second intervals
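For illustration only, a single OBD-II sample published on the [deviceid]/obd topic might look like the record below; the field names and units are assumptions, not a standard OBD-II schema.

# Illustrative payload - field names and units are assumptions.
sample = {
    "device_id": "veh-001234",
    "timestamp": "2017-04-03T12:00:30Z",
    "location": {"lon": -122.4194, "lat": 37.7749},   # [lon, lat]
    "ignition_on": True,
    "speed_kmh": 72,
}
# Serialized (e.g. as JSON) and published roughly every 30 seconds.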
8
Ingest Architecture V1
[Diagram: devices → MQTT Server 1 (topic: [deviceid]/obd) → Processor 1, Processor 2, …]
Deficiencies:
• A single MQTT server can handle maybe ~100K connections
• Can’t handle usage surges (no buffering)
• No storage of events or reprocessing capability
9
Ingest Architecture V2
[Diagram: MQTT Server Coordinator (http / REST) fronting MQTT Server 1, 2, 3, 4, …; topic: [deviceid]/obd]
• Easily shardable
• Treat MQTT servers as a commodity service
10
OBD -> MQTT -> Kafka
[Diagram: MQTT Server Coordinator and MQTT Servers 1-4 (topic: [deviceid]/obd) → Kafka Connect → Kafka topic OBD_Data → stream processing]
11
Apache Kafka
Distributed Streaming Platform:
• Pub Sub Messaging
• (typically clients are within the data center)
• Data Store
• Messages are not deleted after delivery
• Stream Processing
• Low- or high-level libraries
• Data re-processing
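A minimal producer sketch using the confluent-kafka Python client, bridging records into the OBD_Data topic keyed by device id; the broker address is an assumption.

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka-1:9092"})

def delivery_report(err, msg):
    # Called once per message to confirm (or report failure of) delivery.
    if err is not None:
        print("delivery failed:", err)

def forward_to_kafka(device_id, payload):
    # Keying by device_id keeps all of one device's data in a single partition (ordered).
    producer.produce("OBD_Data", key=device_id, value=payload, callback=delivery_report)
    producer.poll(0)   # serve delivery callbacks

# e.g. call forward_to_kafka() from the MQTT message handler; on shutdown:
producer.flush()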
12
Apache Kafka adoption spans companies across industries.
13
The Kafka log:
● Persisted
● Append-only
● Immutable
● Earliest data deleted based on time / size / never
14
• Allows topics to scale past the constraints of a single server
• Message → partition_id mapping is deterministic; the choice of partitioning key should be relevant to the application
• Ordering is guaranteed per partition, but not across partitions
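A sketch of creating such a partitioned topic with the confluent-kafka AdminClient; the partition count, replication factor and retention period are illustrative values, not recommendations.

from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "kafka-1:9092"})
topic = NewTopic(
    "OBD_Data",
    num_partitions=24,              # lets the topic scale beyond a single server
    replication_factor=3,
    config={"retention.ms": str(7 * 24 * 60 * 60 * 1000)},   # delete data older than a week
)
# The producer maps key -> partition deterministically, so all records for a
# given device_id land in the same partition.
admin.create_topics([topic])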
15
Apache Kafka Replication
• Cheap durability!
• Choose the number of acks required to confirm a produced message
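For example, the acknowledgement level is a producer configuration setting; the values below are the standard options (the broker address is an assumption).

producer_config = {
    "bootstrap.servers": "kafka-1:9092",
    # "all": wait for all in-sync replicas; "1": leader only; "0": fire and forget.
    "acks": "all",
}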
16
Apache Kafka Consumer Groups
Partitions (possibly spread across different brokers) are divided among the consumers in a group.
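A consumer-group member in sketch form; the group.id is an assumption. All consumers sharing a group.id have the topic's partitions balanced across them automatically.

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka-1:9092",
    "group.id": "obd-processors",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["OBD_Data"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    handle(msg)   # hypothetical per-message handler (see the on_message examples below)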
17
Kafka Connect
• Use client library producers / consumers in custom applications.
• Often want to bulk transfer data between standard systems:
• Don’t re-invent the wheel – configure Kafka Connect
• Narrow scope: move data into & out of Kafka
• Off-the-shelf connectors
• Fault Tolerant
• Auto-balances load
• Pluggable Serialization
• Standalone and distributed modes of operation
• Configuration / management via REST API
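A connector is submitted as JSON over the Connect REST API, roughly as below; the connector class and the MQTT-specific property names are placeholders - consult the connector's documentation for the real keys.

import json, requests

connector = {
    "name": "mqtt-source-server-1",
    "config": {
        "connector.class": "com.evokly.kafka.connect.mqtt.MqttSourceConnector",  # assumed class name
        "tasks.max": "1",
        # ... MQTT broker URI, MQTT topic filter, target Kafka topic, converters, etc.
    },
}
requests.post("http://connect-host:8083/connectors",
              headers={"Content-Type": "application/json"},
              data=json.dumps(connector))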
18
19
MQTT Connector
https://github.com/evokly/kafka-connect-mqtt
• Single Task
• Single MQTT Broker
• Source only
Either:
• Start a bunch of these connectors (in one connect cluster), one per server, or:
• Implement a new multi-task connector, one task per MQTT broker.
• Communicate with the MQTT Server Coordinator
20
User Data
SQL DB table User_Info:
• user_id
• device_id
• name
• address
• phone_number
• speed_alert_level
• ...
21
Example: Car Towed Alert
Detect movement of car when ignition off, send SMS alert
[Diagram: OBD_Data partitions (P1, P5 on Broker 1; P3, P7 on Broker 2; …) consumed by Consumer 1, Consumer 2, …; each consumer keeps the last location per device in an in-memory KV store, queries User Info when an alert fires, and sends the alert via an SMS gateway]
22
Consumer Implementation
on_message(message m)
{
    var device_id = m.key;
    var obd_data = m.value;

    // Only interested in movement while the ignition is off.
    if (obd_data.ignition_on)
        return;

    // First reading for this device: remember its location and wait for the next one.
    if (!kv_store.contains(device_id)) {
        kv_store.add(device_id, obd_data.lon_lat);
        return;
    }

    var prev_lon_lat = kv_store.get(device_id);
    var dist = calc_dist(obd_data.lon_lat, prev_lon_lat);
    kv_store.set(device_id, obd_data.lon_lat);

    // Moved while the ignition is off: look up the phone number (infrequent) and alert.
    if (dist > alert_max_dist) {
        send_alert(SQL.get_phone_number(device_id));
    }
}
• Message can be from any partition assigned to this consumer
• Ordering is guaranteed per partition, but not predictable across partitions
• All messages from a particular device are guaranteed to arrive at the same consumer instance
23
Example: Speed Alert
• Scenario: a parent wants to monitor their son/daughter driving and be alerted if they exceed a specified speed.
• In the Tow Alert example, User_Info only needs to be queried in the event of an alert.
• In this example, the table needs to be queried for every OBD data record in every partition.
[Diagram: high-frequency OBD_Data partitions (P1, …) each querying the User_Info table, which can update at any time - not scalable! Cache?]
24
Table can be represented as a stream of updates (device_id → speed_limit):

Time = 0   update {device_id=1, speed_limit=60}   table: 1→60
Time = 1   update {device_id=2, speed_limit=80}   table: 1→60, 2→80
Time = 2   update {device_id=3, speed_limit=70}   table: 1→60, 2→80, 3→70
Time = 3   update {device_id=1, speed_limit=80}   table: 1→80, 2→80, 3→70
Time = 4   update {device_id=1, speed_limit=65}   table: 1→65, 2→80, 3→70

Log compaction!
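The same idea in miniature: replaying the update stream and keeping only the latest value per key reconstructs the table, which is what a log-compacted topic retains.

updates = [
    {"device_id": 1, "speed_limit": 60},
    {"device_id": 2, "speed_limit": 80},
    {"device_id": 3, "speed_limit": 70},
    {"device_id": 1, "speed_limit": 80},
    {"device_id": 1, "speed_limit": 65},
]
table = {}
for u in updates:
    table[u["device_id"]] = u["speed_limit"]   # later updates overwrite earlier ones
print(table)   # {1: 65, 2: 80, 3: 70}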
25
Debezium
Kafka Connector that turns database tables into streams of update records.
[Diagram: MySQL User Info table (key: userId) → Debezium → User_Info changelog topic (Partitions 1-6, …), partitioned by device_id]
26
Stream / Table Join
[Diagram: OBD_Data record stream (Partitions 1-7, …, key: device_id) joined with the User_Info changelog topic (compacted, Partitions 1-5, …, key: device_id, fed by Debezium); Consumer 1 materializes the relevant subset of User_Info for its assigned partitions, e.g. device_id 1 → speed_limit 80, device_id 3 → speed_limit 70]
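A simplified join sketch: one consumer subscribes to both the compacted User_Info changelog and the OBD_Data record stream (both keyed by device_id), materializes User_Info into a local dict, and checks each OBD record against it. The broker address, group id and field names are assumptions, and co-partition assignment, rebalancing and startup ordering are ignored.

import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka-1:9092",
    "group.id": "speed-alert",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["User_Info", "OBD_Data"])

user_info_local = {}   # device_id -> user info (locally materialized table)

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    key = msg.key().decode()
    if msg.topic() == "User_Info":
        user_info_local[key] = json.loads(msg.value())        # table side: upsert
    else:
        obd = json.loads(msg.value())                         # stream side: join + check
        info = user_info_local.get(key)
        if info and obd["speed_kmh"] > info["speed_alert_level"]:
            alert_user(key, info)                             # hypothetical alert helper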
27
Speed Alert: Message handler
on_message(message m)
{
    var device_id = m.key;
    var obd_data = m.value;

    // user_info_local: the locally materialized subset of the User_Info changelog.
    var user_info = user_info_local.get(device_id);

    if (obd_data.speedometer > user_info.max_speed) {
        alert_user(device_id, user_info);
    }
}
28
MQTT Phone Client Connectivity
[Diagram: consumers publish alerts on topic [deviceid]/alert via the MQTT Server Coordinator and MQTT Servers 1-3; phone clients connect to the same MQTT infrastructure; devices continue to publish on [deviceid]/obd]
29
Speed Limit Alert: Rate limiting
[Diagram: app_state Kafka topic, Partitions 1-7, …]
• Prefer to rate limit on the server to minimize network overhead.
• Create a new Kafka topic, app_state, partitioned on device_id.
• When an alert is triggered, store the alert time in this topic (sketched below).
• [can use this topic as a general store for other per-device state info too]
• Materialize this change-log stream on consumers as necessary.
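A sketch of the rate-limiting idea: keep the last alert time per device in a locally materialized copy of app_state and write updates back to the topic. The topic name and keying follow the slide; the one-hour window, broker address and field names are assumptions.

import json, time
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka-1:9092"})
app_state_local = {}          # device_id -> state dict, materialized from the app_state topic
ALERT_INTERVAL_S = 3600       # assumed suppression window

def maybe_alert(device_id, user_info):
    state = app_state_local.get(device_id, {})
    now = time.time()
    if now - state.get("last_alert_time", 0) < ALERT_INTERVAL_S:
        return                                          # suppressed: alerted too recently
    send_alert(device_id, user_info)                    # hypothetical alert helper
    state = dict(state, last_alert_time=now)
    app_state_local[device_id] = state
    producer.produce("app_state", key=device_id, value=json.dumps(state))
    producer.poll(0)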
30
[Diagram: Consumer 1 consumes the OBD_Data record stream (Partitions 1-7, …) together with the User_Info changelog (compacted, Partitions 1-4, …) and the App_State topic (compacted, Partitions 1-4, …), materializing the relevant subsets of User_Info and App_State locally]
31
Example: Location Based Special Offers
When a car enters a specific region, send available special offers to the user’s phone.
Require:
• User_Info
• Address – so we know whether they are local to their current location or not
• App_State
• Used to persist offers already sent
• Special_Offer_Info
• Table that stores the list of all special offers.
32
Regions
[Diagram: map divided into a grid of numbered regions, 1-42]
• Regions may be simple (as depicted here) or complex
• F(lon, lat) -> locationId (a possible implementation is sketched below)
• Note: could also implement ride-share surge pricing using similar partitioning.
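One possible F(lon, lat) -> locationId for the simple grid depicted above: divide the map into fixed-size cells and number them row by row. The grid origin, cell size and column count are assumptions.

GRID_ORIGIN_LON, GRID_ORIGIN_LAT = -123.0, 36.0
CELL_DEG = 0.1      # each cell is 0.1 degree on a side
GRID_COLS = 7       # 7 columns, matching the 1-42 grid above

def location_id(lon, lat):
    col = int((lon - GRID_ORIGIN_LON) / CELL_DEG)
    row = int((lat - GRID_ORIGIN_LAT) / CELL_DEG)
    return row * GRID_COLS + col + 1     # 1-based ids; 1..42 for a 6x7 grid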
33
Special Offer Change-log Stream
[Diagram: MySQL Special Offer Info table → Debezium → Special_Offers changelog topic (compacted, Partitions 1-6, …), partitioned by location_id]
34
Multi-stage Data Pipeline
[Diagram: consume OBD_Data (K: device_id, V: OBD record) → enrich with User_Info [address] → (K: device_id, V: OBD record + address) → enrich with App_State [offers already sent] → (K: device_id, V: OBD record + address + offers_sent)]
35
Multi-stage Data Pipeline (continued)
[Diagram: repartition by location_id - each record (K: device_id, V: OBD record + address + offers_sent) is re-keyed and written to the OBD_Data_By_Location topic (P1, P2, P3, …) as (K: location_id, V: OBD record + address + offers_sent)]
Data from a given device will still all land on the same partition (except when its region changes). A repartitioning sketch follows below.
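A repartitioning sketch: consume the enriched device-keyed stream and re-produce each record to OBD_Data_By_Location keyed by location_id. The source topic name, group id and broker address are assumptions; location_id() is the grid mapping sketched earlier.

import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "kafka-1:9092",
    "group.id": "repartition-by-location",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["OBD_Data_Enriched"])     # assumed name of the enriched topic
producer = Producer({"bootstrap.servers": "kafka-1:9092"})

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    record = json.loads(msg.value())
    loc = location_id(record["location"]["lon"], record["location"]["lat"])
    producer.produce("OBD_Data_By_Location", key=str(loc), value=msg.value())
    producer.poll(0)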
36
Multi-stage Data Pipeline (continued)
[Diagram: re-partitioned stream (K: location_id, V: OBD record + address + offers_sent) → enrich with Special_Offers → (K: location_id, V: OBD record + address + offers_sent + available_offers)]
37
Multi-stage Data Pipeline (continued)
[Diagram: successive filters - special offer available in the location? special offer not already sent? user address near the location? - surviving records are published to the MQTT server on topic [deviceId]/alert]
38
39
40
Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28
Presented by
41
Thank You
@matt_howlett
@confluentinc