REAL TIME STREAM PROCESSING WITH KSQL AND KAFKA
DAVID PETERSON
Systems Engineer - Confluent APAC
@davidseth
Changing Architectures
Kafka?
Stream Processing
KSQL
KSQL in Production
QUICK INTRO TO CONFLUENT
69% of active Kafka Committers
Founded
September 2014
Technology developed
while at LinkedIn
Founded by the creators of
Apache Kafka
76% of Kafka code created by Confluent team
Changing
Architectures
Events
A Sale, An Invoice, A Trade, A Customer Experience
CHANGING ARCHITECTURES
WE ARE CHALLENGING OLD ASSUMPTIONS...
Big Data was: The More the Better (chart: Value of Data vs. Volume of Data)
Stream Data is: The Faster the Better (chart: Value of Data vs. Age of Data)
CHANGING ARCHITECTURES
WE ARE CHALLENGING OLD ARCHITECTURES…
Lambda (Big OR Fast): Streams feed a Speed Table, Hadoop feeds a Batch Table, and both serve a DB.
Kappa (Big AND Fast): Kafka (Topic A) feeds a KSQL Stream, which fans out to HDFS, Cassandra, Elastic, and a microservice.
A CHANGE OF MINDSET...
KAFKA: EVENT CENTRIC THINKING
A CHANGE OF MINDSET...
AN EVENT-DRIVEN ENTERPRISE
● Everything is an event
● Available instantly to all applications
in a company
● Ability to query data as it arrives vs
when it is too late
● Simplifying the data architecture by
deploying a single platform
What are the possibilities?
It’s a massively scalable, distributed, fault-tolerant, publish & subscribe, key/value datastore with infinite data retention, computing unbounded streaming data in real time.
So, what is Kafka really?
It’s made up of 3 key primitives
Publish & Subscribe, Store, Process
Producer & Consumer API: open-source client libraries for numerous languages; direct integration with your systems.
Connect API: reliable and scalable integration of Kafka with other systems; no coding required.
Streams API: low-level and DSL; create applications & microservices to process your data in real-time.
A Brief History of Apache Kafka and Confluent
0.7 (2012)
0.8 (2013): Intra-cluster replication
0.9 (2015): Data integration
0.10 (2016): Stream processing
0.11 (2017): Exactly-once semantics
1.0 (2017): One<dot>Oh release! ☺
CP 4.1 (2018): KSQL GA
2.0 (2018)
Producers
Kafka
cluster
Consumers
So, what exactly is a stream?
1. TOPIC
{"actor":"bear", "x":410, "y":20}
{"actor":"racoon", "x":380, "y":20}
{"actor":"bear", "x":380, "y":22}
{"actor":"racoon", "x":350, "y":22}
{"actor":"bear", "x":350, "y":25}
{"actor":"racoon", "x":330, "y":25}
{"actor":"racoon", "x":280, "y":32}
{"actor":"bear", "x":310, "y":32}
2. STREAM
3. TABLE
Exposure Sheet
Changelog stream – immutable events
Rebuild original table
Stream Processing
KSQL: Streaming SQL for Apache Kafka
Standard App
No need to create a separate cluster
Highly scalable, elastic, fault tolerant
Lives inside your application
Stream processing
Streams meet Tables
● When you need all the values of a key, the topic is interpreted as a record stream, so you’d read the Kafka topic into a KStream, with messages interpreted as INSERT (append). Example: all the places Alice has ever been to.
Streams meet Tables
● All the values of a key: the topic is interpreted as a record stream, read into a KStream, with messages interpreted as INSERT (append). Example: all the places Alice has ever been to.
● Latest value of a key: the topic is interpreted as a changelog stream, read into a KTable, with messages interpreted as UPSERT (overwrite existing). Example: where Alice is right now.
Same data, but different use cases require different interpretations
“Alice has been to SFO, NYC, Rio, Sydney, Beijing, Paris, and finally Berlin.”
Use case 1: Frequent traveler status? → KStream
“Alice is in SFO, NYC, Rio, Sydney, Beijing, Paris, Berlin right now.”
Use case 2: Current location? → KTable
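To make the stream/table duality concrete, here is a minimal KSQL sketch, assuming a hypothetical locations topic with JSON values keyed by username (the topic and column names are illustrative, not from the deck). The same topic backs both a stream of every visit and a table of each user’s latest location.

-- Every place a user has ever been (INSERT/append semantics)
CREATE STREAM traveler_visits (username VARCHAR, city VARCHAR)
  WITH (KAFKA_TOPIC='locations', VALUE_FORMAT='JSON');

-- The latest place per user (UPSERT semantics); KEY names the column carrying the message key
CREATE TABLE traveler_location (username VARCHAR, city VARCHAR)
  WITH (KAFKA_TOPIC='locations', VALUE_FORMAT='JSON', KEY='username');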
KSQL
KSQL — get started fast with Stream Processing
Kafka
(data)
KSQL
(processing)
read,
write
network
All you need is Kafka – no complex deployments of
bespoke systems for stream processing!
CREATE STREAM
CREATE TABLE
SELECT …and more…
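As a minimal getting-started sketch, assuming an existing Kafka topic named pageviews with JSON values (the topic and columns are hypothetical): registering a stream copies no data, and a SELECT over it runs continuously.

-- Register a stream over the existing topic (no data is copied)
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- A continuous query: results keep updating as new events arrive
SELECT page, COUNT(*) FROM pageviews GROUP BY page;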
● No need for source code deployment
○ Zero, none at all, not even one tiny file
● All the Kafka Streams capabilities out-of-the-box
○ Exactly Once Semantics
○ Windowing
○ Event-time aggregation
○ Late-arriving data
○ Distributed, fault-tolerant, scalable, ...
KSQL
Concepts
KSQL — SELECT statement syntax
SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping_expression` ]
[ HAVING `having_expression` ]
[ LIMIT n ]
where from_item is one of the following:
stream_or_table_name [ [ AS ] alias]
from_item LEFT JOIN from_item ON join_condition
What are some KSQL use cases?
KSQL — Data exploration
An easy way to inspect data in Kafka
SELECT page, user_id, status, bytes
FROM clickstream
WHERE user_agent LIKE 'Mozilla/5.0%';
SHOW TOPICS;
PRINT 'my-topic' FROM BEGINNING;
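A few more exploration statements that KSQL supports alongside the ones above; DESCRIBE assumes the clickstream stream from the example, and the SET line tells queries to read topics from the beginning.

SHOW STREAMS;   -- list registered streams
SHOW TABLES;    -- list registered tables
DESCRIBE clickstream;   -- inspect a stream's schema
SET 'auto.offset.reset' = 'earliest';   -- process topics from the beginning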
KSQL — Data enrichment
Join data from a variety of sources to see the full picture
CREATE STREAM enriched_payments AS
SELECT payment_id, u.country, total
FROM payments_stream p
LEFT JOIN users_table u
ON p.user_id = u.user_id;
Stream-table join
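For context, a hedged sketch of how the two join inputs above might be declared; the topic names, formats, and column types are assumptions, not from the deck.

CREATE STREAM payments_stream (payment_id BIGINT, user_id VARCHAR, total DOUBLE)
  WITH (KAFKA_TOPIC='payments', VALUE_FORMAT='JSON');

-- A table needs a key; KEY names the column carrying the message key
CREATE TABLE users_table (user_id VARCHAR, country VARCHAR)
  WITH (KAFKA_TOPIC='users', VALUE_FORMAT='JSON', KEY='user_id');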
KSQL — Streaming ETL
Filter, cleanse, process data while it is moving
CREATE STREAM clicks_from_vip_users AS
SELECT user_id, u.country, page, action
FROM clickstream c
LEFT JOIN users u ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
KSQL — Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING COUNT(*) > 3;
Aggregate data to identify patterns or anomalies in real-time (here: per 5-minute windows)
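One practical refinement, an assumption on our part rather than part of the slide: aliasing the aggregate gives the result column a stable name, which makes the table easier to query afterwards.

CREATE TABLE possible_fraud AS
  SELECT card_number, COUNT(*) AS attempts   -- named aggregate column
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING COUNT(*) > 3;

SELECT card_number, attempts FROM possible_fraud;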
TIME & STREAMING
Window types: TUMBLING, HOPPING, SESSION
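A sketch of the three window types in KSQL syntax; the stream and column names reuse the clickstream example from earlier, and the durations are illustrative.

-- Tumbling: fixed-size, non-overlapping windows
SELECT user_id, COUNT(*) FROM clickstream
  WINDOW TUMBLING (SIZE 5 MINUTES) GROUP BY user_id;

-- Hopping: fixed-size windows that overlap (here: a new 5-minute window every minute)
SELECT user_id, COUNT(*) FROM clickstream
  WINDOW HOPPING (SIZE 5 MINUTES, ADVANCE BY 1 MINUTE) GROUP BY user_id;

-- Session: windows bounded by gaps of inactivity between events for a key
SELECT user_id, COUNT(*) FROM clickstream
  WINDOW SESSION (60 SECONDS) GROUP BY user_id;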
KSQL — Real time monitoring
Derive insights from events (IoT, sensors, etc.) and turn them into actions
CREATE TABLE failing_vehicles AS
SELECT vehicle, COUNT(*)
FROM vehicle_monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE event_type = 'ERROR'
GROUP BY vehicle
HAVING COUNT(*) >= 3;
KSQL — Data transformation
Quickly make derivations of existing data in Kafka
CREATE STREAM clicks_by_user_id
WITH (PARTITIONS=6,
TIMESTAMP='view_time',
VALUE_FORMAT='JSON') AS
SELECT * FROM clickstream
PARTITION BY user_id;
Re-key the data
Convert data to JSON
KSQL — Stream to Stream JOINs
Example: Detect late orders by matching every SHIPMENTS row with ORDERS rows that are within a 2-hour window.
CREATE STREAM late_orders AS
SELECT o.orderid, o.itemid FROM orders o
FULL OUTER JOIN shipments s WITHIN 2 HOURS
ON s.orderid = o.orderid WHERE s.orderid IS NULL;
INSERT INTO statement for Streams
CREATE STREAM sales_online (itemId BIGINT, price INTEGER, shipmentId BIGINT) WITH (...);
CREATE STREAM sales_offline (itemId BIGINT, price INTEGER, storeId BIGINT) WITH (...);
CREATE STREAM all_sales (itemId BIGINT, price INTEGER) WITH (...);
-- Merge the streams into `all_sales`
INSERT INTO all_sales SELECT itemId, price FROM sales_online;
INSERT INTO all_sales SELECT itemId, price FROM sales_offline;
CREATE TABLE daily_sales_per_item AS
SELECT itemId, SUM(price) FROM all_sales
WINDOW TUMBLING (SIZE 1 DAY) GROUP BY itemId;
Example: Compute daily sales per item across online and offline stores
KSQL — Demo
customers
Kafka Connect
streams data in
Kafka Connect
streams data out
KSQL processes
table changes
in real-time
KSQL — Deep Learning for IoT Sensor Analytics
KSQL UDF using an analytic model under the hood
→ Write once, use in any KSQL statement
SELECT event_id,
anomaly(SENSORINPUT)
FROM health_sensor;
User Defined Function
KSQL — User Defined Function (UDF)
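Once a UDF such as anomaly() is registered, recent KSQL versions can list and describe it from the CLI; a sketch, with the function name taken from the previous slide.

SHOW FUNCTIONS;            -- list built-in and user-defined functions
DESCRIBE FUNCTION ANOMALY; -- show the UDF's signature and description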
Putting KSQL into
Production
DEPLOYING KSQL
Three ways: CLI, REST, CODE
A key challenge of distributed stream processing is fault-tolerant state. State is automatically migrated in case of server failure.
Server A: “I do stateful stream processing, like tables, joins, aggregations.”
A changelog topic in Kafka provides a “streaming backup” of A’s local state, and a “streaming restore” of A’s local state to B.
Server B: “I restore the state and continue processing where server A stopped.”
Fault-Tolerance, powered by Kafka
Processing fails over automatically, without data loss or miscomputation.
#3 died, so #1 and #2 take over:
1. Kafka consumer group rebalance is triggered
2. Processing and state of #3 is migrated via Kafka to remaining servers #1 + #2
#3 is back, so the work is split again:
3. Kafka consumer group rebalance is triggered
4. Part of processing incl. state is migrated via Kafka from #1 + #2 to server #3
Fault-Tolerance, powered by Kafka
You can add, remove, and restart servers in KSQL clusters during live operations.
“We need more processing power!”
1. Kafka consumer group rebalance is triggered
2. Part of processing incl. state is migrated via Kafka to additional server processes
“Ok, we can scale down again.”
3. Kafka consumer group rebalance is triggered
4. Processing incl. state of stopped servers is migrated via Kafka to remaining servers
Elasticity and Scalability, powered by Kafka
PARALLELISM: primarily determined by how many partitions the input topics have.
KSQL is the Streaming SQL Engine for Apache Kafka
Resources and Next Steps
• Try the demo on GitHub :)
• Check out the code
• Play with the examples
Download Confluent Open Source: https://www.confluent.io/download/
Chat with us: https://slackpass.io/confluentcommunity #ksql
https://github.com/confluentinc/demo-scene
The World’s Best Streaming Platform — Everywhere
DAVID PETERSON
Systems Engineer - Confluent APAC
david.peterson@confluent.io | @davidseth
Editor's Notes
  • #2: Unordered, unbounded and massive datasets are increasingly common in day-to-day business. Using this to your advantage is incredibly difficult with current system designs. We are stuck in a model where we can only take advantage of this *after* it has happened. Many times, this is too late to be useful in the enterprise.   KSQL is a streaming SQL engine for Apache Kafka. KSQL lowers the entry bar to the world of stream processing, providing a simple and completely interactive SQL interface for processing data in Kafka. KSQL (like Kafka) is open-source, distributed, scalable, and reliable.   A real time Kafka platform moves your data up the stack, closer to the heart of your business, allowing you to build scalable, mission-critical services by quickly deploying SQL-like queries in a serverless pattern.   This talk will highlight key use cases for real time data, stream processing with KSQL: real time analytics, security and anomaly detection, real time ETL / data integration, Internet of Things, application development, and deploying Machine Learning models with KSQL.   Real time data and stream processing means that Kafka is just as important to the disrupted as it is to the disruptors.
  • #5: Founded by the team that built Apache Kafka®, Confluent builds a streaming platform that enables companies to easily access data as real-time streams. Read Slide.
  • #6: 76% of Kafka code created by Confluent team And…
  • #7: Watch Neha’s KS keynote: https://www.youtube.com/watch?v=eublKlalobg And Jay’s keynote: https://www.youtube.com/watch?v=gsUZ6RYmL1s.
  • #8: A sale, an order, a trade, some aspect of a customer experience. These are all events that you’d expect to happen all the time. So why shouldn’t we capture and respond to it, in the moment, vs treat it like a static pile of data which is how we’ve been conditioned to think of data? To make this concrete, an event is just an immutable record that records something as of some point in time.
  • #9: Instead, data is often most valuable when it is in flow.  This mindset shift is relatively simple.  Rather than static data in databases, we now have event-streaming platforms which enable us to work with data, or events, in real time.   An 'event' simply means something happened; a booking, a financial transaction, a customer experience online, an invoice, an IoT connected device senses something... the list is endless...   Rather than let the data populate a static database, the data itself can trigger an action or analysis in real-time.  In many cases, that offers new value.  The Silicon Valley companies get it.  Organizations such as Uber, eBay, Netflix, Yelp, and more have architected themselves around event-streaming platforms.
  • #10: https://www.paypal-engineering.com/2016/11/18/from-big-data-to-fast-data-in-four-weeks-or-how-reactive-programming-is-changing-the-world-part-2/ So, we believe event streaming platforms are challenging old assumptions. With big data it was: the more the better. With stream data it’s about the speed. More recent data is more valuable. Streaming data architectures have become a central element of Silicon Valley’s technology companies. Many of the largest of these have built themselves around real-time streams as a kind of central nervous system that connects applications and data systems, and makes available in real-time a stream of everything happening in the business. Every event in Uber, Netflix, Yelp, PayPal, and eBay runs through Kafka. A streaming platform doesn’t have to replace your data warehouse (just yet); in fact, quite the opposite: it feeds it data. It acts as a conduit for data to quickly flow into the warehouse environment for long-term retention, ad hoc analysis, and batch processing. That same pipeline can run in reverse to publish out derived results from nightly or hourly batch processing.
  • #11: At any time we can add a new Connector, a new data sink, and re-play the stream into it. These derived views are then optimised for read; the data can be formatted, normalised or whatever to make it optimal for your Consumers to consume that data, whether it be a search engine, Cassandra, MongoDB, etc. Good references here for Lambda to Kappa: Netflix architecture, and how Kafka is changing data architecture: http://techblog.netflix.com/2016/02/evolution-of-netflix-data-pipeline.html
  • #12: Kafka is exciting because it’s now changing how we work with data.     Do we want to store all that data in static databases? Static data in a fast moving digital businesses is often out of date within hours, if not minutes, or even seconds.  And yet, passive storage is the primary model of most data systems.  Most systems offer a place data goes to sit. We use phrases like “data warehouse”, “data lake” or “data store”.  We mostly have applications working on a request / response basis, with data sitting in relational databases.   In the last few years a new style of system and architecture has emerged.  In addition to passive storage, this focuses on the flow of data in real-time streams. Confluent is at the forefront of this.  It has the software and expertise that helps manage data in flow.  It's no longer just big-data that's interesting, it's fast data.  
  • #13: From Lyndon: Our VISION is for Kafka to act as the central nervous system of the modern company, across all verticals. If you think about it, a lot of life is a stream of events. Conversations are a stream of information. Most of a business is a stream of events. And so, we talk about a Streaming Platform being the Central Nervous System for a business, managing these streams of events. Overall, we think this technology is changing how data is put to use in companies. We are seeing that streaming data is redefining competition. Those that capitalize on it are creating a new, powerful customer experience, reducing costs, designing for regulatory uncertainty, and lowering risk in real-time.
  • #16: It’s made up of three core primitives
  • #17: It’s made up of three core primitives
  • #18: So what is a streaming platform? There are a set of core capabilities around data streams you have to have... The first is the ability to publish and subscribe to streams of data. This is something that’s been around for a long time. Messaging systems have been able to do this. What’s different now is the ability to store data and do it properly in a replicated manner. The final capability is to be able to process these streams of data. Publish: scalability of a filesystem. Store: key/value database, strict ordering, persistence. "Kafka moves data closer to the heart of your business allowing you to react in real time, processing and transforming events into actionable results.”
  • #19: So, what is Kafka really?
  • #20: Let’s come back to that idea of 3s
  • #21: So what is a streaming platform? There are a set of core capabilities around data streams you have to have... The first is the ability to publish and subscribe to streams of data. This is something that’s been around for a long time. Messaging systems have been able to do this. What’s different now is the ability to store data and do it properly in a replicated manner. The final capability is to be able to process these streams of data. Publish: scalability of a filesystem. Store: key/value database, strict ordering, persistence. "Kafka moves data closer to the heart of your business allowing you to react in real time, processing and transforming events into actionable results.”
  • #23: At a high-level, Kafka is a pub-sub messaging system that has producers that capture events. Events are sent to and stored locally on a central cluster of brokers. And consumers subscribe to topics or named categories of data. End-to-end, producers to consumer data flow is real-time.
  • #24: To really understand what a stream is within Kafka, you have to get to know 3 primitives.
  • #25: It’s made up of three core primitives
  • #26: It’s made up of three core primitives
  • #27: Let me tell you a little story about a Topic. A topic as we just saw was single snapshots of time, stored in a topic. http://thechive.com/2016/07/11/chuck-jones-rules-to-writing-the-wile-e-coyote-and-road-runner-cartoons-11-photos/
  • #28: When we watch animation, it seems “alive”.
  • #29: The reason I’m highlighting animation is that it wraps up all the concepts that are involved in understanding a somewhat complex set of primitives within Kafka. It took me a bit of time to get my head around this. And animation. I love animation. I couldn’t pass up a chance to combine my love of animation with my job :) Let’s look at a single frame of animation. And pull it apart to see the individual layers. Each layer would be stored as an immutable (unchangeable) event and stored in a topic. Let’s call the topic “animated_stories”
  • #31: So animation when slowed down to the individual frame becomes individual snapshots of time. Immutable recordings of things that happened. Where they happened.
  • #32: And who they happened to.
  • #33: The recording mechanism just happens to be the 35mm frame, still, or in more modern animation, digital frames or even 3d renderings.
  • #34: The realisations and visualisations are all different, but the concepts are all the same. Now, all these things happened. They cannot be updated or deleted. I can go back in time and see where they were, but I can’t delete the fact they were there. If I need to change things I add a new frame: they move, I add a new frame. I don’t forget or DELETE the fact they came from behind the tree.
  • #35: A Kafka Stream is data in motion. A topic as we just saw was single snapshots of time, stored in a topic. A Kafka Stream or KStream is a wrapper that interprets these single, immutable events and puts them in motion, Animates them and allows us to reason with events using a powerful facet. Time.
  • #39: Let me tell you a little story about a Topic. A topic as we just saw was single snapshots of time, stored in a topic. http://thechive.com/2016/07/11/chuck-jones-rules-to-writing-the-wile-e-coyote-and-road-runner-cartoons-11-photos/
  • #40: Now, this takes us to the final primitive within the Streams system, the TABLE.
  • #41: There is a wonderful corollary within animation: the Exposure Sheet. This document painstakingly outlines all the actions, all the sounds, all the characters, all the movements. All the EVENTS. And it records them into a familiar structure that we all know and maybe love, the table. The above image could be any database table or Excel spreadsheet; it’s tabular data. So we’ve gone from these wonderful drawings, these real life events. Then we’ve covered taking those snapshots and making them come alive. Animating them. So how the heck do we go from this boring’ish view above?
  • #42: Let’s illustrate this with an example. Imagine a table that tracks the total number of pageviews by user (first column of diagram below). Over time, whenever a new pageview event is processed, the state of the table is updated accordingly. Here, the state changes between different points in time – and different revisions of the table – can be represented as a changelog stream (second column). Stream as Table: A stream can be considered a changelog of a table, where each data record in the stream captures a state change of the table. A stream is thus a table in disguise, and it can be easily turned into a “real” table by replaying the changelog from beginning to end to reconstruct the table. Similarly, aggregating data records in a stream will return a table. For example, we could compute the total number of pageviews by user from an input stream of pageview events, and the result would be a table, with the table key being the user and the value being the corresponding pageview count. Table as Stream: A table can be considered a snapshot, at a point in time, of the latest value for each key in a stream (a stream’s data records are key-value pairs). A table is thus a stream in disguise, and it can be easily turned into a “real” stream by iterating over each key-value entry in the table.
  • #43: Because of the stream-table duality, the same stream can be used to reconstruct the original table (third column):
  • #44: To this? The Stream / Table duality of course! Remember. Back when we had our racoon and our bear in the forest walking, every frame their movement was recorded.
  • #45: For speaker notes – watch Neha’s KS keynote: https://www.youtube.com/watch?v=eublKlalobg And Jay’s keynote: https://www.youtube.com/watch?v=gsUZ6RYmL1s.
  • #47: https://docs.confluent.io/current/streams/concepts.html#ktable Your stream processing application doesn’t run inside a broker. Instead, it runs in a separate JVM instance, or in a separate cluster entirely.
  • #48: Alright, this might seem a bit repetitive but the stream-table duality is an important concept, so looking at the same concept from different angles helps to fully understand it.
  • #49: KStream = immutable log KTable ~ mutable materialized view
  • #51: For speaker notes – watch Neha’s KS keynote: https://www.youtube.com/watch?v=eublKlalobg And Jay’s keynote: https://www.youtube.com/watch?v=gsUZ6RYmL1s.
  • #55: 15min IN
  • #63: Windowing A stream query is very different to what we normally consider a query. • Since streams are unbounded, you need some meaningful time frames to do aggregations Window streams of data by time — Group data into time buckets • These time buckets provide a way to get a concrete set of results from a continuous stream of data Windows are tracked per unique key Tumbling Hopping Session
  • #67: If there is no match, the right-hand side of the join result will be NULL, indicating that the order was not shipped WITHIN the expected time (here: 2 hours).
  • #68: Main use cases: 1) Write query output into an existing stream 2) Merge multiple streams into a single stream Preview docs: https://docs.confluent.io/5.0.0-beta-20180702/ksql/docs/syntax-reference.html#insert-into
  • #70: Demo time - https://github.com/confluentinc/demo-scene/blob/master/mysql-debezium-ksql-elasticsearch/mysql-debezium-ksql-elasticsearch-docker.adoc
  • #71: https://github.com/confluentinc/demo-scene/blob/master/mysql-debezium-ksql-elasticsearch/mysql-debezium-ksql-elasticsearch-docker.adoc
  • #72: https://github.com/confluentinc/demo-scene/blob/master/mysql-debezium-ksql-elasticsearch/mysql-debezium-ksql-elasticsearch-docker.adoc
  • #73: https://www.confluent.io/blog/write-user-defined-function-udf-ksql/ https://github.com/kaiwaehner/ksql-machine-learning-udf
  • #74: https://www.confluent.io/blog/write-user-defined-function-udf-ksql/ https://github.com/kaiwaehner/ksql-machine-learning-udf
  • #77: Can deploy 3 ways
  • #81: The parallelism of a Kafka Streams application is primarily determined by how many partitions the input topics have. For example, if your application reads from a single topic that has ten partitions, then you can run up to ten instances of your applications. You can run further instances, but these will be idle.
  • #85: Cross-region failover; auto scaling (using S3 etc.); infinite partitions & storage; near-infinite scalability; globally synchronized; on-premise, hosted, cloud in a box.