KSQL in Practice
Almog Gavra
Engineer, KSQL
2
● Intro
● Develop
● Deploy
● Operate
● Common Mistakes
● Q&A
3
Why KSQL?
● Performance and Robustness from Apache Kafka®
● Simplified Stream Processing
● Familiar & Expressive SQL-Like Language
4
SELECT * FROM authorizations
WHERE card_number = 123;
● User submits SQL
● KSQL builds Kafka Streams topology
● Kafka Streams executes the topology
Fraud Detection via KSQL
CREATE TABLE possible_frauds AS
SELECT card_number, count(*)
FROM authorizations
GROUP BY card_number
HAVING count(*) > 3;
WINDOW TUMBLING (SIZE 5 SECONDS)
“find all card numbers that had more than 3 authorization
attempts within 5 seconds”
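To make the windowing semantics concrete, here is a minimal plain-Python sketch of what a 5-second tumbling-window count with a "more than 3" filter computes. It is not how KSQL executes the query; the function name `possible_frauds`, the `(timestamp_ms, card_number)` event shape, and the sample events are assumptions made for illustration.

```python
from collections import Counter

WINDOW_MS = 5_000  # window size: 5 seconds, as in the slide


def possible_frauds(authorizations, threshold=3):
    """Count attempts per (card_number, window) and keep counts above threshold.

    `authorizations` is an iterable of (timestamp_ms, card_number) pairs
    (a hypothetical event shape chosen for this sketch).
    """
    counts = Counter()
    for ts_ms, card_number in authorizations:
        # Tumbling windows are fixed-size and non-overlapping, so each event
        # belongs to exactly one window, identified by its start time.
        window_start = ts_ms - (ts_ms % WINDOW_MS)
        counts[(card_number, window_start)] += 1
    # Equivalent of HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


events = [
    (1_000, 123), (1_500, 123), (2_000, 123), (4_900, 123),  # 4 attempts in [0, 5000)
    (5_100, 123),                                            # next window, 1 attempt
    (1_200, 456), (3_000, 456),                              # 2 attempts, below threshold
]
result = possible_frauds(events)
print(result)  # {(123, 0): 4}
```

Only card 123 in the first window is reported, because the fifth attempt falls into the next window and starts a fresh count.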
6
● Intro
● Develop
● Deploy
● Operate
● Common Mistakes
● Q&A
7
but first…
Stream/Table Duality (T = table, S = stream)
8
[Animation: key/value records from a topic are read into a table (T) and a stream (S)]
‣ a table is the current state of your data
‣ the topic is a changelog
‣ a stream is the historical state of your data
‣ the topic is a sequence
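The two readings of a topic described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not KSQL code; the helper names `as_stream` and `as_table` and the sample records are assumptions.

```python
def as_stream(topic):
    """Stream view: every record is kept, in arrival order."""
    return list(topic)


def as_table(topic):
    """Table view: the topic is replayed as a changelog, so a later
    record for a key overwrites the earlier value for that key."""
    table = {}
    for key, value in topic:
        table[key] = value
    return table


# Hypothetical topic contents: (key, value) pairs in arrival order.
topic = [("A", 1), ("B", 1), ("A", 2), ("C", 1), ("B", 2)]

stream = as_stream(topic)
table = as_table(topic)
print(stream)  # all five records
print(table)   # {'A': 2, 'B': 2, 'C': 1}
```

The same five records yield a five-element history when read as a stream and a three-row current state when read as a table.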
9
Development Lifecycle
Import Data → Create Stream/Table → Write KSQL Queries → Verify Output
10
‣ Kafka Connect
‣ INSERT INTO foo VALUES *
‣ ./ksql-datagen
* Feature Coming in CP 5.3
11
DDL
CREATE TABLE/STREAM name (schema …)
WITH (kafka_topic='topic');
DML
CREATE TABLE/STREAM name
AS SELECT * FROM source;
12
{ REST }
13
UDF UDAF
14
UDF
SELECT amount, OBFUSCATE(card_number) AS id
FROM AUTHORIZATIONS;
16
UDAF
SELECT card_number, SUM(cost) AS total_cost
FROM AUTHORIZATIONS WINDOW TUMBLING (SIZE 7 DAYS)
GROUP BY card_number;
S (input stream): (123, $10), (123, $20), (456, $40), (123, $5)
T (output table): (123_w1, $30), (456_w1, $40), (123_w2, $5)
17
SELECT * FROM stream LIMIT 3;
ksql-test-framework*
* Feature Coming in CP 5.3
18
ksql-test-framework*
{
  "statements": [
    "CREATE STREAM TEST (source VARCHAR) WITH (kafka_topic='test_topic');",
    "CREATE STREAM O AS SELECT CONCAT('prefix-', source) AS C FROM TEST;"],
  "inputs": [
    {"topic": "test_topic", "value": {"source": "s1"}},
    {"topic": "test_topic", "value": {"source": "s2"}}],
  "outputs": [
    {"topic": "O", "value": {"C": "prefix-s1"}},
    {"topic": "O", "value": {"C": "prefix-s2"}}]
}
* Feature Coming in CP 5.3
19
● Intro
● Develop
● Deploy
● Operate
● Common Mistakes
● Q&A
20
Deployment Options
● Interactive Mode
● Headless Mode
21
[Diagram: Interactive Mode, Headless Mode]
22
KSQL scales relative to partitions in Kafka
… and rebalances if a replica is lost
✓ tip: initially over-partitioning your input topics can help scale!
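A rough sketch of why partition count bounds scaling: the unit of work is a partition, so replicas beyond the number of partitions have nothing to do, and losing a replica just redistributes its partitions. The round-robin `assign` helper and the replica names below are assumptions for illustration; the real Kafka Streams assignment logic is more involved.

```python
def assign(partitions, replicas):
    """Round-robin partitions over replicas.

    A deliberate simplification of a consumer-group rebalance,
    not the actual assignment algorithm.
    """
    assignment = {replica: [] for replica in replicas}
    for i, partition in enumerate(partitions):
        assignment[replicas[i % len(replicas)]].append(partition)
    return assignment


partitions = list(range(6))  # an input topic with 6 partitions

# Three replicas share the work evenly.
three = assign(partitions, ["ksql-1", "ksql-2", "ksql-3"])

# One replica is lost: the survivors take over its partitions.
two = assign(partitions, ["ksql-1", "ksql-2"])

# More replicas than partitions: the extra replicas sit idle.
eight = assign(partitions, [f"ksql-{i}" for i in range(1, 9)])
idle = [replica for replica, owned in eight.items() if not owned]

print(three)  # {'ksql-1': [0, 3], 'ksql-2': [1, 4], 'ksql-3': [2, 5]}
print(two)    # {'ksql-1': [0, 2, 4], 'ksql-2': [1, 3, 5]}
print(idle)   # ['ksql-7', 'ksql-8']
```

With only 6 partitions, the seventh and eighth replicas receive no work, which is the motivation for the over-partitioning tip.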
23
application.id ksql.service.id
24
queries with unique application.id
25
queries with identical application.id
26
servers with identical ksql.service.id
27
servers with unique ksql.service.id
29
Recommended Deployment Topology
● group queries by use-case using service.id
● scale by increasing replicas
● queries auto-assign application.id
31
application.id = cluster-foo_123
32
application.id = ksql.service.id + query_prefix + query_id
e.g. cluster-foo_123
34
filters       100 EPS
projections   100 EPS
joins          50 EPS
aggregates     25 EPS
* assume that your max throughput is 100 EPS (it is likely you can support orders of magnitude more)
35
joins
aggregates
join types: stream-table (S T), stream-stream (S S), table-table (T T)
see: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#stateful-transformations
36
● Intro
● Develop
● Deploy
● Operate
● Common Mistakes
● Q&A
38
processing log
KSQL_PROCESSING_LOG
✓ tip: the best way to debug streaming is by streaming
39
JMX Interceptors
error-rate: 0.0
num-persistent-queries: 2.0
num-active-queries: 2.0
messages-consumed-per-sec: 193024.78294586178
messages-produced-per-sec: 193025.4730374501
messages-consumed-max: 103397.81191436431
41
Encryption
Authentication
Authorization
see: https://www.confluent.io/kafka-summit-ny19/ksql-and-security
42
● Intro
● Develop
● Deploy
● Operate
● Common Mistakes
● Q&A
43
Queries Are Continuous
44
new queries read from the most recent offset by default
SET 'auto.offset.reset' = 'earliest';
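45
joins may cause repartition
[Diagram: two partitions holding mixed keys (A B B D C and B A C B A) are repartitioned so that records with the same key share a partition]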
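As a sketch of what repartitioning does, the following plain-Python example re-routes every record by a hash of its key, so all records with the same key end up in the same partition. The helper names are assumptions, and CRC32 is used only to keep the example deterministic and dependency-free; Kafka's default partitioner uses a murmur2 hash instead.

```python
import zlib


def partition_for(key, num_partitions):
    """Deterministic hash partitioner (CRC32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(partitions, num_partitions):
    """Route every record to the partition chosen by its key."""
    out = [[] for _ in range(num_partitions)]
    for source in partitions:
        for key in source:
            out[partition_for(key, num_partitions)].append(key)
    return out


# The two input partitions from the slide, with keys mixed across both.
before = [["A", "B", "B", "D", "C"], ["B", "A", "C", "B", "A"]]
after = repartition(before, 2)
print(after)
```

After repartitioning, each key lives in exactly one partition, which is what a join on that key needs. Note that this sketch reads the source partitions one after the other; in a running system records from different source partitions interleave, so their relative order is not guaranteed.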
46
repartition does not preserve order across partitions…
so tables (T) cannot be rekeyed
47
crashes replay (live query) state
50
● Intro
● Develop
● Deploy
● Operate
● Common Mistakes
● Q&A
in/agavra almog@confluent.io

KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019