Windowing in Kafka Streams and Flink SQL
Bill Bejeck, Staff DevX Engineer
Apache Kafka committer and PMC member
bill@confluent.io | @bbejeck
Wanna Buy a Book?
“Kafka Streams in Action” - 2nd edition published!
Overview
Kafka Streams – DSL API
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> stockStr = builder.stream("stocks");
stockStr.groupByKey()
        .windowedBy(...)
        .aggregate(() -> new TradeStats(),
                   (k, v, tradeStats) -> tradeStats.add(v))
        .toStream()
        .map(..)
        .to("output");
Processor Topology
builder.stream(..)
stockStr.groupByKey()
.windowedBy()
.aggregate()
map()
to(..)
Tasks
Stream Threads
Architecture Overview - Flink
Task Distribution & Assignment – Flink
Architecture Overview – Flink SQL
Why Windowing?
Otter and Squirrel Wildlife Refuge
Hopping Windows
Use Case
“Every 30 seconds give me the average temperature of the
squirrel and otter pens over the last minute”
Kafka Streams – Hopping Window
stockStr.groupByKey()
        .windowedBy(
            TimeWindows
                .ofSizeWithNoGrace(Duration.ofMinutes(1))
                .advanceBy(Duration.ofSeconds(30))
        )
        .aggregate(..)
...
Flink SQL – Windowing TVF
Flink SQL – Hopping Window
SELECT window_start, window_end, device_id, AVG(reading) AS avg_reading
FROM TABLE(HOP (
TABLE device_readings,
DESCRIPTOR(ts),
INTERVAL '30' SECONDS,
INTERVAL '1' MINUTES
))
GROUP BY window_start, window_end, device_id
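To make the hopping assignment concrete, here is a minimal plain-Java sketch. It is not the Kafka Streams or Flink implementation: it assumes windows are aligned to the epoch, cover [start, start + size), and never start before 0. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; names are hypothetical, no Kafka or Flink dependency.
class HoppingWindowSketch {

    // Returns the start times (epoch millis) of every epoch-aligned hopping window
    // [start, start + size) that contains the given event timestamp.
    static List<Long> windowStartsFor(long timestampMs, Duration size, Duration advance) {
        long sizeMs = size.toMillis();
        long advanceMs = advance.toMillis();
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestampMs - (timestampMs % advanceMs);
        // Walk backwards while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs && start >= 0; start -= advanceMs) {
            starts.add(0, start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Event at 45 s, 1-minute windows advancing every 30 s.
        System.out.println(windowStartsFor(45_000L, Duration.ofMinutes(1), Duration.ofSeconds(30)));
        // prints [0, 30000]
    }
}
```

With a 1-minute size and a 30-second advance, a reading at 45 seconds lands in the windows starting at 0 and at 30 seconds, so each reading contributes to two averages.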
Tumbling Windows
Use Case
“Every hour give me a count of visitors entering the
park”
Kafka Streams – Tumbling Window
stockStr.groupByKey()
.windowedBy(
TimeWindows
.ofSizeWithNoGrace(Duration.ofMinutes(1))
)
.aggregate(..)
...
Flink SQL – Tumbling Window
SELECT window_start, window_end, device_id, AVG(reading) AS avg_reading
FROM TABLE(TUMBLE (
TABLE device_readings,
DESCRIPTOR(ts),
INTERVAL '1' MINUTES
))
GROUP BY window_start, window_end, device_id
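A tumbling window is the special case where each event belongs to exactly one window. The plain-Java sketch below shows the arithmetic under the assumption of epoch-aligned windows [start, start + size) and non-negative timestamps; the class and method names are hypothetical and this is not library code.

```java
import java.time.Duration;

// Plain-Java illustration only; names are hypothetical.
class TumblingWindowSketch {

    // Start of the single epoch-aligned window that holds the timestamp.
    static long windowStart(long timestampMs, Duration size) {
        long sizeMs = size.toMillis();
        return timestampMs - (timestampMs % sizeMs);
    }

    // Exclusive end of that same window.
    static long windowEnd(long timestampMs, Duration size) {
        return windowStart(timestampMs, size) + size.toMillis();
    }

    public static void main(String[] args) {
        Duration oneMinute = Duration.ofMinutes(1);
        // Event at 125 s falls in the window [120 s, 180 s).
        System.out.println(windowStart(125_000L, oneMinute)); // prints 120000
        System.out.println(windowEnd(125_000L, oneMinute));   // prints 180000
    }
}
```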
Cumulating Windows
Use Case
“Give me total food consumption every hour, updated at
15-minute intervals”
Cumulating Windows
SELECT window_start, window_end, user_id, SUM(page_view)
FROM TABLE(CUMULATE (
TABLE device_readings,
DESCRIPTOR(ts),
INTERVAL '15' SECONDS,
INTERVAL '1' MINUTES
))
GROUP BY window_start, window_end, user_id
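The cumulating behaviour can be read as a set of windows that share one start and grow by one step at a time up to the maximum size. The plain-Java sketch below lists the window ends a row would fall into, assuming epoch-aligned windows with exclusive ends; the class and method names are hypothetical and this is an illustration, not Flink's implementation.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; names are hypothetical.
class CumulatingWindowSketch {

    // All cumulating windows for a timestamp share the start of the enclosing
    // max-size window; this returns the (exclusive) ends of those that contain it.
    static List<Long> windowEndsFor(long timestampMs, Duration step, Duration maxSize) {
        long stepMs = step.toMillis();
        long maxMs = maxSize.toMillis();
        long start = timestampMs - (timestampMs % maxMs);
        List<Long> ends = new ArrayList<>();
        for (long end = start + stepMs; end <= start + maxMs; end += stepMs) {
            if (end > timestampMs) {
                ends.add(end);
            }
        }
        return ends;
    }

    public static void main(String[] args) {
        // Row at 20 s, 15-second step, 1-minute maximum size.
        System.out.println(windowEndsFor(20_000L, Duration.ofSeconds(15), Duration.ofMinutes(1)));
        // prints [30000, 45000, 60000]
    }
}
```

A row at 20 seconds misses the first 15-second window but is counted in the windows ending at 30, 45, and 60 seconds, which is what produces the running total that resets at each maximum-size boundary.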
Sliding Windows
Use Case
“Give me a rolling average of power usage from devices
that report within 30 seconds of each other”
Kafka Streams – Sliding Window
stockStr.groupByKey()
.windowedBy(
SlidingWindows.ofTimeDifferenceWithNoGrace
(Duration.ofMinutes(1))
)
.aggregate(..)
...
Flink SQL – OVER Aggregation
SELECT device_id, report_time,
AVG(temp_reading) OVER (
PARTITION BY location
ORDER BY report_time
RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING
AND CURRENT ROW
) AS one_minute_location_temp_averages
FROM readings;
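One way to read the RANGE clause is that every row gets its own window reaching back one minute from that row's timestamp, with both bounds inclusive. The plain-Java sketch below computes such a rolling average for a single partition. It assumes the rows are already sorted with strictly increasing timestamps; the class and method names are hypothetical.

```java
// Plain-Java illustration only; names are hypothetical.
class RollingAverageSketch {

    // For each row i, averages the values whose timestamp lies in
    // [timesMs[i] - rangeMs, timesMs[i]]. Assumes strictly increasing timestamps.
    static double[] rollingAverages(long[] timesMs, double[] values, long rangeMs) {
        double[] result = new double[values.length];
        int left = 0;     // index of the oldest row still inside the range
        double sum = 0.0; // sum of values[left..i]
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            // Evict rows that are older than the lower bound of the range.
            while (timesMs[left] < timesMs[i] - rangeMs) {
                sum -= values[left];
                left++;
            }
            result[i] = sum / (i - left + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] times = {0L, 30_000L, 90_000L};
        double[] temps = {10.0, 20.0, 30.0};
        for (double avg : rollingAverages(times, temps, 60_000L)) {
            System.out.println(avg); // prints 10.0, then 15.0, then 25.0
        }
    }
}
```

Unlike hopping or tumbling windows, the result is one output row per input row, which matches the shape of the OVER query.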
Session Windows
Use Case
“I need to track the otter usage of the pool”
“How do customers interact with our website?”
Kafka Streams - Session Window
stockStr.groupByKey()
.windowedBy(
SessionWindows.ofInactivityGapWithNoGrace(
Duration.ofMinutes(1))
)
.aggregate(..)
...
Flink SQL – Session Window
SELECT window_start, window_end, COUNT(click) AS total_clicks
FROM TABLE(SESSION (
TABLE device_readings,
DESCRIPTOR(ts),
INTERVAL '1' MINUTES
))
GROUP BY window_start, window_end;
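Session windows have no fixed size: they are defined by the data, and a new session starts whenever the inactivity gap is exceeded. The plain-Java sketch below groups sorted timestamps for one key into sessions. Treating events exactly one gap apart as the same session is an assumption of this sketch, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; names are hypothetical.
class SessionWindowSketch {

    // Groups sorted event timestamps into sessions, returned as {firstEvent, lastEvent}.
    // Assumption: an event no more than gapMs after the previous one extends the session.
    static List<long[]> sessions(long[] sortedTimesMs, long gapMs) {
        List<long[]> result = new ArrayList<>();
        if (sortedTimesMs.length == 0) {
            return result;
        }
        long start = sortedTimesMs[0];
        long end = sortedTimesMs[0];
        for (int i = 1; i < sortedTimesMs.length; i++) {
            long t = sortedTimesMs[i];
            if (t - end <= gapMs) {
                end = t; // still active: extend the current session
            } else {
                result.add(new long[] {start, end}); // gap exceeded: close it
                start = t;
                end = t;
            }
        }
        result.add(new long[] {start, end});
        return result;
    }

    public static void main(String[] args) {
        long[] clicks = {0L, 20_000L, 50_000L, 200_000L, 230_000L};
        for (long[] s : sessions(clicks, 60_000L)) {
            System.out.println(s[0] + " - " + s[1]);
        }
        // prints "0 - 50000" then "200000 - 230000"
    }
}
```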
Time Semantics
Time Semantics – Stream Processing
Time Semantics – Alignment
Time Semantics – Advancement
Time Semantics – Advancement Kafka Streams
Time Semantics – Advancement Flink SQL
Flink SQL – Specify Watermarks
CREATE TABLE ratings (
rating_id INT,
title STRING,
release_year INT,
rating DOUBLE,
rating_time TIMESTAMP(3),
WATERMARK FOR rating_time AS rating_time
)
Kafka Streams – Handling out of order
stockStr.groupByKey()
.windowedBy(
TimeWindows
.ofSizeAndGrace(Duration.ofMinutes(1),
Duration.ofSeconds(30))
)
.aggregate(..)
...
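The effect of a grace period can be sketched in plain Java: stream time only moves forward as larger timestamps arrive, and a window keeps accepting out-of-order records until stream time passes the window end plus the grace period. This is a simplified model rather than Kafka Streams code; the exact boundary comparison is an assumption, and the class and method names are hypothetical.

```java
import java.time.Duration;

// Plain-Java illustration only; names are hypothetical.
class GracePeriodSketch {
    private final long graceMs;
    private long streamTimeMs = Long.MIN_VALUE; // highest timestamp seen so far

    GracePeriodSketch(Duration grace) {
        this.graceMs = grace.toMillis();
    }

    // Stream time never goes backwards, even when an older record arrives.
    void observe(long timestampMs) {
        streamTimeMs = Math.max(streamTimeMs, timestampMs);
    }

    // Assumption: the window is open while stream time is before end + grace.
    boolean isOpen(long windowEndMs) {
        return streamTimeMs < windowEndMs + graceMs;
    }

    public static void main(String[] args) {
        GracePeriodSketch sketch = new GracePeriodSketch(Duration.ofSeconds(30));
        sketch.observe(80_000L);
        System.out.println(sketch.isOpen(60_000L)); // prints true: 80 s is before 90 s
        sketch.observe(95_000L);
        System.out.println(sketch.isOpen(60_000L)); // prints false: 95 s is past 90 s
    }
}
```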
Flink SQL – Handling out of order
CREATE TABLE ratings (
rating_id INT,
title STRING,
release_year INT,
rating DOUBLE,
rating_time TIMESTAMP(3),
WATERMARK FOR rating_time AS rating_time - INTERVAL '30' SECOND
)
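The watermark expression above amounts to "the largest rating_time seen so far, minus 30 seconds". A minimal plain-Java sketch of that rule follows. It models only the arithmetic, not how Flink generates or propagates watermarks, and the class and method names are hypothetical.

```java
import java.time.Duration;

// Plain-Java illustration only; names are hypothetical.
class WatermarkSketch {
    private final long delayMs;
    private long maxTimestampMs = Long.MIN_VALUE; // largest event time seen so far

    WatermarkSketch(Duration delay) {
        this.delayMs = delay.toMillis();
    }

    // Records an event timestamp and returns the resulting watermark.
    long observe(long timestampMs) {
        maxTimestampMs = Math.max(maxTimestampMs, timestampMs);
        return currentWatermark();
    }

    // Watermark trails the largest timestamp by the configured delay.
    long currentWatermark() {
        return maxTimestampMs == Long.MIN_VALUE ? Long.MIN_VALUE : maxTimestampMs - delayMs;
    }

    public static void main(String[] args) {
        WatermarkSketch sketch = new WatermarkSketch(Duration.ofSeconds(30));
        System.out.println(sketch.observe(100_000L)); // prints 70000
        System.out.println(sketch.observe(90_000L));  // prints 70000 (out-of-order event)
        System.out.println(sketch.observe(130_000L)); // prints 100000
    }
}
```

Because the watermark lags the newest timestamp by 30 seconds, an event that arrives up to 30 seconds out of order is still ahead of it and can be included in its window.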
Analysis
Analysis – Kafka Streams
public class WindowTimeToAggregateMapper implements KeyValueMapper<..> {
    @Override
    public KeyValue<String, Agg> apply(Windowed<String> windowed,
                                       Agg myAgg) {
        // Copy the window bounds from the windowed key into the aggregate
        long start = windowed.window().start();
        long end = windowed.window().end();
        myAgg.setWindowStart(start);
        myAgg.setWindowEnd(end);
        // Replace the windowed key with the underlying record key
        return KeyValue.pair(windowed.key(), myAgg);
    }
}
Analysis – Kafka Streams
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> stockStr = builder.stream("stocks");
stockStr.groupByKey()
        .windowedBy(...)
        .aggregate(() -> new TradeStats(),
                   (k, v, tradeStats) -> tradeStats.add(v))
        .toStream()
        .map(new WindowTimeToAggregateMapper())
        .to("output");
Analysis - Flink SQL
SELECT location, device_id, report_time, avg_temps
FROM (
SELECT location, device_id, report_time, Avg(reading)
OVER (ORDER BY report_time
RANGE BETWEEN INTERVAL '15' MINUTE PRECEDING AND CURRENT ROW
) AS avg_temps
FROM readings
)
WHERE avg_temps > N;
Analysis - Flink SQL
CREATE TABLE reading_alerts (location STRING,
device_id STRING,
report_time TIMESTAMP(3),
reading_alerts DOUBLE);
Analysis - Flink SQL
INSERT INTO reading_alerts
SELECT location, device_id, report_time, avg_temps
FROM (
SELECT location, device_id, report_time, Avg(reading)
OVER (ORDER BY report_time
RANGE BETWEEN INTERVAL '15' MINUTE PRECEDING AND CURRENT ROW
) AS avg_temps
FROM readings
)
WHERE avg_temps > N;
Testing
Testing - Kafka Streams
try (TopologyTestDriver driver = new TopologyTestDriver(topology)) {
    TestInputTopic<String, String> inputTopic = driver.createInputTopic(...);
    TestOutputTopic<String, Agg> outputTopic = driver.createOutputTopic(...);
    Instant instant = Instant.now();
    int advanceOne = 20;
    int advanceTwo = 40;
}
Testing - Kafka Streams
try (TopologyTestDriver driver = new TopologyTestDriver(topology)) {
    TestInputTopic<String, String> inputTopic = driver.createInputTopic(...);
    TestOutputTopic<String, Agg> outputTopic = driver.createOutputTopic(...);
    // Use a fixed time of day (12:00:18 UTC) instead of Instant.now()
    LocalDateTime localDateTime = LocalDateTime.of(localDate.getYear(),
                                                   localDate.getMonthValue(),
                                                   localDate.getDayOfMonth(),
                                                   12, 0, 18);
    Instant instant = localDateTime.toInstant(ZoneOffset.UTC);
}
Testing - Kafka Streams
try (TopologyTestDriver driver = new TopologyTestDriver(topology)) {
    TestInputTopic<String, String> inputTopic = driver.createInputTopic(...);
    TestOutputTopic<String, Agg> outputTopic = driver.createOutputTopic(...);
    // Each record carries an explicit timestamp
    inputTopic.pipeInput(key, value, instant);
    inputTopic.pipeInput(key, value, instant.plusSeconds(advanceOne));
    inputTopic.pipeInput(key, value, instant.plusSeconds(advanceTwo));
}
Testing – Flink SQL
StreamTableEnvironment.executeSql(...)
TableResult tableResult = streamTableEnv.executeSql(...);
Testing – Flink SQL
List<Row> rowResults = extract(tableResult.collect());
...
List<Row> expected = List.of(Row.ofKind(...));
...
assert rowResults.containsAll(expected);
Resources
• Apache Flink® on Confluent Cloud - https://www.confluent.io/product/flink/
• Kafka Streams 101 - https://developer.confluent.io/learn-kafka/kafka-streams/get-started/
• Apache Flink 101 - https://developer.confluent.io/courses/apache-flink/intro/
• Kafka Streams in Action - 2nd edition published!
• https://www.manning.com/books/kafka-streams-in-action-second-edition