REAL TIME STREAM PROCESSING WITH KSQL AND KAFKA
DAVID PETERSON
Systems Engineer - Confluent APAC
@davidseth
Changing Architectures
Kafka?
Stream Processing
KSQL
KSQL in Production
QUICK INTRO TO CONFLUENT
69% of active Kafka Committers
Founded
September 2014
Technology developed
while at LinkedIn
Founded by the creators of
Apache Kafka
76% of Kafka code created by Confluent team
Changing
Architectures
Events
A Sale, An Invoice, A Trade, A Customer Experience
CHANGING ARCHITECTURES
WE ARE CHALLENGING OLD ASSUMPTIONS...
Big Data was: The More the Better (chart: Value of Data vs. Volume of Data)
Stream Data is: The Faster the Better (chart: Value of Data vs. Age of Data)
CHANGING ARCHITECTURES
WE ARE CHALLENGING OLD ARCHITECTURES…
Lambda (Big OR Fast): Streams feed a Speed Table, Hadoop feeds a Batch Table, and both serve a DB.
Kappa (Big AND Fast): Kafka (Topic A) feeds a KSQL Stream, which fans out to HDFS, Cassandra, Elastic, and a microservice.
A CHANGE OF MINDSET...
KAFKA: EVENT CENTRIC THINKING
A CHANGE OF MINDSET...
AN EVENT-DRIVEN ENTERPRISE
● Everything is an event
● Available instantly to all applications
in a company
● Ability to query data as it arrives vs
when it is too late
● Simplifying the data architecture by
deploying a single platform
What are the possibilities?
It’s a massively scalable, distributed, fault-tolerant, publish & subscribe, key/value datastore with infinite data retention, computing unbounded streaming data in real time.
So, what is Kafka really?
It’s made up of 3 key primitives
Publish & Subscribe, Store, Process
Producer & Consumer API: open-source client libraries for numerous languages; direct integration with your systems.
Connect API: reliable and scalable integration of Kafka with other systems; no coding required.
Streams API: low-level and DSL; create applications & microservices to process your data in real-time.
A Brief History of Apache Kafka and Confluent
0.7 (2012)
0.8 (2013): Intra-cluster replication
0.9 (2015): Data integration
0.10 (2016): Stream processing
0.11 (2017): Exactly-once semantics
1.0 (2017): One<dot>Oh release! ☺
CP 4.1 (2018): KSQL GA
2.0 (2018)
Producers
Kafka
cluster
Consumers
So, what exactly is a stream?
1. TOPIC
{"actor":"bear", "x":410, "y":20}
{"actor":"racoon", "x":380, "y":20}
{"actor":"bear", "x":380, "y":22}
{"actor":"racoon", "x":350, "y":22}
{"actor":"bear", "x":350, "y":25}
{"actor":"racoon", "x":330, "y":25}
{"actor":"racoon", "x":280, "y":32}
{"actor":"bear", "x":310, "y":32}
2. STREAM
3. TABLE
Exposure Sheet
Changelog stream – immutable events
Rebuild original table
Stream Processing
KSQL: Streaming SQL for Apache Kafka
Standard App
No need to create a separate cluster
Highly scalable, elastic, fault tolerant
Lives inside your application
Stream processing
Streams meet Tables
● When you need all the values of a key, the topic is interpreted as a record stream, so you’d read the Kafka topic into a KStream, with messages interpreted as INSERT (append). Example: all the places Alice has ever been to.
Streams meet Tables
● All the values of a key: the topic is interpreted as a record stream, read into a KStream, with messages interpreted as INSERT (append). Example: all the places Alice has ever been to.
● Latest value of a key: the topic is interpreted as a changelog stream, read into a KTable, with messages interpreted as UPSERT (overwrite existing). Example: where Alice is right now.
Same data, but different use cases require different interpretations
“Alice has been to SFO, NYC, Rio, Sydney, Beijing, Paris, and finally Berlin.”
Use case 1: Frequent traveler status? → KStream
“Alice is in SFO, NYC, Rio, Sydney, Beijing, Paris, Berlin right now.”
Use case 2: Current location? → KTable
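To make the stream/table duality concrete, here is a minimal KSQL sketch, assuming a hypothetical locations topic with JSON values keyed by username (the topic and column names are illustrative, not from the deck). The same topic backs both a stream of every visit and a table of each user’s latest location.

-- Every place a user has ever been (INSERT/append semantics)
CREATE STREAM traveler_visits (username VARCHAR, city VARCHAR)
  WITH (KAFKA_TOPIC='locations', VALUE_FORMAT='JSON');

-- The latest place per user (UPSERT semantics); KEY names the column carrying the message key
CREATE TABLE traveler_location (username VARCHAR, city VARCHAR)
  WITH (KAFKA_TOPIC='locations', VALUE_FORMAT='JSON', KEY='username');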
KSQL
KSQL — get started fast with Stream Processing
Kafka
(data)
KSQL
(processing)
read,
write
network
All you need is Kafka – no complex deployments of
bespoke systems for stream processing!
CREATE STREAM
CREATE TABLE
SELECT …and more…
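As a minimal getting-started sketch, assuming an existing Kafka topic named pageviews with JSON values (the topic and columns are hypothetical): registering a stream copies no data, and a SELECT over it runs continuously.

-- Register a stream over the existing topic (no data is copied)
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- A continuous query: results keep updating as new events arrive
SELECT page, COUNT(*) FROM pageviews GROUP BY page;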
● No need for source code deployment
○ Zero, none at all, not even one tiny file
● All the Kafka Streams capabilities out-of-the-box
○ Exactly Once Semantics
○ Windowing
○ Event-time aggregation
○ Late-arriving data
○ Distributed, fault-tolerant, scalable, ...
KSQL
Concepts
KSQL — SELECT statement syntax
SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping_expression` ]
[ HAVING `having_expression` ]
[ LIMIT n ]
where from_item is one of the following:
stream_or_table_name [ [ AS ] alias]
from_item LEFT JOIN from_item ON join_condition
What are some KSQL use cases?
KSQL — Data exploration
An easy way to inspect data in Kafka
SELECT page, user_id, status, bytes
FROM clickstream
WHERE user_agent LIKE 'Mozilla/5.0%';
SHOW TOPICS;
PRINT 'my-topic' FROM BEGINNING;
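A few more exploration statements that KSQL supports alongside the ones above; DESCRIBE assumes the clickstream stream from the example, and the SET line tells queries to read topics from the beginning.

SHOW STREAMS;   -- list registered streams
SHOW TABLES;    -- list registered tables
DESCRIBE clickstream;   -- inspect a stream's schema
SET 'auto.offset.reset' = 'earliest';   -- process topics from the beginning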
KSQL — Data enrichment
Join data from a variety of sources to see the full picture
CREATE STREAM enriched_payments AS
SELECT payment_id, u.country, total
FROM payments_stream p
LEFT JOIN users_table u
ON p.user_id = u.user_id;
Stream-table join
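For context, a hedged sketch of how the two join inputs above might be declared; the topic names, formats, and column types are assumptions, not from the deck.

CREATE STREAM payments_stream (payment_id BIGINT, user_id VARCHAR, total DOUBLE)
  WITH (KAFKA_TOPIC='payments', VALUE_FORMAT='JSON');

-- A table needs a key; KEY names the column carrying the message key
CREATE TABLE users_table (user_id VARCHAR, country VARCHAR)
  WITH (KAFKA_TOPIC='users', VALUE_FORMAT='JSON', KEY='user_id');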
KSQL — Streaming ETL
Filter, cleanse, process data while it is moving
CREATE STREAM clicks_from_vip_users AS
SELECT user_id, u.country, page, action
FROM clickstream c
LEFT JOIN users u ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
KSQL — Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING COUNT(*) > 3;
Aggregate data to identify patterns or anomalies in real-time (here: per 5-minute windows)
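One practical refinement, an assumption on our part rather than part of the slide: aliasing the aggregate gives the result column a stable name, which makes the table easier to query afterwards.

CREATE TABLE possible_fraud AS
  SELECT card_number, COUNT(*) AS attempts   -- named aggregate column
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING COUNT(*) > 3;

SELECT card_number, attempts FROM possible_fraud;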
TIME & STREAMING
Window types: TUMBLING, HOPPING, SESSION
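A sketch of the three window types in KSQL syntax; the stream and column names reuse the clickstream example from earlier, and the durations are illustrative.

-- Tumbling: fixed-size, non-overlapping windows
SELECT user_id, COUNT(*) FROM clickstream
  WINDOW TUMBLING (SIZE 5 MINUTES) GROUP BY user_id;

-- Hopping: fixed-size windows that overlap (here: a new 5-minute window every minute)
SELECT user_id, COUNT(*) FROM clickstream
  WINDOW HOPPING (SIZE 5 MINUTES, ADVANCE BY 1 MINUTE) GROUP BY user_id;

-- Session: windows bounded by gaps of inactivity between events for a key
SELECT user_id, COUNT(*) FROM clickstream
  WINDOW SESSION (60 SECONDS) GROUP BY user_id;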
KSQL — Real time monitoring
Derive insights from events (IoT, sensors, etc.) and turn them into actions
CREATE TABLE failing_vehicles AS
SELECT vehicle, COUNT(*)
FROM vehicle_monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE event_type = 'ERROR'
GROUP BY vehicle
HAVING COUNT(*) >= 3;
KSQL — Data transformation
Quickly make derivations of existing data in Kafka
CREATE STREAM clicks_by_user_id
WITH (PARTITIONS=6,
TIMESTAMP='view_time',
VALUE_FORMAT='JSON') AS
SELECT * FROM clickstream
PARTITION BY user_id;
Re-key the data
Convert data to JSON
KSQL — Stream to Stream JOINs
Example: Detect late orders by matching every SHIPMENTS row with ORDERS rows that are within a 2-hour window.
CREATE STREAM late_orders AS
SELECT o.orderid, o.itemid FROM orders o
FULL OUTER JOIN shipments s WITHIN 2 HOURS
ON s.orderid = o.orderid WHERE s.orderid IS NULL;
INSERT INTO statement for Streams
CREATE STREAM sales_online (itemId BIGINT, price INTEGER, shipmentId BIGINT) WITH (...);
CREATE STREAM sales_offline (itemId BIGINT, price INTEGER, storeId BIGINT) WITH (...);
CREATE STREAM all_sales (itemId BIGINT, price INTEGER) WITH (...);
-- Merge the streams into `all_sales`
INSERT INTO all_sales SELECT itemId, price FROM sales_online;
INSERT INTO all_sales SELECT itemId, price FROM sales_offline;
CREATE TABLE daily_sales_per_item AS
SELECT itemId, SUM(price) FROM all_sales
WINDOW TUMBLING (SIZE 1 DAY) GROUP BY itemId;
Example: Compute daily sales per item across online and offline stores
KSQL — Demo
customers
Kafka Connect
streams data in
Kafka Connect
streams data out
KSQL processes
table changes
in real-time
KSQL — Deep Learning for IoT Sensor Analytics
KSQL UDF using an analytic model under the hood
→ Write once, use in any KSQL statement
SELECT event_id,
anomaly(SENSORINPUT)
FROM health_sensor;
User Defined Function
KSQL — User Defined Function (UDF)
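Once a UDF such as anomaly() is registered, recent KSQL versions can list and describe it from the CLI; a sketch, with the function name taken from the previous slide.

SHOW FUNCTIONS;            -- list built-in and user-defined functions
DESCRIBE FUNCTION ANOMALY; -- show the UDF's signature and description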
Putting KSQL into
Production
DEPLOYING KSQL
Three ways: CLI, REST, CODE
A key challenge of distributed stream processing is fault-tolerant state. State is automatically migrated in case of server failure.
Server A: “I do stateful stream processing, like tables, joins, aggregations.”
A changelog topic in Kafka provides a “streaming backup” of A’s local state, and a “streaming restore” of A’s local state to B.
Server B: “I restore the state and continue processing where server A stopped.”
Fault-Tolerance, powered by Kafka
Processing fails over automatically, without data loss or miscomputation.
#3 died, so #1 and #2 take over:
1. Kafka consumer group rebalance is triggered
2. Processing and state of #3 is migrated via Kafka to remaining servers #1 + #2
#3 is back, so the work is split again:
3. Kafka consumer group rebalance is triggered
4. Part of processing incl. state is migrated via Kafka from #1 + #2 to server #3
Fault-Tolerance, powered by Kafka
You can add, remove, and restart servers in KSQL clusters during live operations.
“We need more processing power!”
1. Kafka consumer group rebalance is triggered
2. Part of processing incl. state is migrated via Kafka to additional server processes
“Ok, we can scale down again.”
3. Kafka consumer group rebalance is triggered
4. Processing incl. state of stopped servers is migrated via Kafka to remaining servers
Elasticity and Scalability, powered by Kafka
PARALLELISM: primarily determined by how many partitions the input topics have.
KSQL is the Streaming SQL Engine for Apache Kafka
Resources and Next Steps
• Try the demo on GitHub :)
• Check out the code
• Play with the examples
Download Confluent Open Source: https://www.confluent.io/download/
Chat with us: https://slackpass.io/confluentcommunity #ksql
https://github.com/confluentinc/demo-scene
The World’s Best Streaming Platform — Everywhere
DAVID PETERSON
Systems Engineer - Confluent APAC
david.peterson@confluent.io | @davidseth
Editor's Notes
  • #2: Unordered, unbounded and massive datasets are increasingly common in day-to-day business. Using this to your advantage is incredibly difficult with current system designs. We are stuck in a model where we can only take advantage of this *after* it has happened. Many times, this is too late to be useful in the enterprise.   KSQL is a streaming SQL engine for Apache Kafka. KSQL lowers the entry bar to the world of stream processing, providing a simple and completely interactive SQL interface for processing data in Kafka. KSQL (like Kafka) is open-source, distributed, scalable, and reliable.   A real time Kafka platform moves your data up the stack, closer to the heart of your business, allowing you to build scalable, mission-critical services by quickly deploying SQL-like queries in a serverless pattern.   This talk will highlight key use cases for real time data, stream processing with KSQL: real time analytics, security and anomaly detection, real time ETL / data integration, Internet of Things, application development, and deploying Machine Learning models with KSQL.   Real time data and stream processing means that Kafka is just as important to the disrupted as it is to the disruptors.
  • #5: Founded by the team that built Apache Kafka®, Confluent builds a streaming platform that enables companies to easily access data as real-time streams. Read Slide.
  • #6: 76% of Kafka code created by Confluent team And…
  • #7: Watch Neha’s KS keynote: https://www.youtube.com/watch?v=eublKlalobg And Jay’s keynote: https://www.youtube.com/watch?v=gsUZ6RYmL1s.
  • #8: A sale, an order, a trade, some aspect of a customer experience. These are all events that you’d expect to happen all the time. So why shouldn’t we capture and respond to it, in the moment, vs treat it like a static pile of data which is how we’ve been conditioned to think of data? To make this concrete, an event is just an immutable record that records something as of some point in time.
  • #9: Instead, data is often most valuable when it is in flow.  This mindset shift is relatively simple.  Rather than static data in databases, we now have event-streaming platforms which enable us to work with data, or events, in real time.   An 'event' simply means something happened; a booking, a financial transaction, a customer experience online, an invoice, an IoT connected device senses something... the list is endless...   Rather than let the data populate a static database, the data itself can trigger an action or analysis in real-time.  In many cases, that offers new value.  The Silicon Valley companies get it.  Organizations such as Uber, eBay, Netflix, Yelp, and more have architected themselves around event-streaming platforms.
  • #10: https://www.paypal-engineering.com/2016/11/18/from-big-data-to-fast-data-in-four-weeks-or-how-reactive-programming-is-changing-the-world-part-2/ So, we believe event streaming platforms are challenging old assumptions. With big data it was: the more the better. With stream data it’s about the speed. More recent data is more valuable. Streaming data architectures have become a central element of Silicon Valley’s technology companies. Many of the largest of these have built themselves around real-time streams as a kind of central nervous system that connects applications and data systems, and makes available in real-time a stream of everything happening in the business. Every event in Uber, Netflix, Yelp, PayPal, and eBay runs through Kafka. A streaming platform doesn’t have to replace your data warehouse (just yet); in fact, quite the opposite: it feeds it data. It acts as a conduit for data to quickly flow into the warehouse environment for long-term retention, ad hoc analysis, and batch processing. That same pipeline can run in reverse to publish out derived results from nightly or hourly batch processing.
  • #11: At any time we can add a new Connector, a new data sink, and re-play the stream into it. These derived views are then optimised for read; the data can be formatted, normalised or whatever to make it optimal for your Consumers to consume that data, whether it be a search engine, Cassandra, MongoDB, etc. Good references here for Lambda to Kappa: Netflix architecture, and how Kafka is changing data architecture: http://techblog.netflix.com/2016/02/evolution-of-netflix-data-pipeline.html
  • #12: Kafka is exciting because it’s now changing how we work with data.     Do we want to store all that data in static databases? Static data in a fast moving digital businesses is often out of date within hours, if not minutes, or even seconds.  And yet, passive storage is the primary model of most data systems.  Most systems offer a place data goes to sit. We use phrases like “data warehouse”, “data lake” or “data store”.  We mostly have applications working on a request / response basis, with data sitting in relational databases.   In the last few years a new style of system and architecture has emerged.  In addition to passive storage, this focuses on the flow of data in real-time streams. Confluent is at the forefront of this.  It has the software and expertise that helps manage data in flow.  It's no longer just big-data that's interesting, it's fast data.  
  • #13: From Lyndon: Our VISION is for Kafka to act as the central nervous system of the modern company, across all verticals. If you think about it, a lot of life is a stream of events. Conversations are a stream of information. Most of a business is a stream of events. And so, we talk about a Streaming Platform being the Central Nervous System for a business, managing these streams of events. Overall, we think this technology is changing how data is put to use in companies. We are seeing that streaming data is redefining competition. Those that capitalize on it are creating a new, powerful customer experience, reducing costs, designing for regulatory uncertainty, and lowering risk in real-time.
  • #16: It’s made up of three core primitives
  • #17: It’s made up of three core primitives
  • #18: So what is a streaming platform? There are a set of core capabilities around data streams you have to have... The first is the ability to publish and subscribe to streams of data. This is something that’s been around for a long time. Messaging systems have been able to do this. What’s different now is the ability to store data and do it properly in a replicated manner. The final capability is to be able to process these streams of data. Publish: scalability of a filesystem. Store: key/value database, strict ordering, persistence. "Kafka moves data closer to the heart of your business allowing you to react in real time, processing and transforming events into actionable results.”
  • #19: So, what is Kafka really?
  • #20: Let’s come back to that idea of 3s
  • #21: So what is a streaming platform? There are a set of core capabilities around data streams you have to have... The first is the ability to publish and subscribe to streams of data. This is something that’s been around for a long time. Messaging systems have been able to do this. What’s different now is the ability to store data and do it properly in a replicated manner. The final capability is to be able to process these streams of data. Publish: scalability of a filesystem. Store: key/value database, strict ordering, persistence. "Kafka moves data closer to the heart of your business allowing you to react in real time, processing and transforming events into actionable results.”
  • #23: At a high-level, Kafka is a pub-sub messaging system that has producers that capture events. Events are sent to and stored locally on a central cluster of brokers. And consumers subscribe to topics or named categories of data. End-to-end, producers to consumer data flow is real-time.
  • #24: To really understand what a stream is within Kafka, you have to get to know 3 primitives.
  • #25: It’s made up of three core primitives
  • #26: It’s made up of three core primitives
  • #27: Let me tell you a little story about a Topic. A topic as we just saw was single snapshots of time, stored in a topic. http://thechive.com/2016/07/11/chuck-jones-rules-to-writing-the-wile-e-coyote-and-road-runner-cartoons-11-photos/
  • #28: When we watch animation, it seems “alive”.
  • #29: The reason I’m highlighting animation is that it wraps up all the concepts that are involved in understanding a somewhat complex set of primitives within Kafka. It took me a bit of time to get my head around this. And animation. I love animation. I couldn’t pass up a chance to combine my love of animation with my job :) Let’s look at a single frame of animation. And pull it apart to see the individual layers. Each layer would be stored as an immutable (unchangeable) event and stored in a topic. Let’s call the topic “animated_stories”
  • #31: So animation when slowed down to the individual frame becomes individual snapshots of time. Immutable recordings of things that happened. Where they happened.
  • #32: And who they happened to.
  • #33: The recording mechanism just happens to be the 35mm frame, still, or in more modern animation, digital frames or even 3d renderings.
  • #34: The realisations and visualisations are all different, but the concepts are all the same. Now, all these things happened. They cannot be updated or deleted. I can go back in time and see where they were, but I can’t delete the fact they were there. If I need to change things I add a new frame: they move, I add a new frame. I don’t forget or DELETE the fact they came from behind the tree.
  • #35: A Kafka Stream is data in motion. A topic as we just saw was single snapshots of time, stored in a topic. A Kafka Stream or KStream is a wrapper that interprets these single, immutable events and puts them in motion, Animates them and allows us to reason with events using a powerful facet. Time.
  • #39: Let me tell you a little story about a Topic. A topic as we just saw was single snapshots of time, stored in a topic. http://thechive.com/2016/07/11/chuck-jones-rules-to-writing-the-wile-e-coyote-and-road-runner-cartoons-11-photos/
  • #40: Now, this takes us to the final primitive within the Streams system, the TABLE.
  • #41: There is a wonderful corollary within animation: the Exposure Sheet. This document painstakingly outlines all the actions, all the sounds, all the characters, all the movements. All the EVENTS. And it records them into a familiar structure that we all know and maybe love, the table. The above image could be any database table or Excel spreadsheet; it’s tabular data. So we’ve gone from these wonderful drawings, these real life events. Then we’ve covered taking those snapshots and making them come alive. Animating them. So how the heck do we go from this boring’ish view above?
  • #42: Let’s illustrate this with an example. Imagine a table that tracks the total number of pageviews by user (first column of diagram below). Over time, whenever a new pageview event is processed, the state of the table is updated accordingly. Here, the state changes between different points in time – and different revisions of the table – can be represented as a changelog stream (second column). Stream as Table: A stream can be considered a changelog of a table, where each data record in the stream captures a state change of the table. A stream is thus a table in disguise, and it can be easily turned into a “real” table by replaying the changelog from beginning to end to reconstruct the table. Similarly, aggregating data records in a stream will return a table. For example, we could compute the total number of pageviews by user from an input stream of pageview events, and the result would be a table, with the table key being the user and the value being the corresponding pageview count. Table as Stream: A table can be considered a snapshot, at a point in time, of the latest value for each key in a stream (a stream’s data records are key-value pairs). A table is thus a stream in disguise, and it can be easily turned into a “real” stream by iterating over each key-value entry in the table.
  • #43: Because of the stream-table duality, the same stream can be used to reconstruct the original table (third column):
  • #44: To this? The Stream / Table duality of course! Remember. Back when we had our racoon and our bear in the forest walking, every frame their movement was recorded.
  • #45: For speaker notes – watch Neha’s KS keynote: https://www.youtube.com/watch?v=eublKlalobg And Jay’s keynote: https://www.youtube.com/watch?v=gsUZ6RYmL1s.
  • #47: https://docs.confluent.io/current/streams/concepts.html#ktable Your stream processing application doesn’t run inside a broker. Instead, it runs in a separate JVM instance, or in a separate cluster entirely.
  • #48: Alright, this might seem a bit repetitive but the stream-table duality is an important concept, so looking at the same concept from different angles helps to fully understand it.
  • #49: KStream = immutable log KTable ~ mutable materialized view
  • #51: For speaker notes – watch Neha’s KS keynote: https://www.youtube.com/watch?v=eublKlalobg And Jay’s keynote: https://www.youtube.com/watch?v=gsUZ6RYmL1s.
  • #55: 15min IN
  • #63: Windowing A stream query is very different to what we normally consider a query. • Since streams are unbounded, you need some meaningful time frames to do aggregations Window streams of data by time — Group data into time buckets • These time buckets provide a way to get a concrete set of results from a continuous stream of data Windows are tracked per unique key Tumbling Hopping Session
  • #67: If there is no match, the right-hand side of the join result will be NULL, indicating that the order was not shipped WITHIN the expected time (here: 2 hours).
  • #68: Main use cases: 1) Write query output into an existing stream 2) Merge multiple streams into a single stream Preview docs: https://docs.confluent.io/5.0.0-beta-20180702/ksql/docs/syntax-reference.html#insert-into
  • #70: Demo time - https://github.com/confluentinc/demo-scene/blob/master/mysql-debezium-ksql-elasticsearch/mysql-debezium-ksql-elasticsearch-docker.adoc
  • #71: https://github.com/confluentinc/demo-scene/blob/master/mysql-debezium-ksql-elasticsearch/mysql-debezium-ksql-elasticsearch-docker.adoc
  • #72: https://github.com/confluentinc/demo-scene/blob/master/mysql-debezium-ksql-elasticsearch/mysql-debezium-ksql-elasticsearch-docker.adoc
  • #73: https://www.confluent.io/blog/write-user-defined-function-udf-ksql/ https://github.com/kaiwaehner/ksql-machine-learning-udf
  • #74: https://www.confluent.io/blog/write-user-defined-function-udf-ksql/ https://github.com/kaiwaehner/ksql-machine-learning-udf
  • #77: Can deploy 3 ways
  • #81: The parallelism of a Kafka Streams application is primarily determined by how many partitions the input topics have. For example, if your application reads from a single topic that has ten partitions, then you can run up to ten instances of your applications. You can run further instances, but these will be idle.
  • #85: Cross-region failover; auto scaling (using S3 etc.); infinite partitions & storage; near-infinite scalability; globally synchronized; on-premise, hosted, cloud in a box.