ksqlDB Workshop
Agenda: ksqlDB Workshop
01 Introductions, welcome & guidelines. How to get help
02 Talk: Introduction to Kafka, Kafka Streams & ksqlDB (10:10 - 10:30 AM)
03 Lab: Scenario overview and what you’ll be building (10:30 - 10:45 AM)
04 Lab: Getting your lab set up (10:45 - 11:00 AM)
05 Lab: Hands on (11:00 AM - 12:00 PM)
The Rise of Event Streaming
60% of Fortune 100 companies use Apache Kafka
Confluent Enables Your
Event Streaming Success
Hall of Innovation, CTO Innovation Award Winner 2019 (Enterprise Technology Innovation Awards)
• Confluent founders are original creators of Kafka
• Confluent team wrote 80% of Kafka commits and has over 1M hours technical experience with Kafka
• Confluent helps enterprises successfully deploy event streaming at scale and accelerate time to market
• Confluent Platform extends Apache Kafka to be a secure, enterprise-ready platform
Introduction to Kafka and streams
Apache Kafka®: a distributed commit log
Apache Kafka Connect API: import and export data in and out of Kafka
Diagram labels: Sources, Kafka Connect API, Kafka Pipeline, Sinks
Instantly Connect Popular Data Sources & Sinks
Data Diode
100+ pre-built connectors: 80+ Confluent supported; 20+ partner supported, Confluent verified
Kafka Streams API: write standard Java applications & microservices to process your data in real time.
Kafka Connect API: reliable and scalable integration of Kafka with other systems, no coding required.
Apache Kafka®
What’s stream processing good for?
Materialized cache: build and serve incrementally updated stateful views of your data.
Streaming ETL pipeline: manipulate in-flight events to connect arbitrary sources and sinks.
Event-driven microservice: trigger changes based on observed patterns of events in a stream.
What does a streaming platform do?
Kafka Cluster
Stream Processing by Analogy
Stream processing with Kafka
Example: using Kafka’s Streams API for writing elastic, scalable, fault-tolerant Java and Scala applications.
Main logic
Stream processing with Kafka
Same example, now with ksqlDB. Not a single line of Java or Scala code needed.
CREATE STREAM fraudulent_payments AS
  SELECT * FROM payments
  WHERE fraudProbability > 0.8;
3 modalities of stream processing with Confluent
Kafka clients
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

// consumer is a KafkaConsumer<String, Integer> that is already subscribed.
// stateStore, StateStoreException, RetryUtils and MAX_RETRIES are placeholders, not Kafka APIs.
ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));
Map<String, Integer> counts = new HashMap<>();
for (ConsumerRecord<String, Integer> record : records) {
    // Sum the values seen for each key in this batch.
    counts.merge(record.key(), record.value(), Integer::sum);
}
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
    int attempts = 0;
    while (attempts++ < MAX_RETRIES) {
        try {
            // Read-modify-write against the external state store, retrying on failure.
            int stateCount = stateStore.getValue(entry.getKey());
            stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
            break;
        } catch (StateStoreException e) {
            RetryUtils.backoff(attempts);
        }
    }
}
Kafka Streams
builder
    .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
    .groupBy((key, value) -> value)
    .count()
    .toStream()
    .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
ksqlDB
SELECT x, count(*) FROM stream GROUP BY x EMIT CHANGES;
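The one-line ksqlDB query keeps a count per key and emits a new row each time a count changes. The following is a minimal plain-Java sketch of that behaviour over an in-memory list, with no Kafka dependency. The class name `RunningCountSketch` and the method `emitChanges` are illustrative names chosen for this example, not part of any Kafka or ksqlDB API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics the shape of
//   SELECT x, COUNT(*) FROM stream GROUP BY x EMIT CHANGES;
// over an in-memory list instead of a Kafka topic.
class RunningCountSketch {

    // Returns one "key=count" change row per input event, in arrival order.
    static List<String> emitChanges(List<String> events) {
        Map<String, Long> counts = new HashMap<>();
        List<String> changes = new ArrayList<>();
        for (String x : events) {
            // Increment the count for this key and emit the updated value.
            long updated = counts.merge(x, 1L, Long::sum);
            changes.add(x + "=" + updated);
        }
        return changes;
    }

    public static void main(String[] args) {
        for (String row : emitChanges(Arrays.asList("a", "b", "a"))) {
            System.out.println(row);
        }
    }
}
```

Each input event produces exactly one output row, which is why the result of such a query is itself a stream of changes rather than a single final answer.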
Using external processing systems leads to
complicated architectures
Diagram: three databases (DB), three connectors (CONNECTOR), three applications (APP) and a separate STREAM PROCESSING system.
We can put it back together in a simpler way
Diagram labels: DB, APP, PULL, PUSH, and a single ksqlDB box containing CONNECTORS, STREAM PROCESSING and STATE STORES.
Client Trade-offs
Consumer, Producer: subscribe(), poll(), send(), flush()
Kafka Streams: mapValues(), filter(), aggregate()
ksqlDB: SELECT … FROM … JOIN … WHERE … GROUP BY …
Axis: flexibility (Consumer, Producer) to simplicity (ksqlDB)
Build a complete streaming app with one mental model in SQL
Capture data
CREATE SOURCE CONNECTOR jdbcConnector WITH (
  'connector.class' = '...JdbcSourceConnector',
  'connection.url' = '...',
  …);
Perform continuous transformations
CREATE STREAM purchases AS
  SELECT viewtime, userid, pageid, TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd')
  FROM pageviews;
Create materialized views
CREATE TABLE orders_by_country AS
  SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total
  FROM purchases
  LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY country
  EMIT CHANGES;
Serve lookups against materialized views
SELECT * FROM orders_by_country WHERE country='usa';
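The `WINDOW TUMBLING (SIZE 5 MINUTES)` clause in the `orders_by_country` statement groups events into fixed, non-overlapping five-minute buckets. A minimal sketch of that bucketing in plain Java follows, assuming event timestamps are epoch milliseconds. `TumblingWindowSketch`, `windowStart` and `countByCountry` are hypothetical names for illustration, and the code is not how ksqlDB implements windows internally.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: assigns each event to a 5-minute tumbling window
// and counts events per (country, window) pair.
class TumblingWindowSketch {
    static final long WINDOW_MS = 5 * 60 * 1000L;

    // Start of the window that contains the given epoch-millisecond timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    // Counts orders per "country@windowStart" key.
    static Map<String, Integer> countByCountry(String[] countries, long[] timestampsMs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < countries.length; i++) {
            String key = countries[i] + "@" + windowStart(timestampsMs[i]);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] countries = {"usa", "usa", "nz", "usa"};
        long[] times = {0L, 299_999L, 10_000L, 300_000L};
        System.out.println(countByCountry(countries, times));
    }
}
```

Because the windows do not overlap, an event at 299,999 ms and one at 300,000 ms land in different buckets even though they are one millisecond apart.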
Multi-way joins
In the past, ksqlDB required multiple joins to be “daisy-chained” together, which was cumbersome and resource intensive.
ksqlDB now supports efficient multi-way joins in a single expression.
Before
CREATE STREAM tmp_join AS
SELECT customers.customerid AS customerid,
customers.customername, orders.orderid,
orders.itemid, orders.purchasedate
FROM orders
INNER JOIN customers ON orders.customerid = customers.customerid
EMIT CHANGES;
CREATE STREAM customers_orders_report AS
SELECT customerid, customername, orderid, items.itemname, purchasedate
FROM tmp_join
LEFT JOIN items ON tmp_join.itemid = items.itemid
EMIT CHANGES;
...
After
CREATE STREAM customers_orders_report AS
SELECT customers.customerid AS customerid,
customers.customername, orders.orderid, items.itemname,
orders.purchasedate
FROM orders
LEFT JOIN customers ON orders.customerid = customers.customerid
LEFT JOIN items ON orders.itemid = items.itemid
EMIT CHANGES;
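Conceptually, the single-statement form enriches each order from both lookup tables in one pass, with no intermediate stream. A rough plain-Java sketch of that idea is below. The maps stand in for the `customers` and `items` tables, and `MultiWayJoinSketch` and `joinOrder` are made-up names for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: one pass over an order, two table lookups,
// mirroring two LEFT JOINs in a single statement.
class MultiWayJoinSketch {

    // Enriches one order with customer and item names in a single step.
    // A missing match yields "null" in that column, as a LEFT JOIN would.
    static String joinOrder(int orderId, int customerId, int itemId,
                            Map<Integer, String> customers, Map<Integer, String> items) {
        String customerName = customers.get(customerId); // null when absent
        String itemName = items.get(itemId);             // null when absent
        return orderId + "," + customerName + "," + itemName;
    }

    public static void main(String[] args) {
        Map<Integer, String> customers = new HashMap<>();
        customers.put(1, "Alice");
        Map<Integer, String> items = new HashMap<>();
        items.put(10, "Headphones");

        System.out.println(joinOrder(100, 1, 10, customers, items));
        System.out.println(joinOrder(101, 2, 10, customers, items));
    }
}
```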
First-class Java client
Write stream processing programs using language-neutral SQL, then access your data from your favorite programming language. Use either our first-class Java client, or use our REST API from any language that you like.
CREATE TABLE t1 AS
  SELECT k1, SUM(b)
  FROM s1
  GROUP BY k1
  EMIT CHANGES;
Diagram labels: app, pull query, push query
Highly available pull queries
Pull queries now include improved availability semantics:
• Pull queries will continue to work during rebalances (assuming standbys are available)
• Lag-aware routing: standbys with the least amount of lag will be targeted
SELECT * FROM my_table WHERE ROWKEY = 'my_key';
my_table replica0: at offset 100
my_table replica1: at offset 32
Pull queries are now enabled by default in RBAC-enabled environments, too!
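The lag-aware routing idea can be sketched as "among the replicas that are still reachable, pick the one closest to the latest offset". The following is a simplified illustration under that assumption and is not ksqlDB's actual routing code. `LagAwareRoutingSketch` and `pickReplica` are hypothetical names, and the offsets reuse the replica0/replica1 example (100 and 32).

```java
// Illustrative only: chooses which replica should serve a pull query.
class LagAwareRoutingSketch {

    // Picks the index of the live replica with the smallest lag,
    // or -1 when no replica is alive. Lag = latestOffset - replicaOffset.
    static int pickReplica(long latestOffset, long[] replicaOffsets, boolean[] alive) {
        int best = -1;
        long bestLag = Long.MAX_VALUE;
        for (int i = 0; i < replicaOffsets.length; i++) {
            long lag = latestOffset - replicaOffsets[i];
            if (alive[i] && lag < bestLag) {
                best = i;
                bestLag = lag;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        long[] offsets = {100L, 32L};
        // Both replicas up: the one at offset 100 has the least lag.
        System.out.println(pickReplica(100L, offsets, new boolean[] {true, true}));
        // replica0 unavailable (for example during a rebalance): fall back to replica1.
        System.out.println(pickReplica(100L, offsets, new boolean[] {false, true}));
    }
}
```

Serving from a lagging standby trades freshness for availability, so the caller may see slightly stale data while the most up-to-date replica is unavailable.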
Workshop
How we will run the training
You will be working with Zoom and your browser (instructions, ksqlDB console, and Confluent Control Center).
If you have questions, you can post them via the Zoom chat feature.
If you are stuck, don’t worry: just use the “Raise hand” button in Zoom and a Confluent engineer will come to help you.
Try to avoid racing ahead and copy-and-pasting. Most people learn better when they actually type the code into the console, and it allows you to learn from mistakes.
Activity
Identify a use case that applies to your current work
Based upon your understanding of Kafka and ksqlDB, can you identify an area of your job where you could use Kafka and ksqlDB to unleash business value from your data?
Not sure where to start? Visit the Stream Processing Cookbook:
https://www.confluent.io/stream-processing-cookbook/
Cluster Architectural Overview
• MySQL: customer database
• Microservice: user reviews
• Website: product page with ratings widget
• Kafka Connect: Datagen connector, MySQL CDC connector
• Kafka
• ksqlDB: transforms, enriches, queries
Scenario
Overview
• Airline website with customer database
• Customer database stores membership levels
• Members can write reviews and rate services on the website and/or mobile app
• Reviews are submitted to a reviews microservice
• The review references the customer account only by ID, so customer information is missing from the review
The airline wants to unlock the business value of user reviews by processing them in real time.
Use Case - Cleanliness of Facilities
Some reviews mention the cleanliness of the airport toilets. This affects the customer experience of the airline and holds important data for the airline.
9/12/19 12:55:05 GMT, 5313, {
"rating_id": 5313,
"user_id": 3,
"stars": 1,
"route_id": 6975,
"rating_time": 1519304105213,
"channel": "web",
"message": "why is it so difficult to keep the bathrooms clean?"
}
Use Case - Approach 1
Reviews go to a data warehouse. We process the reviews at the end of each month and then respond to areas where we receive a significant number of comments.
This approach tells you what has already happened.
Use Case - Approach 2
Process the reviews in real time, and provide a dashboard to the airport management team. This dashboard could sort reviews by topic to quickly surface issues with cleanliness.
This approach tells you what is happening.
Use Case - Approach 3
Process the reviews in real time. Set up alerts for 3 bad reviews related to toilet cleanliness within a 10-minute window. Automatically page the cleaning staff to deal with the issue.
This approach does something based upon what is happening.
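The alerting rule in Approach 3 can be sketched in a few lines of plain Java. This is an illustration of the logic only, not the workshop's ksqlDB solution. It assumes that a "bad" review means two stars or fewer, that a cleanliness complaint mentions "bathroom" or "toilet", and that the 10-minute window is a tumbling window. All three are assumptions made for this example, as are the names `CleanlinessAlertSketch`, `isComplaint` and `alertWindows`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: fires an alert when a 10-minute window
// accumulates three bad reviews about toilet cleanliness.
class CleanlinessAlertSketch {
    static final long WINDOW_MS = 10 * 60 * 1000L;
    static final int THRESHOLD = 3;

    // Assumed rule: at most 2 stars and a mention of a bathroom or toilet.
    static boolean isComplaint(int stars, String message) {
        String text = message.toLowerCase();
        return stars <= 2 && (text.contains("bathroom") || text.contains("toilet"));
    }

    // Returns the start time of every window whose complaint count reaches the threshold.
    static List<Long> alertWindows(int[] stars, String[] messages, long[] timestampsMs) {
        Map<Long, Integer> perWindow = new HashMap<>();
        List<Long> alerts = new ArrayList<>();
        for (int i = 0; i < stars.length; i++) {
            if (!isComplaint(stars[i], messages[i])) {
                continue;
            }
            long windowStart = timestampsMs[i] - (timestampsMs[i] % WINDOW_MS);
            int count = perWindow.merge(windowStart, 1, Integer::sum);
            if (count == THRESHOLD) {
                alerts.add(windowStart); // fire once per window
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        int[] stars = {1, 2, 5, 1, 1};
        String[] messages = {"bathrooms dirty", "toilet was filthy", "great flight", "Toilets again", "toilet"};
        long[] times = {0L, 60_000L, 120_000L, 180_000L, 700_000L};
        System.out.println(alertWindows(stars, messages, times));
    }
}
```

In the real pipeline the alert would be written to a Kafka topic that a paging service consumes, rather than returned as a list.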
ksqlDB runs in its own cluster
Hands on
3. Testing the setup
4. KSQL
ksqlDB console
> show topics;
> show streams;
> print 'ratings';
Hands on
5. Creating your first ksqlDB
streaming application
Complete up to and including 5.2.2
Discussion - tables vs streams
> describe extended customers;
> select * from customers emit changes;
> select * from customers_flat emit changes;
Hands on
5.3 Identify the unhappy
customers
5.4 Monitoring our queries
Pause to consider what we have just done
• We have taken data from two different, remote systems and pulled it into Kafka
• We have performed real-time transformations on this data to reformat it
• We have joined these two separate data streams
• We have created a query that constantly runs against a stream of events and generates new events when data matches the query
And all of this will run at enterprise scale!
CDC — only after state
The JSON data shows what information is being pulled from MySQL via Debezium CDC.
Here you can see that there is no “BEFORE” data (it is null). This means the record was just created, with no updates. An example would be when a new user is first added.
CDC — before and after
Now we have some “BEFORE” data because there was an update to the user’s record.
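The before/after distinction can be turned into a small classification rule. A hedged sketch follows: the create and update cases match the two CDC slides, while the delete case (a "before" image with a null "after") follows Debezium's general change-event convention and is not shown in the slides. `CdcEventSketch` and `classify` are hypothetical names, and the row images are passed as plain strings for simplicity.

```java
// Illustrative only: infers the kind of change from which row images are present.
class CdcEventSketch {

    // before/after hold the row image as text; null means the image is absent.
    static String classify(String before, String after) {
        if (before == null && after != null) {
            return "create"; // new row, nothing to compare against
        }
        if (before != null && after != null) {
            return "update"; // both the old and the new state are known
        }
        if (before != null) {
            return "delete"; // assumption: old state only, row no longer exists
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify(null, "{\"id\": 1, \"level\": \"bronze\"}"));
        System.out.println(classify("{\"id\": 1, \"level\": \"bronze\"}", "{\"id\": 1, \"level\": \"gold\"}"));
    }
}
```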
Confluent Control Center
C3 - Managing connectors
C3 - Visualise ksqlDB
• Overview of the CDC step [david]
C3 - ksqlDB FlowUI
The topology viewer is enabled by default in CP 5.5 and is accessible via the “Flow” tab.
Topology viewer
Advanced Features
Windowed queries
“Alert me if I receive more than three reviews within 10 seconds”
Build your alerting logic using ksqlDB's rich support for windowed queries. This allows us to implement solutions for problems like fraud and anomaly detection.
UDF and machine learning
“I want to apply my machine-learning algorithm to real-time data”
Built-in functions
ksqlDB ships with a number of built-in functions to simplify stream processing. Examples include:
• GEO_DISTANCE: measures the distance between two lat/long coordinates
• MASK: converts a string to a masked or obfuscated version of itself
• JSON_ARRAY_CONTAINS: checks if a search value is contained in the array
User-defined functions
Extend the functions available in ksqlDB by building your own functions. A common use case is to implement a machine-learning algorithm via ksqlDB, enabling these models to contribute to your real-time data transformation.
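A user-defined function is, at its core, an ordinary Java method that ksqlDB calls once per row. The sketch below shows only that core. In a real ksqlDB UDF the class and method would additionally carry ksqlDB's `@UdfDescription` and `@Udf` annotations, which are omitted here so the example compiles without the ksqlDB library. The scoring rule is a made-up stand-in for a trained model, and `ReviewScoreSketch` and `complaintScore` are hypothetical names.

```java
// Illustrative only: the kind of per-row logic a UDF could wrap.
class ReviewScoreSketch {

    // Hypothetical stand-in for a model: returns a score in [0, 1],
    // where higher means the review is more likely a cleanliness complaint.
    static double complaintScore(int stars, String message) {
        double score = (5 - stars) / 4.0; // 1 star -> 1.0, 5 stars -> 0.0
        if (message != null && message.toLowerCase().contains("clean")) {
            score = Math.min(1.0, score + 0.25); // boost when cleanliness is mentioned
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(complaintScore(1, "why is it so difficult to keep the bathrooms clean?"));
        System.out.println(complaintScore(5, "great flight"));
    }
}
```

Keeping the scoring logic in a plain static method like this makes it easy to unit test before packaging it as a UDF.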
Internet of Things
“Process telemetry in real time to provide predictive maintenance”
Despite its simple implementation, ksqlDB operates at enterprise scale.
Other IoT use cases:
• Mineral extraction
• Cruise Ship
• Production Line
• Connected Car
• Power Plant
• Gas Pipelines
Next Steps
Reflection
Consider the challenges you face in your current role, and how event streaming and processing could help solve them. What products or solutions could you build if you had access to the right data?
Learning
Visit the ksqlDB site to learn more about the technology
https://ksqldb.io/
Review the Stream Processing Cookbook
https://www.confluent.io/stream-processing-cookbook/?utm_source=field&utm_campaign=fieldocpromo
Download the ebook on designing event driven systems
https://www.confluent.io/designing-event-driven-systems?utm_source=field&utm_campaign=fieldocpromo
Subscribe to the Streaming Audio podcast
https://podcasts.apple.com/au/podcast/streaming-audio-a-confluent-podcast-about-apache-kafka/id1401509765
More resources
https://docs.confluent.io/current/resources.html
Learn Kafka: developer.confluent.io
Free eBooks
• Kafka: The Definitive Guide, by Neha Narkhede, Gwen Shapira and Todd Palino
• Making Sense of Stream Processing, by Martin Kleppmann
• I ❤ Logs, by Jay Kreps
• Designing Event-Driven Systems, by Ben Stopford
http://cnfl.io/book-bundle
Building
Download Confluent Platform to develop your new idea
https://docs.confluent.io/current/quickstart/index.html
Get started for free on Confluent Cloud
Get $60 of free Confluent Cloud
(Even if you’re an existing user)
CC60COMM
Promo value expiration: 90 days after activation • Activate by December 31st 2021 • Any unused promo value on the expiration date will be forfeited.
How to activate
Apply this code directly within the Confluent Cloud billing interface
LIMITED PROMOTION
If you receive an invalid promo code error when trying to activate a code, this means that all promo codes have already been claimed
Interacting
Join the Confluent Slack Channel
https://launchpass.com/confluentcommunity
Local meetups
https://www.confluent.io/community/
Kafka Summit 2020
https://kafka-summit.org/
Interesting ideas?
Did something catch your fancy? Want to dive a bit deeper?
Please chat in the Zoom window or reach out to us.
APAC ksqlDB Workshop

More Related Content

PDF
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
PDF
All Streams Ahead! ksqlDB Workshop ANZ
PDF
Concepts and Patterns for Streaming Services with Kafka
PPTX
10 Principals for Effective Event Driven Microservices
PDF
Real time data processing and model inferncing platform with Kafka streams (N...
PDF
Amsterdam meetup at ING June 18, 2019
PDF
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
PDF
Building Event-Driven Services with Apache Kafka
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
All Streams Ahead! ksqlDB Workshop ANZ
Concepts and Patterns for Streaming Services with Kafka
10 Principals for Effective Event Driven Microservices
Real time data processing and model inferncing platform with Kafka streams (N...
Amsterdam meetup at ING June 18, 2019
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
Building Event-Driven Services with Apache Kafka

What's hot (20)

PDF
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
PDF
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
PDF
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
PDF
A Global Source of Truth for the Microservices Generation
PPTX
Realtime stream processing with kafka
PDF
Bank of China Tech Talk 2: Introduction to Streaming Data and Stream Processi...
PDF
Removing performance bottlenecks with Kafka Monitoring and topic configuration
PPTX
New Approaches for Fraud Detection on Apache Kafka and KSQL
PDF
APAC Kafka Summit - Best Of
PDF
Building a Streaming Platform with Kafka
PDF
Why Build an Apache Kafka® Connector
PPTX
Bank of China (HK) Tech Talk 1: Dive Into Apache Kafka
PDF
Top use cases for 2022 with Data in Motion and Apache Kafka
PDF
The Future of Streaming: Global Apps, Event Stores and Serverless
PPTX
Data Streaming with Apache Kafka & MongoDB
PDF
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
PDF
Architecting Microservices Applications with Instant Analytics
PDF
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
PDF
What is Apache Kafka and What is an Event Streaming Platform?
PDF
Evolving from Messaging to Event Streaming
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
A Global Source of Truth for the Microservices Generation
Realtime stream processing with kafka
Bank of China Tech Talk 2: Introduction to Streaming Data and Stream Processi...
Removing performance bottlenecks with Kafka Monitoring and topic configuration
New Approaches for Fraud Detection on Apache Kafka and KSQL
APAC Kafka Summit - Best Of
Building a Streaming Platform with Kafka
Why Build an Apache Kafka® Connector
Bank of China (HK) Tech Talk 1: Dive Into Apache Kafka
Top use cases for 2022 with Data in Motion and Apache Kafka
The Future of Streaming: Global Apps, Event Stores and Serverless
Data Streaming with Apache Kafka & MongoDB
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Architecting Microservices Applications with Instant Analytics
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
What is Apache Kafka and What is an Event Streaming Platform?
Evolving from Messaging to Event Streaming
Ad

Similar to APAC ksqlDB Workshop (20)

PDF
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
PPTX
Event streaming webinar feb 2020
PDF
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
PDF
Real-Time Stream Processing with KSQL and Apache Kafka
PDF
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
PDF
Un'introduzione a Kafka Streams e KSQL... and why they matter!
PPTX
Bridge Your Kafka Streams to Azure Webinar
PDF
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
PDF
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
PPTX
Event Streaming Architectures with Confluent and ScyllaDB
PPTX
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
PPTX
Kick your database_to_the_curb_reston_08_27_19
PDF
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
PDF
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
PPTX
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
PPTX
KSQL and Kafka Streams – When to Use Which, and When to Use Both
PDF
KSQL: Open Source Streaming for Apache Kafka
PDF
KSQL - Stream Processing simplified!
PPTX
Introduction to KSQL: Streaming SQL for Apache Kafka®
PDF
Introduction to apache kafka, confluent and why they matter
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Event streaming webinar feb 2020
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafka
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Bridge Your Kafka Streams to Azure Webinar
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Event Streaming Architectures with Confluent and ScyllaDB
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
Kick your database_to_the_curb_reston_08_27_19
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
KSQL and Kafka Streams – When to Use Which, and When to Use Both
KSQL: Open Source Streaming for Apache Kafka
KSQL - Stream Processing simplified!
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to apache kafka, confluent and why they matter
Ad

More from confluent (20)

PDF
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
PPTX
Webinar Think Right - Shift Left - 19-03-2025.pptx
PDF
Migration, backup and restore made easy using Kannika
PDF
Five Things You Need to Know About Data Streaming in 2025
PDF
Data in Motion Tour Seoul 2024 - Keynote
PDF
Data in Motion Tour Seoul 2024 - Roadmap Demo
PDF
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
PDF
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
PDF
Data in Motion Tour 2024 Riyadh, Saudi Arabia
PDF
Build a Real-Time Decision Support Application for Financial Market Traders w...
PDF
Strumenti e Strategie di Stream Governance con Confluent Platform
PDF
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
PDF
Building Real-Time Gen AI Applications with SingleStore and Confluent
PDF
Unlocking value with event-driven architecture by Confluent
PDF
Il Data Streaming per un’AI real-time di nuova generazione
PDF
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
PDF
Break data silos with real-time connectivity using Confluent Cloud Connectors
PDF
Building API data products on top of your real-time data infrastructure
PDF
Speed Wins: From Kafka to APIs in Minutes
PDF
Evolving Data Governance for the Real-time Streaming and AI Era
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
Webinar Think Right - Shift Left - 19-03-2025.pptx
Migration, backup and restore made easy using Kannika
Five Things You Need to Know About Data Streaming in 2025
Data in Motion Tour Seoul 2024 - Keynote
Data in Motion Tour Seoul 2024 - Roadmap Demo
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
Data in Motion Tour 2024 Riyadh, Saudi Arabia
Build a Real-Time Decision Support Application for Financial Market Traders w...
Strumenti e Strategie di Stream Governance con Confluent Platform
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
Building Real-Time Gen AI Applications with SingleStore and Confluent
Unlocking value with event-driven architecture by Confluent
Il Data Streaming per un’AI real-time di nuova generazione
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Break data silos with real-time connectivity using Confluent Cloud Connectors
Building API data products on top of your real-time data infrastructure
Speed Wins: From Kafka to APIs in Minutes
Evolving Data Governance for the Real-time Streaming and AI Era

Recently uploaded (20)

PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PPTX
MYSQL Presentation for SQL database connectivity
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
KodekX | Application Modernization Development
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
Understanding_Digital_Forensics_Presentation.pptx
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PDF
cuic standard and advanced reporting.pdf
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Diabetes mellitus diagnosis method based random forest with bat algorithm
NewMind AI Weekly Chronicles - August'25 Week I
Building Integrated photovoltaic BIPV_UPV.pdf
MYSQL Presentation for SQL database connectivity
“AI and Expert System Decision Support & Business Intelligence Systems”
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
KodekX | Application Modernization Development
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Understanding_Digital_Forensics_Presentation.pptx
The AUB Centre for AI in Media Proposal.docx
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Chapter 3 Spatial Domain Image Processing.pdf
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
cuic standard and advanced reporting.pdf
Digital-Transformation-Roadmap-for-Companies.pptx
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Per capita expenditure prediction using model stacking based on satellite ima...
Spectral efficient network and resource selection model in 5G networks
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...

APAC ksqlDB Workshop

  • 2. Agenda — ksqlDB Workshop 22 01 Introductions, Welcome & guidelines. How to get help 05 Lab: Hands on 11:00AM - 12:00 PM 02 Talk: Introduction to Kafka, Kafka Streams & ksqlDB 10:10 - 10:30 AM 03 Lab: Scenario overview and what you’ll be building 10:30 - 10:45 AM 04 Lab: Getting your lab set up 10:45 - 11:00 AM
  • 3. The Rise of Event Streaming 60%Fortune 100 Companies Using Apache Kafka 3
  • 4. Confluent Enables Your Event Streaming Success Hall of Innovation CTO Innovation Award Winner 2019 Enterprise Technology Innovation AWARDS Confluent founders are original creators of Kafka Confluent team wrote 80% of Kafka commits and has over 1M hours technical experience with Kafka Confluent helps enterprises successfully deploy event streaming at scale and accelerate time to market Confluent Platform extends Apache Kafka to be a secure, enterprise-ready platform
  • 5. Introduction to Kafka and streams
  • 7. Apache Kafka Connect API: Import and Export Data In & Out of Kafka Kafka Connect API Kafka Pipeline Sources Sinks
  • 8. Instantly Connect Popular Data Sources & Sinks Data Diode 100+ pre-built connectors 80+ Confluent Supported 20+ Partner Supported, Confluent Verified
  • 9. Kafka Streams API Write standard Java applications & microservices to process your data in real-time Kafka Connect API Reliable and scalable integration of Kafka with other systems – no coding required. Apache Kafka®
  • 10. What’s stream processing good for? Materialized cache Build and serve incrementally updated stateful views of your data. 10 Streaming ETL pipeline Manipulate in-flight events to connect arbitrary sources and sinks. Event-driven microservice Trigger changes based on observed patterns of events in a stream.
  • 11. 11 What does a streaming platform do?
  • 13. Example: Using Kafka’s Streams API for writing elastic, scalable, fault-tolerant Java and Scala applications Main Logi c Stream processing with Kafka
  • 14. CREATE STREAM fraudulent_payments AS SELECT * FROM payments WHERE fraudProbability > 0.8; Same example, now with ksqlDB. Not a single line of Java or Scala code needed. Stream processing with Kafka
  • 15. 3 modalities of stream processing with Confluent Kafka clients 15 Kafka Streams ksqlDB ConsumerRecords<String, String> records = consumer.poll(100); Map<String, Integer> counts = new DefaultMap<String, Integer>(); for (ConsumerRecord<String, Integer> record : records) { String key = record.key(); int c = counts.get(key) c += record.value() counts.put(key, c) } for (Map.Entry<String, Integer> entry : counts.entrySet()) { int stateCount; int attempts; while (attempts++ < MAX_RETRIES) { try { stateCount = stateStore.getValue(entry.getKey()) stateStore.setValue(entry.getKey(), entry.getValue() + stateCount) break; } catch (StateStoreException e) { RetryUtils.backoff(attempts); } } } builder .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String())) .groupBy((key, value) -> value) .count() .toStream() .to("counts", Produced.with(Serdes.String(), Serdes.Long())); SELECT x, count(*) FROM stream GROUP BY x EMIT CHANGES;
  • 16. Using external processing systems leads to complicated architectures DB CONNECTOR APP APP DB STREAM PROCESSING APPDB CONNECTOR CONNECTOR
  • 17. We can put it back together in a simpler way DB APP APP DB APP PULL PUSH CONNECTORS STREAM PROCESSING STATE STORES ksqlDB
  • 19. Build a complete streaming app with one mental model in SQL Serve lookups against materialized views Create materialized views Perform continuous transformations Capture data CREATE STREAM purchases AS SELECT viewtime, userid,pageid, TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd') FROM pageviews; CREATE TABLE orders_by_country AS SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total FROM purchases WINDOW TUMBLING (SIZE 5 MINUTES) LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id GROUP BY country EMIT CHANGES; SELECT * FROM orders_by_country WHERE country='usa'; CREATE SOURCE CONNECTOR jdbcConnector WITH ( ‘connector.class’ = '...JdbcSourceConnector', ‘connection.url’ = '...', …);
  • 20. Multi-way joins In the past, ksqlDB required multiple joins to “daisy chain” together, which was cumbersome and resource intensive. ksqlDB now supports efficient multi-way joins in a single expression. Before CREATE STREAM tmp_join AS SELECT customers.customerid AS customerid, customers.customername, orders.orderid, orders.itemid, orders.purchasedate FROM orders INNER JOIN customers ON orders.customerid = customers.customerid EMIT CHANGES; CREATE STREAM customers_orders_report AS SELECT customerid, customername, orderid, items.itemname, purchasedate FROM tmp_join LEFT JOIN items ON tmp_join.itemid = items.itemid EMIT CHANGES; ... After CREATE STREAM customers_orders_report AS SELECT customers.customerid AS customerid, customers.customername, orders.orderid, items.itemname, orders.purchasedate FROM orders LEFT JOIN customers ON orders.customerid = customers.customerid LEFT JOIN items ON orders.itemid = items.itemid EMIT CHANGES;
  • 21. app First-class Java client Write stream processing programs using language-neutral SQL, then access your data from your favorite programming language. Use either our first-class Java client, or use our REST API any language that you like. CREATE TABLE t1 AS SELECT k1, SUM(b) FROM s1 GROUP BY k1 EMIT CHANGES; Pull query Push query
  • 22. Highly available pull queries 22 Pull queries now include improved availability semantics • Pull queries will continue to work during rebalances (assuming standbys are available) • Lag-aware routing: standbys with the least amount of lag will be targeted SELECT * FROM my_table WHERE ROWKEY = ‘my_key’; my_table replica0 ● At offset 100 my_table replica1 ● At offset 32 Pull queries are now enabled by default in RBAC-enabled environments, too!
  • 24. How we will run the training 24 You will be working with Zoom, and your browser (instructions, ksqlDB console, and Confluent Control Centre). If you have questions you can post them via the Zoom chat feature. If you are stuck don’t worry - just use the “Raise hand” button in Zoom and a Confluent engineer will come to help you. Try to avoid just racing ahead and copy-and-pasting. Most people learn better when they actually type the code into the console. And it allows you to learn from mistakes.
  • 25. Activity 25 Identify a use case that applies to your current work Based upon your understanding of Kafka and ksqlDB can you identify an area of your job where you could use Kafka and ksqlDB to unleash business value from your data? Not sure where to start? Visit the Stream Processing Cookbook https://www.confluent.io/stream-processing-cookbook/
  • 26. Cluster Architectural Overview 26 MySQL customer database Microservice User reviews Website Product page with ratings widget Kafka Connect Datagen connector MySQL CDC connector Kafka ksqlDB transforms enriches queries
  • 28. Overview 28 • Airline website with customer database • Customer database stores membership levels • Members can write reviews and rate services on the website and/or mobile app • Reviews submitted to a reviews microservice • Customer account referenced in the review via id - missing customer information in the review The airline wants to unlock the business value of user reviews by processing them in real-time.
  • 29. Use Case - Cleanliness of Facilities Some reviews mention the cleanliness of the airport toilets. This affects the customer experience of the airline and holds important data for the airline. 9/12/19 12:55:05 GMT, 5313, { "rating_id": 5313, "user_id": 3, "stars": 1, "route_id": 6975, "rating_time": 1519304105213, "channel": "web", "message": "why is it so difficult to keep the bathrooms clean?" }
  • 30. Use Case - Approach 1 Reviews go to a data warehouse. We process the reviews at the end of each month and then respond to areas where we receive a significant number of comments. This approach tells you what has already happened.
  • 31. Use Case - Approach 2 Process the reviews in real time, and provide a dashboard to the airport management team. This dashboard could sort reviews by topic to quickly surface issues with cleanliness. This approach tells you what is happening.
  • 32. Use Case - Approach 3 Process the reviews in real time. Set up alerts for 3 bad reviews related to toilet cleanliness within a 10-minute window. Automatically page the cleaning staff to deal with the issue. This approach does something based upon what is happening.
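The alerting rule in Approach 3 can be sketched in plain Python as a count over fixed (tumbling) 10-minute windows. Several details here are assumptions made for illustration and do not come from the workshop: a "bad" review is taken to mean 2 stars or fewer, cleanliness is detected by a simple keyword match, and windows are aligned to multiples of 10 minutes rather than sliding. The field names follow the sample record on the cleanliness slide.

```python
from collections import Counter

WINDOW_MS = 10 * 60 * 1000           # 10-minute window, in milliseconds
THRESHOLD = 3                        # alert once a window holds 3 bad reviews
KEYWORDS = ("toilet", "bathroom")    # assumed keyword match for cleanliness


def is_cleanliness_complaint(review):
    # Assumption: "bad" means 2 stars or fewer and the message mentions a keyword.
    text = review["message"].lower()
    return review["stars"] <= 2 and any(word in text for word in KEYWORDS)


def windows_to_alert(reviews):
    """Return the start times (ms) of windows that reach the alert threshold."""
    counts = Counter()
    for review in reviews:
        if is_cleanliness_complaint(review):
            # Align the event time down to the start of its tumbling window.
            window_start = review["rating_time"] // WINDOW_MS * WINDOW_MS
            counts[window_start] += 1
    return sorted(start for start, n in counts.items() if n >= THRESHOLD)


start = 1519303800000  # a multiple of WINDOW_MS, chosen for the example
reviews = [
    {"stars": 1, "rating_time": start + 1_000, "message": "why is it so difficult to keep the bathrooms clean?"},
    {"stars": 2, "rating_time": start + 200_000, "message": "Toilet was filthy"},
    {"stars": 5, "rating_time": start + 300_000, "message": "great flight"},
    {"stars": 1, "rating_time": start + 500_000, "message": "dirty bathroom again"},
    {"stars": 1, "rating_time": start + WINDOW_MS + 5, "message": "toilet out of order"},
]

print(windows_to_alert(reviews))
```

Only the first window reaches three complaints, so only its start time is returned; a paging step would hang off that result.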
  • 33. ksqlDB runs in its own cluster
  • 34. Hands on 3. Testing the setup 4. KSQL
  • 36. ksqlDB console > show topics; > show streams; > print 'ratings';
  • 37. Hands on 5. Creating your first ksqlDB streaming application Complete up to and including 5.2.2
  • 38. Discussion - tables vs streams > describe extended customers; > select * from customers emit changes; > select * from customers_flat emit changes;
  • 39. Hands on 5.3 Identify the unhappy customers 5.4 Monitoring our queries
  • 40. Pause to consider what we have just done We have taken data from two different, remote systems and pulled them into Kafka. We have performed real-time transformations on this data to reformat it. We have joined these two separate data streams. We have created a query that runs constantly against a stream of events and generates new events when data matches the query. And all of this will run at enterprise scale!
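The join-and-filter step recapped above can be modelled in a few lines of Python. This is a sketch of the concept only, not the lab's ksqlDB statements: the `membership_level` and `name` fields, the sample rows, and the "2 stars or fewer" cut-off are hypothetical.

```python
def enrich(rating, customers):
    """Attach customer details to a rating, looked up by user_id."""
    customer = customers.get(rating["user_id"])
    if customer is None:
        return None  # inner-join behaviour: no matching customer, no output
    return {**rating, **customer}


def unhappy_customers(ratings, customers, max_stars=2):
    """Yield enriched ratings whose star count is at or below max_stars."""
    for rating in ratings:
        enriched = enrich(rating, customers)
        if enriched is not None and enriched["stars"] <= max_stars:
            yield enriched


# Hypothetical lookup table keyed by customer id (the "table" side of the join).
customers = {
    3: {"name": "Customer Three", "membership_level": "gold"},
    7: {"name": "Customer Seven", "membership_level": "bronze"},
}

# Hypothetical rating events (the "stream" side of the join).
ratings = [
    {"rating_id": 1, "user_id": 3, "stars": 1},
    {"rating_id": 2, "user_id": 7, "stars": 5},
    {"rating_id": 3, "user_id": 99, "stars": 1},  # unknown customer, dropped
]

for event in unhappy_customers(ratings, customers):
    print(event)
```

Each output event carries both the rating and the customer details, which is what makes it usable downstream without another lookup.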
  • 41. CDC: only after state The JSON data shows what information is being pulled from MySQL via Debezium CDC. Here you can see that there is no "BEFORE" data (it is null). This means the record was just created and has not been updated. An example is when a new user is first added.
  • 42. CDC: before and after Now we have some "BEFORE" data because there was an update to the user's record.
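The before/after distinction can be checked programmatically. The sketch below assumes a change event is a JSON object with top-level `before` and `after` fields, as the slides describe; the payload columns (`id`, `email`) and the sample events are made up for illustration.

```python
import json


def classify_change(event_json):
    """Classify a change event by which of its before/after states are present."""
    event = json.loads(event_json)
    before, after = event.get("before"), event.get("after")
    if before is None and after is not None:
        return "create"   # no prior state: the row was just inserted
    if before is not None and after is not None:
        return "update"   # both states present: an existing row changed
    return "other"


def changed_fields(event_json):
    """List the columns whose values differ between before and after."""
    event = json.loads(event_json)
    before, after = event.get("before") or {}, event.get("after") or {}
    return sorted(k for k in after if before.get(k) != after[k])


# Hypothetical events; real payloads carry more metadata than shown here.
created = json.dumps({"before": None, "after": {"id": 1, "email": "a@example.com"}})
updated = json.dumps({
    "before": {"id": 1, "email": "a@example.com"},
    "after": {"id": 1, "email": "b@example.com"},
})

print(classify_change(created), changed_fields(created))
print(classify_change(updated), changed_fields(updated))
```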
  • 44. C3 - Managing connectors
  • 45. C3 - Visualise ksqlDB • Overview of the CDC step
  • 46. C3 - ksqlDB FlowUI
  • 47. Topology viewer The topology viewer is enabled by default in CP 5.5 and is accessible via the "Flow" tab.
  • 49. Windowed queries "Alert me if I receive more than three reviews within 10 seconds" Build your alerting logic using ksqlDB's rich support for windowed queries. This allows us to implement solutions for problems like fraud and anomaly detection.
  • 50. UDF and machine learning "I want to apply my machine-learning algorithm to real-time data" Built-in functions: ksqlDB ships with a number of built-in functions to simplify stream processing. Examples include: • GEODISTANCE: measures the distance between two lat/long coordinates • MASK: converts a string to a masked or obfuscated version of itself • JSON_ARRAY_CONTAINS: checks if a search value is contained in the array. User-defined functions: extend the functions available in ksqlDB by building your own. A common use case is to implement a machine-learning algorithm via ksqlDB, enabling these models to contribute to your real-time data transformation.
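To give a feel for what a function like GEODISTANCE computes, here is a plain-Python sketch of the haversine formula, a standard way to get the great-circle distance between two lat/long points. It is not ksqlDB's implementation and is not written as a ksqlDB UDF (those are built in Java); the function name, the kilometre unit, and the Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine formula.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against tiny negative values from floating-point rounding.
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(max(0.0, 1 - a)))
    return EARTH_RADIUS_KM * c


# A quarter of the way around the equator should be a quarter of the circumference.
quarter = geo_distance_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 2))
```

A user-defined function follows the same pattern: a pure function from column values to a result, which the engine applies to every event.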
  • 51. Internet of Things "Process telemetry in real time to provide predictive maintenance" Despite its simple implementation, ksqlDB operates at enterprise scale. Other IoT use cases: • Mineral extraction • Cruise ship • Production line • Connected car • Power plant • Gas pipelines
  • 53. Reflection Consider the challenges you face in your current role, and how event streaming and processing could help solve them. What products or solutions could you build if you had access to the right data?
  • 54. Learning Visit the ksqlDB site to learn more about the technology https://ksqldb.io/ Review the Stream Processing Cookbook https://www.confluent.io/stream-processing-cookbook/?utm_source=field&utm_campaign=fieldocpromo Download the ebook on designing event-driven systems https://www.confluent.io/designing-event-driven-systems?utm_source=field&utm_campaign=fieldocpromo Subscribe to the Streaming Audio podcast https://podcasts.apple.com/au/podcast/streaming-audio-a-confluent-podcast-about-apache-kafka/id1401509765 More resources https://docs.confluent.io/current/resources.html
  • 56. Free eBooks Kafka: The Definitive Guide Neha Narkhede, Gwen Shapira, Todd Palino Making Sense of Stream Processing Martin Kleppmann I ❤ Logs Jay Kreps Designing Event-Driven Systems Ben Stopford http://cnfl.io/book-bundle
  • 57. Building Download Confluent Platform to develop your new idea https://docs.confluent.io/current/quickstart/index.html Get started for free on Confluent Cloud
  • 58. Get $60 of free Confluent Cloud (Even if you’re an existing user) CC60COMM Promo value expiration: 90 days after activation • Activate by December 31st 2021 • Any unused promo value on the expiration date will be forfeited. How to activate Apply this code directly within the Confluent Cloud billing interface LIMITED PROMOTION If you receive an invalid promo code error when trying to activate a code, this means that all promo codes have already been claimed
  • 59. Interacting Join the Confluent Slack channel https://launchpass.com/confluentcommunity Local meetups https://www.confluent.io/community/ Kafka Summit 2020 https://kafka-summit.org/
  • 60. Interesting ideas? Did something catch your interest? Want to dive a bit deeper? Please chat in the Zoom window or reach out to us.