1
Beyond the DSL - #process
If you’re PAPI and you know it, merge your streams!
Antony Stubbs
Solution Architect EMEA
Confluent
MÜNCHEN - 09. OKTOBER 2018
3
Kafka Streams DSL - the Easy Path
4
DSL - but eventually...
5
Quick Scientific™ survey
Who in the audience uses Kafka in prod?
- Stream processing frameworks?
- Kafka Streams?
- PAPI?
6
Antony Stubbs - New Zealand Made
● @psynikal
● github.com/astubbs
● Confluent for ~2 years
● Consultant in EMEA
● Kafka Streams is my favourite
8
Topologies, Trolls and Troglodytes
9
KStream<Integer, Integer> input =
builder.stream("numbers-topic");
// Stateless computation
KStream<Integer, Integer> doubled =
input.mapValues(v -> v * 2);
// Stateful computation
KTable<Integer, Integer> sumOfOdds = input
.filter((k,v) -> v % 2 != 0)
.selectKey((k, v) -> 1)
.groupByKey()
.reduce((v1, v2) -> v1 + v2, Materialized.as("sum-of-odds"));
What is the DSL?
10
What is the DSL?
11
What is #process?
Flexibility
12
What is #process?
Freedom
13
What is #process?
Power
But with great power...
but not that much… :)
14
What is #process?
KStream#process({magic})
15
What is #process?
interface Processor<K, V> {
void process(K key, V value)
}
16
What is #transform?
interface Transformer<K, V, R> {
R transform(K key, V value)
}
17
What is #process?
interface Processor<K, V> {
public void init(ProcessorContext context)
void process(K key, V value)
<... snip ...>
}
19
PAPI vs DSL
When should you use which?
● “It depends”
● DSL
○ Easy
○ Can do a lot with the blocks
■ Clever with data structures
○ If it fits nicely, use it
● PAPI can be more advanced, but also super flexible
○ Build reusable processors for your company
○ Doesn’t have the “nice” utility functions like count - but that’s the point
○ Can “shoot yourself in the foot”
○ Be responsible
● Don’t bend over backwards to fit your model to the DSL
20
A Combination - the Best of Both Worlds
ppvStream.groupByKey().reduce({ newState, ktableEntry ->
<.... reduce function …>
}).toStream().transform({
new Transformer<>() {
<.... do something complicated…>
}
}).mapValues({ v ->
<.... map function …>
}).to("output-deltas-v2")
21
What is a State Store? IQ?
A local database - RocksDB
● K/V Store
● High speed
● Spills to disk
An optimisation?
- Moves the state to the processing
What are Interactive Queries (IQ)?
By nature of Kafka
● Distributed
● Redundant
● Backed onto Kafka (optionally)
● Highly available (Kafka + Standby tasks)
22
Working With Processors and State Stores
Simple:
- Deduplication (vs EOS)
- Secondary indices
- Need to do something periodically
- TTL Caches
- Synchronous state checking
Advanced:
- State recalculation
- Expiring data efficiently
- Global KTable triggered joins (DSL work around)
- Probabilistic counting with custom store implementations...
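The first "simple" item, deduplication, can be sketched in plain Java. This is only an illustration: a HashSet stands in for a fault-tolerant state store, and the names DedupSketch, process and forwarded are hypothetical, not Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Assumption: a HashSet stands in for a persistent key/value state store.
class DedupSketch {
    private final Set<String> seenIds = new HashSet<>();
    private final List<String> forwarded = new ArrayList<>();

    // Forward a value only the first time its event id is seen.
    void process(String eventId, String value) {
        if (seenIds.add(eventId)) { // add() returns false for a duplicate id
            forwarded.add(value);
        }
    }

    List<String> forwarded() {
        return forwarded;
    }

    public static void main(String[] args) {
        DedupSketch processor = new DedupSketch();
        processor.process("e1", "a");
        processor.process("e2", "b");
        processor.process("e1", "a"); // duplicate, dropped
        System.out.println(processor.forwarded());
    }
}
```

In a real processor the set of seen ids would grow without bound, which is where the TTL techniques later in the deck come in.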
24
State Subselects - Compound Keys
State stores have the #put, #get and #range method calls - this brings some new magic...
Order Items
- Select all order items from my state store, for this order key
- Avoids building larger and larger compounded values
- Which will take longer and longer to update - using individual entries instead
Time
- Timestamp compounded with key
- Great for retrieving entries within computable time windows (hint hint)
- Great for scanning over a range of entries
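A minimal sketch of the order-items idea, assuming a sorted in-memory map in place of a real state store with range support. The key layout "orderId|itemId" and the names CompoundKeySketch, putItem and itemsForOrder are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Assumption: a sorted TreeMap stands in for a store offering put/get/range.
class CompoundKeySketch {
    private final TreeMap<String, String> store = new TreeMap<>();

    // Hypothetical key layout "<orderId>|<itemId>": one small entry per item
    // instead of one ever-growing value per order.
    void putItem(String orderId, String itemId, String item) {
        store.put(orderId + "|" + itemId, item);
    }

    // Range scan over every key that starts with "<orderId>|".
    List<String> itemsForOrder(String orderId) {
        String from = orderId + "|";
        String to = orderId + "|\uffff"; // upper bound for the key prefix
        return new ArrayList<>(store.subMap(from, true, to, true).values());
    }

    public static void main(String[] args) {
        CompoundKeySketch sketch = new CompoundKeySketch();
        sketch.putItem("order-1", "a", "apple");
        sketch.putItem("order-1", "b", "pear");
        sketch.putItem("order-10", "a", "plum");
        System.out.println(sketch.itemsForOrder("order-1"));
    }
}
```

Adding or updating one item touches a single small entry, rather than rewriting a compounded value that keeps growing with the order.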
25
State Subselects - Secondary Indices
● Think ~“tables”...
● Serving an “Interactive Query” for a few possible fields...
○ Can’t construct compound keys for multiple combinations
○ One State Store per combination
○ Upon inserting the primary key/value
■ Also insert into the extra stores the field<->key mappings
■ Upon query, query against the appropriate store that holds the mappings for the requested field
■ Collect the possibly many values (keys) and retrieve the entire object(s) from the primary store
KEY → VALUE
TYPE+KEY → KEY
EMAIL → KEY
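The three mappings above can be sketched with in-memory maps standing in for one state store each. This is a rough illustration only: the Customer type and every method name are assumptions, and updates or deletes that would leave stale index entries are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Assumption: each map stands in for a separate state store.
class SecondaryIndexSketch {
    static final class Customer {
        final String key, type, email;
        Customer(String key, String type, String email) {
            this.key = key; this.type = type; this.email = email;
        }
    }

    private final Map<String, Customer> primary = new HashMap<>();  // KEY -> VALUE
    private final TreeMap<String, String> byType = new TreeMap<>(); // TYPE+KEY -> KEY
    private final Map<String, String> byEmail = new HashMap<>();    // EMAIL -> KEY

    // Insert into the primary store and into every index store.
    void put(Customer c) {
        primary.put(c.key, c);
        byType.put(c.type + "|" + c.key, c.key);
        byEmail.put(c.email, c.key);
    }

    // Unique field: one lookup in the index, one in the primary store.
    Customer findByEmail(String email) {
        String key = byEmail.get(email);
        return key == null ? null : primary.get(key);
    }

    // Non-unique field: range scan the index, then fetch each full object.
    List<Customer> findByType(String type) {
        List<Customer> result = new ArrayList<>();
        for (String key : byType.subMap(type + "|", true, type + "|\uffff", true).values()) {
            result.add(primary.get(key));
        }
        return result;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch s = new SecondaryIndexSketch();
        s.put(new Customer("c1", "gold", "a@example.com"));
        s.put(new Customer("c2", "silver", "b@example.com"));
        System.out.println(s.findByEmail("b@example.com").key);
        System.out.println(s.findByType("gold").size());
    }
}
```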
29
Case Study - leveraging state stores
WIPERS = ON WIPERS = OFF WIPERS = ?
32
Dynamic,
dependent,
aggregate,
on demand recalculation,
from -
out of order data.
So, need to go back and recalculate...
33
DSL despair!
Can’t update aggregates outside of the aggregate that has been triggered...
Potential messy DSL solution (bend over backwards) - synthetic events!
- Publish an event back to the topic with the correct time stamp and information needed to retrigger the other aggregates
- You need to calculate / all / the correct time stamps
- Pollutes the stream with fake data
- Is unnatural / smells
- Breaks out of the KS transaction (EOS)
34
Enter #process()
Keep track of our aggregates ourselves
- Need to calculate our own time buckets
- Time query or store for possible future buckets
- All kept within the KS context (no producer break out for synthetic events)
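One way to picture "keeping track of our aggregates ourselves" is the sketch below. It is not the case study's actual code: the per-bucket sum and the running total are invented aggregates, TreeMaps stand in for state stores, and all names are hypothetical. A late event updates its own bucket and then recalculates every later bucket that depends on it.

```java
import java.util.Map;
import java.util.TreeMap;

// Assumption: TreeMaps stand in for state stores; the aggregates are made up.
class BucketRecalcSketch {
    private final long bucketSizeMs;
    private final TreeMap<Long, Long> bucketSums = new TreeMap<>();    // bucket start -> sum
    private final TreeMap<Long, Long> runningTotals = new TreeMap<>(); // bucket start -> cumulative sum

    BucketRecalcSketch(long bucketSizeMs) {
        this.bucketSizeMs = bucketSizeMs;
    }

    // Calculate our own time bucket from the event timestamp.
    long bucketStart(long timestamp) {
        return timestamp - (timestamp % bucketSizeMs);
    }

    void process(long timestamp, long value) {
        long bucket = bucketStart(timestamp);
        bucketSums.merge(bucket, value, Long::sum);

        // Start from the total of the bucket just before the affected one...
        Map.Entry<Long, Long> previous = runningTotals.lowerEntry(bucket);
        long total = previous == null ? 0L : previous.getValue();

        // ...and recalculate the affected bucket plus every later bucket.
        for (Map.Entry<Long, Long> e : bucketSums.tailMap(bucket, true).entrySet()) {
            total += e.getValue();
            runningTotals.put(e.getKey(), total);
        }
    }

    Long runningTotal(long bucket) {
        return runningTotals.get(bucket);
    }

    public static void main(String[] args) {
        BucketRecalcSketch s = new BucketRecalcSketch(1000);
        s.process(500, 1);
        s.process(2500, 5);
        s.process(1200, 10); // out of order: later buckets are recalculated
        System.out.println(s.runningTotal(2000));
    }
}
```

Everything happens inside the processor and its stores, so no synthetic events need to be produced back to the input topic.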
35
Case Study - leveraging state stores
36
Punc’tuation.?
● What are Punctuations?
● What is Wall Time?
● What is Event Time?
37
● DSL has window retention periods
○ We need state - but for how long?
○ KTable TTL? (Bounded vs unbounded keyset)
#process window retention period you may ask?
● #process TTL?
○ Using punctuation - scan periodically through the state store and delete all buckets that are beyond our retention period
○ Do TTL on “KTable” type data
● How? Compound keys...
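A hedged sketch of TTL via compound keys: the timestamp is zero-padded and placed first in the key, so one range scan finds everything older than the retention period. A TreeMap stands in for the state store, punctuate is called by hand rather than by a scheduler, and the key layout and names are assumptions.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Assumption: a TreeMap stands in for a state store with range scans.
class TtlSketch {
    private final TreeMap<String, String> store = new TreeMap<>();
    private final long retentionMs;

    TtlSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Hypothetical layout: 13-digit zero-padded timestamp, then the key,
    // so lexicographic order matches time order.
    static String compoundKey(long timestamp, String key) {
        return String.format("%013d|%s", timestamp, key);
    }

    void put(long timestamp, String key, String value) {
        store.put(compoundKey(timestamp, key), value);
    }

    // Called periodically, as a punctuation would be: delete every entry
    // whose timestamp is strictly older than now - retention.
    int punctuate(long now) {
        SortedMap<String, String> expired = store.headMap(compoundKey(now - retentionMs, ""));
        int removed = expired.size();
        expired.clear(); // clearing the view removes the entries from the store
        return removed;
    }

    int size() {
        return store.size();
    }

    public static void main(String[] args) {
        TtlSketch ttl = new TtlSketch(1000);
        ttl.put(1000, "a", "old");
        ttl.put(2500, "b", "fresh");
        System.out.println(ttl.punctuate(2600) + " expired, " + ttl.size() + " left");
    }
}
```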
38
Future Event Triggers
Expiring credit cards in real time or some other future timed action
- don’t want to poll all entries and check action times
- need to be able to expire tasks
Time as secondary index (either compound key or secondary state store)
- range select on all keys with time value before now
- take action
- emit action taken event (context.forward to specific node or emit)
- delete entry
- poll state store with range select every ~second,
- or schedule next punctuator to run at timestamp of next event
- need to update
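The steps above can be sketched as follows, with time as a secondary index held in a sorted map. This is an assumption-laden illustration: the map stands in for a state store, the emitted list stands in for forwarding downstream, entries due exactly at "now" are treated as due, and the returned timestamp is what a real processor might use to schedule its next punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Assumption: a TreeMap keyed by action time stands in for the time index store.
class FutureTriggerSketch {
    private final TreeMap<Long, List<String>> dueIndex = new TreeMap<>();
    private final List<String> emitted = new ArrayList<>();

    void schedule(String cardId, long actionTime) {
        dueIndex.computeIfAbsent(actionTime, t -> new ArrayList<>()).add(cardId);
    }

    // Range select everything due at or before now, take the action, emit an
    // event, delete the entries. Returns the next pending action time, or -1.
    long punctuate(long now) {
        NavigableMap<Long, List<String>> due = dueIndex.headMap(now, true);
        for (List<String> cardIds : due.values()) {
            for (String cardId : cardIds) {
                emitted.add("expired:" + cardId); // stand-in for forwarding downstream
            }
        }
        due.clear(); // deletes the handled entries from the index
        return dueIndex.isEmpty() ? -1L : dueIndex.firstKey();
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        FutureTriggerSketch s = new FutureTriggerSketch();
        s.schedule("card-1", 100);
        s.schedule("card-2", 200);
        System.out.println("next run at " + s.punctuate(150));
        System.out.println(s.emitted());
    }
}
```

Only the entries that are actually due are read, so there is no need to poll every entry and check its action time.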
39
Database Changelog Derivation
Problem:
● DB CDC doesn’t emit deltas, only full state
● Can’t see what’s changed in the document
Solution:
● Derive deltas from full state, stored in a stateful stream processor
● Can use KTable tuples
Issue:
● No TTL - enter PAPI
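Deriving deltas from full state can be sketched like this. The assumptions are loud ones: documents are flat maps of non-null string fields, a HashMap stands in for the state store holding the last full state, and a removed field is reported with a null value. None of the names come from the talk.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Assumption: a HashMap stands in for the state store of last-seen documents.
class DeltaSketch {
    private final Map<String, Map<String, String>> lastState = new HashMap<>();

    // Compare the incoming full state with the stored one and return only
    // the fields that changed (removed fields map to null).
    Map<String, String> process(String docId, Map<String, String> fullState) {
        Map<String, String> previous = lastState.getOrDefault(docId, Collections.emptyMap());
        Map<String, String> delta = new TreeMap<>();

        for (Map.Entry<String, String> field : fullState.entrySet()) {
            if (!field.getValue().equals(previous.get(field.getKey()))) {
                delta.put(field.getKey(), field.getValue()); // new or modified
            }
        }
        for (String name : previous.keySet()) {
            if (!fullState.containsKey(name)) {
                delta.put(name, null); // removed
            }
        }

        lastState.put(docId, new HashMap<>(fullState)); // remember for next time
        return delta;
    }

    public static void main(String[] args) {
        DeltaSketch s = new DeltaSketch();
        s.process("doc-1", Map.of("name", "Ann", "city", "Oslo"));
        System.out.println(s.process("doc-1", Map.of("name", "Ann", "city", "Bergen")));
    }
}
```

The stored documents are never expired here, which is exactly the missing-TTL issue the slide raises.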
40
Distributed one to MANY MANY MANY late (maybe) joins
What’s the problem?
- Effectively re-joining missed joins once the right hand side arrives
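A single-node sketch of re-joining missed joins, leaving out the distributed part entirely. Left-side records that arrive before their right-hand side are buffered and joined once it shows up. In-memory maps stand in for state stores, and the names and the "left+right" join result are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumption: in-memory maps stand in for state stores on one partition.
class LateJoinSketch {
    private final Map<String, List<String>> pendingLeft = new HashMap<>(); // missed joins
    private final Map<String, String> right = new HashMap<>();
    private final List<String> joined = new ArrayList<>();

    // Join immediately if the right side is known, otherwise buffer.
    void onLeft(String joinKey, String leftValue) {
        String rightValue = right.get(joinKey);
        if (rightValue != null) {
            joined.add(leftValue + "+" + rightValue);
        } else {
            pendingLeft.computeIfAbsent(joinKey, k -> new ArrayList<>()).add(leftValue);
        }
    }

    // When the right side arrives, re-join every buffered left record.
    void onRight(String joinKey, String rightValue) {
        right.put(joinKey, rightValue);
        List<String> missed = pendingLeft.remove(joinKey);
        if (missed != null) {
            for (String leftValue : missed) {
                joined.add(leftValue + "+" + rightValue);
            }
        }
    }

    List<String> joined() {
        return joined;
    }

    public static void main(String[] args) {
        LateJoinSketch s = new LateJoinSketch();
        s.onLeft("k", "l1");
        s.onLeft("k", "l2");
        s.onRight("k", "r"); // late right side triggers both joins
        System.out.println(s.joined());
    }
}
```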
52
Optimising Topologies
Sometimes it’s useful to avoid some DSL overheads
● Combine operators
● Avoid repartitioning in some cases
● etc...
Beware inconsistent hashing...
53
Speaking of topology optimisation...
Two phase topology building
- First optimisation is reusing intermediate rekey topics
- Avoids branched on demand rekey further down the DAG by detecting the rekey and moving it up immediately
- manually achievable by forcing rekey straight away with #through
Global topology optimisation coming in 2.1
54
Now I did ask about KSQL after all...
Check out KSQL if you haven’t already…
● Abstraction over Kafka Streams
● Languages outside of the JVM
● Non programmers
● Among others...
KSQL User Defined Functions in CP 5.0!
● Parallels with Processors combined with the DSL: you can now insert more complex functionality into KSQL
○ E.g. train a machine learning model and use it as a UDF in KSQL
55
Where to next? & We’re hiring!
● github.com/confluentinc/kafka-streams-examples
● “Ben Stopford” on youtube.com
● Kafka Streams playlist on confluentinc youtube
● Consulting services? Contact sales@confluent.io
Further reading
● confluent.io/resources/
● docs.confluent.io/current/streams/
● confluentinc on Youtube
● github.com/astubbs
● @psynikal
Come find me for Q&A later...
Don’t be afraid of #process and do drop down from the DSL for some operations!
Join us! https://www.confluent.io/careers/
56
THANK YOU!
Learn more:
confluent.io/download
confluent.io/product/ksql/
confluent.io/confluent-cloud/

Beyond the DSL - Unlocking the power of Kafka Streams with the Processor API
