Samza SQL
Srinivasulu Punuru
Agenda
1 What is Samza SQL?
2 Why SQL on Samza?
3 How does it work?
4 Demo
5 Q&A
Stream Processing using Samza SQL
What is Samza SQL?
Samza SQL by Example
Count the page views of each member in a five-minute window.
Send the result to the Kafka topic PageViewCount.
Samza low level task API
Repartitioner Job
public class PageViewRepartitioner implements StreamTask {
  private final SystemStream outputStream = new SystemStream("kafka", "pvMemberId");

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
    PageViewEvent pageViewEvent = (PageViewEvent) envelope.getMessage();
    // Key by member id so that all page views of a member land on the same partition.
    String key = pageViewEvent.getMemberId();
    OutgoingMessageEnvelope outMessage = new OutgoingMessageEnvelope(outputStream, key, pageViewEvent);
    collector.send(outMessage);
  }
}
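The repartitioner exists so that every page view of a given member reaches the same partition, and therefore the same counting task. A minimal sketch of that idea in plain Java, with no Samza or Kafka dependency: the hash-modulo scheme and the names `KeyPartitioner`, `partitionFor`, and `route` are assumptions made for illustration, not the partitioner that Kafka or Samza actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: hash-modulo partitioning, not Kafka's real partitioner.
class KeyPartitioner {
    // Maps a key to a partition in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Groups member ids by the partition they would be routed to.
    static Map<Integer, List<String>> route(List<String> memberIds, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String memberId : memberIds) {
            byPartition
                .computeIfAbsent(partitionFor(memberId, numPartitions), p -> new ArrayList<>())
                .add(memberId);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Repeated member ids always end up in the same partition's list.
        System.out.println(route(Arrays.asList("m1", "m2", "m1", "m3", "m2"), 4));
    }
}
```

Because the partition depends only on the key, the downstream counter can keep per-member state locally without coordinating with other tasks.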
Samza low level task API (contd.)
Page view counter job
public class PageViewCounter implements StreamTask {
  private final SystemStream outputStream = new SystemStream("kafka", "pageviewCount");
  private Instant lastTriggerTime = Instant.now();
  private final HashMap<String, Integer> counter = new HashMap<>();

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
    PageViewEvent pageViewEvent = (PageViewEvent) envelope.getMessage();
    String memberId = pageViewEvent.getMemberId();
    counter.put(memberId, counter.getOrDefault(memberId, 0) + 1);
    // Emit and reset once five minutes have elapsed. The check runs only when a message arrives.
    if (Duration.between(lastTriggerTime, Instant.now()).toMinutes() >= 5) {
      counter.forEach((key, value) -> collector.send(new OutgoingMessageEnvelope(outputStream, key, value)));
      counter.clear();
      lastTriggerTime = Instant.now();
    }
  }
}
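The counter above approximates a five-minute tumbling window with a wall-clock check. As a rough sketch of the windowing idea itself, the plain-Java example below assigns each event to a window by its timestamp. The class `TumblingWindowCounter` and its methods are hypothetical names for illustration, and it assumes each event carries an epoch-millisecond timestamp, which the slides do not specify.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: counts events per (window start, member id) using event timestamps.
class TumblingWindowCounter {
    private final long windowMillis;
    // window start (epoch millis) -> member id -> count
    private final Map<Long, Map<String, Integer>> counts = new TreeMap<>();

    TumblingWindowCounter(Duration windowSize) {
        this.windowMillis = windowSize.toMillis();
    }

    // Start of the tumbling window that contains the given timestamp.
    long windowStart(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowMillis);
    }

    // Records one page view for the member in the window containing the timestamp.
    void add(String memberId, long timestampMillis) {
        counts.computeIfAbsent(windowStart(timestampMillis), w -> new TreeMap<>())
              .merge(memberId, 1, Integer::sum);
    }

    // Returns the count for a member in the window starting at windowStartMillis, or 0.
    int count(long windowStartMillis, String memberId) {
        return counts.getOrDefault(windowStartMillis, Collections.emptyMap())
                     .getOrDefault(memberId, 0);
    }

    public static void main(String[] args) {
        TumblingWindowCounter counter = new TumblingWindowCounter(Duration.ofMinutes(5));
        counter.add("m1", 0L);
        counter.add("m1", 299_999L); // still inside the first five-minute window
        counter.add("m1", 300_000L); // first event of the second window
        System.out.println("first window: " + counter.count(0L, "m1"));
        System.out.println("second window: " + counter.count(300_000L, "m1"));
    }
}
```

Windows never overlap, so each event contributes to exactly one count.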
Samza high level API
public class PageViewCountApplication implements StreamApplication {
  @Override
  public void init(StreamGraph graph, Config config) {
    MessageStream<PageViewEvent> pageViewEvents = graph.getInputStream("pageView");
    OutputStream<MyStreamOutput> pageViewCount = graph.getOutputStream("pageViewCount");
    pageViewEvents
        .partitionBy(m -> m.memberId)
        .window(Windows.keyedTumblingWindow(m -> m.memberId, Duration.ofMinutes(5),
            () -> 0, (m, c) -> c + 1))
        .map(MyStreamOutput::new)
        .sendTo(pageViewCount);
  }
}
Samza SQL
INSERT INTO kafka.pageViewCount
SELECT memberId, count(*) FROM kafka.pageViewStream
GROUP BY memberId, TUMBLE(current_timestamp, INTERVAL '5' MINUTES)
Samza API stack
Users can choose which API to write a Samza job with.
Why SQL on Samza
• Expand the target audience of stream processing.
• Obtain quick real time insights.
• Create stream processing applications quickly.
How does it work?
How do we execute the SQL below on Samza?
INSERT INTO kafka.NewEmployees
SELECT firstName, lastName FROM kafka.profileUpdateStream
WHERE profile.newCompany = 'LinkedIn'
High level architecture
Samza SQL to Calcite relational algebra
INSERT INTO kafka.NewLinkedInEmployees
SELECT firstName, lastName FROM kafka.profileChange
WHERE profile.newCompany = 'LinkedIn'
LogicalTableModify
LogicalProject
LogicalFilter
LogicalTableScan
Samza operator graph conversion
LogicalTableModify
LogicalProject
LogicalFilter
LogicalTableScan
profileChange
.filter(p -> p.getNewCompany().equals("LinkedIn"))
.map(this::getFirstAndLastName)
.sendTo(newLinkedInEmployees);
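The mapping from plan nodes to stream operators can be sketched in plain Java, with an in-memory list standing in for the stream. Everything here is an assumption for illustration: the class `OperatorGraphSketch`, the helpers `row`, `project`, and `run`, and the representation of a row as a `Map` are not part of Samza or Calcite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: one stream operator per relational node, over an in-memory list.
class OperatorGraphSketch {
    // Builds a profile-change row with three named fields.
    static Map<String, Object> row(String firstName, String lastName, String newCompany) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("firstName", firstName);
        r.put("lastName", lastName);
        r.put("newCompany", newCompany);
        return r;
    }

    // LogicalProject: keep only the selected fields, in SELECT order.
    static Map<String, Object> project(Map<String, Object> input, List<String> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : fields) {
            out.put(field, input.get(field));
        }
        return out;
    }

    // Runs scan -> filter -> project -> modify and returns what reached the sink.
    static List<Map<String, Object>> run(List<Map<String, Object>> profileChange) {
        List<Map<String, Object>> newLinkedInEmployees = new ArrayList<>(); // stands in for the output topic
        profileChange.stream()                                              // LogicalTableScan
            .filter(p -> "LinkedIn".equals(p.get("newCompany")))            // LogicalFilter
            .map(p -> project(p, Arrays.asList("firstName", "lastName")))   // LogicalProject
            .forEach(newLinkedInEmployees::add);                            // LogicalTableModify
        return newLinkedInEmployees;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(
            row("Ada", "Lovelace", "LinkedIn"),
            row("Alan", "Turing", "Elsewhere"))));
    }
}
```

The plan is read bottom-up: the scan becomes the source, and each node above it becomes the next operator in the chain.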
Samza SQL message flow
Samza SQL rel message format
public class SamzaSqlRelMessage {
  private final List<Object> relFieldValues = new ArrayList<>();
  private final List<String> relFieldNames = new ArrayList<>();

  public List<String> getRelFieldNames() {
    return relFieldNames;
  }

  public List<Object> getRelFieldValues() {
    return relFieldValues;
  }
}
• Simple relational format that represents a row in a table
• Ordered list of named values
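Since a row is a pair of parallel ordered lists, reading a field by name amounts to an index lookup. A minimal sketch under that assumption: `RelRow` and `getField` are hypothetical names for illustration and are not the API of Samza's `SamzaSqlRelMessage`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrative only: a row as two parallel lists, names[i] labelling values[i].
class RelRow {
    private final List<String> fieldNames;
    private final List<Object> fieldValues;

    RelRow(List<String> names, List<Object> values) {
        if (names.size() != values.size()) {
            throw new IllegalArgumentException("names and values must have the same length");
        }
        this.fieldNames = new ArrayList<>(names);
        this.fieldValues = new ArrayList<>(values);
    }

    // Looks up a value by field name; empty if the field is absent or its value is null.
    Optional<Object> getField(String name) {
        int index = fieldNames.indexOf(name);
        return index < 0 ? Optional.empty() : Optional.ofNullable(fieldValues.get(index));
    }

    List<String> getFieldNames() {
        return fieldNames;
    }

    public static void main(String[] args) {
        RelRow row = new RelRow(
            Arrays.asList("firstName", "lastName"),
            Arrays.<Object>asList("Ada", "Lovelace"));
        System.out.println(row.getField("lastName").orElse("<missing>"));
    }
}
```

Keeping the lists ordered preserves column order, which matters when a projection has to emit fields in SELECT order.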
Pluggable input/output resolvers
INSERT INTO kafka.NewEmployees
SELECT firstName, lastName FROM kafka.profileUpdateStream
WHERE profile.newCompany = 'LinkedIn'
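In the query, a name such as kafka.profileUpdateStream has two parts: a system and a stream within that system. One way a resolver could turn such a name into a source or sink is sketched below. The classes `StreamName` and `DottedNameResolver` are hypothetical and only illustrate the idea of a pluggable resolver; they are not Samza's resolver interface.

```java
// Illustrative only: resolves a dotted SQL name into a (system, stream) pair.
class StreamName {
    final String system;
    final String stream;

    StreamName(String system, String stream) {
        this.system = system;
        this.stream = stream;
    }

    @Override
    public String toString() {
        return "system=" + system + ", stream=" + stream;
    }
}

class DottedNameResolver {
    // Splits at the first dot: the prefix names the system, the rest names the stream.
    static StreamName resolve(String sqlName) {
        int dot = sqlName.indexOf('.');
        if (dot <= 0 || dot == sqlName.length() - 1) {
            throw new IllegalArgumentException("expected <system>.<stream>, got: " + sqlName);
        }
        return new StreamName(sqlName.substring(0, dot), sqlName.substring(dot + 1));
    }

    public static void main(String[] args) {
        System.out.println(resolve("kafka.profileUpdateStream"));
        System.out.println(resolve("kafka.NewEmployees"));
    }
}
```

Making this step pluggable lets the same SQL text target other systems by swapping the resolver rather than the query.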
Samza SQL architecture
Demo
Demo setup
How do you use it?
• Samza SQL is available in the Samza 0.14 release.
• Tutorial: http://bit.ly/samzasql
Samza 0.14
• Samza SQL
• Projection, Filtering, UDFs, Flatten, Union, Avro
• Apache Beam runner for Samza
• Azure EventHub support
• Amazon Kinesis support
• Multi stage batch support
• High level API improvements
• Durable state
• Programmable SerDe
Samza SQL: Future
• Joins (Stream-Stream & Stream-Table)
• Aggregates & aggregate UDF
• Full Subquery support
• Samza SQL as a service
Questions?
Thank you
Pluggable schema and message converters
