Stream Processing with Ballerina
S. Suhothayan
Director - Engineering, WSO2
The Problem
○ Integration is not always about request-response.
○ Highly scalable systems use an event-driven architecture so that multiple processing units can communicate asynchronously.
○ Processing events from webhooks, change data capture (CDC), real-time ETL, and notification systems falls into the category of asynchronous, event-driven systems.
What is a Stream?
An unbounded, continuous flow of records that all share the same format
E.g., sensor events, webhook triggers, messages from a message queue (MQ)
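As an illustration only, an unbounded stream can be modelled in Python as a generator with no end condition, where every yielded record has the same shape. The `SensorData` fields mirror the record used later in this deck; `sensor_stream` and its synthetic readings are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator


@dataclass
class SensorData:
    """Every record in the stream has the same format: same fields, same types."""
    name: str
    reading: int


def sensor_stream() -> Iterator[SensorData]:
    """Hypothetical unbounded stream: the generator never terminates on its own."""
    for i in itertools.count():  # counts 0, 1, 2, ... forever
        # Synthetic readings for illustration only (some are negative, i.e. invalid).
        yield SensorData(name=f"sensor-{i % 3}", reading=(i * 37) % 110 - 5)


# A consumer can only ever observe a finite prefix of an unbounded stream.
first_five = list(itertools.islice(sensor_stream(), 5))
print([r.reading for r in first_five])  # → [-5, 32, 69, -4, 33]
```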
Why Stream Processing?
Processing data continuously, forever!
For example:
○ Monitoring and detecting anomalies
○ Real-time ETL
○ Streaming aggregations (e.g., average service response time in the last 5 minutes)
○ Joining/correlating multiple data streams
○ Detecting complex event patterns or trends
Stream Processing Constructs
○ Projection
    ○ Modifying the structure of the stream
○ Filter
○ Windows & Aggregations
    ○ Collecting streaming events over a time period or a number of events (e.g., the last 5 minutes or the last 50 events)
    ○ Viewed in a sliding or tumbling manner
    ○ Aggregated over the window (e.g., sum, count, min, max, avg)
○ Joins
    ○ Joining multiple streams
○ Detecting Patterns
    ○ Trends, non-occurrence of events
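The difference between sliding and tumbling time windows can be sketched in plain Python, using the "average service response time" aggregation as the example. This is an illustrative model, not Ballerina's window implementation; the `(timestamp_ms, value)` event shape and the function names are assumptions.

```python
from collections import deque
from typing import Iterable, List, Tuple

Event = Tuple[int, float]  # (timestamp in ms, value), e.g. a service response time


def sliding_avg(events: Iterable[Event], window_ms: int) -> List[float]:
    """Sliding time window: on every event, emit the average over the last window_ms."""
    window: deque = deque()
    total = 0.0
    out = []
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Expire events that are window_ms or more older than the current one.
        # The current event is never expired, so the deque never empties here.
        while window[0][0] <= ts - window_ms:
            _, old = window.popleft()
            total -= old
        out.append(total / len(window))
    return out


def tumbling_avg(events: Iterable[Event], window_ms: int) -> List[float]:
    """Tumbling time window: non-overlapping buckets, one average per completed bucket."""
    out = []
    bucket_start = None
    values: List[float] = []
    for ts, value in events:
        start = ts - ts % window_ms  # start of the bucket this event falls into
        if bucket_start is not None and start != bucket_start:
            out.append(sum(values) / len(values))  # previous bucket is now complete
            values = []
        bucket_start = start
        values.append(value)
    return out  # the current, still-open bucket is not emitted


events = [(0, 100.0), (1000, 200.0), (2000, 300.0), (6000, 400.0)]
print(sliding_avg(events, 5000))   # → [100.0, 150.0, 200.0, 350.0]
print(tumbling_avg(events, 5000))  # → [200.0]
```

The sliding window emits a result per event over an overlapping range, while the tumbling window emits once per bucket; on an unbounded stream the open bucket is only reported once a later event closes it.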
How to Write Stream Processing Logic?
Use language libraries:
    ○ Provide a separate function for each stream processing construct.
    ○ Pros: You can use the same language for the whole implementation.
    ○ Cons: Quickly becomes very complex and messy.
Use an SQL dialect:
    ○ Script the logic in easy-to-use SQL.
    ○ Pros: Compact, and the logic is easy to write.
    ○ Cons: Logic that SQL cannot express requires writing user-defined functions (UDFs).
A Solution for Programming Streams Efficiently
Merging SQL and native programming
1. Consume events in Ballerina using standard language constructs
    ○ Via HTTP, HTTP/2, WebSocket, JMS, and more.
2. Generate streams from the consumed data
    ○ Map JSON/XML/text messages into records.
3. Define SQL-like queries to manipulate and process the data in real time
    ○ If needed, call Ballerina functions within the queries.
4. Generate output streams
5. Use standard language constructs to handle the output or send it to an endpoint
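The five steps above can be modelled as a small Python pipeline. This is only a sketch of the data flow, not Ballerina code: step 1 is represented by a list of raw JSON strings standing in for payloads received over HTTP, WebSocket, or JMS, and `to_record`, `pipeline`, and `valid_only` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class SensorData:
    name: str
    reading: int


def to_record(message: str) -> SensorData:
    """Step 2: map a raw JSON message into a typed record."""
    payload = json.loads(message)
    return SensorData(name=str(payload["name"]), reading=int(payload["reading"]))


def pipeline(messages: Iterable[str],
             query: Callable[[Iterator[SensorData]], Iterator[SensorData]],
             sink: Callable[[SensorData], None]) -> None:
    """Steps 2-5: messages -> stream of records -> query -> output stream -> sink."""
    input_stream = (to_record(m) for m in messages)  # step 2: lazily built stream
    for result in query(input_stream):               # steps 3-4: query yields the output stream
        sink(result)                                 # step 5: e.g. forward to an endpoint


def valid_only(stream: Iterator[SensorData]) -> Iterator[SensorData]:
    """Hypothetical query: keep only valid (positive) readings."""
    return (r for r in stream if r.reading > 0)


received: List[SensorData] = []
pipeline(['{"name": "s1", "reading": 42}', '{"name": "s2", "reading": -1}'],
         valid_only, received.append)
print(received)  # → [SensorData(name='s1', reading=42)]
```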
A Use Case
“Given a large number of sensors, detect, among all valid sensors, those whose readings total more than 100 within the last minute.”
Let’s see some beautiful code!
Ballerina Stream Processing
// Define input and output record types
type Alert record {
    string name;
    int total;
};

type SensorData record {
    string name;
    int reading;
};

// Function with input/output streams
function alertQuery(stream<SensorData> sensorDataStream,
                    stream<Alert> alertStream) {
    // Forever block
    forever {
        // Among all valid sensors, select those whose readings
        // total more than 100 within the last minute
        from sensorDataStream
        where reading > 0
        window time(60000)
        select name, sum(reading) as total
        group by name
        having total > 100
        => (Alert[] alerts) {
            // Send alert
            alertStream.publish(alerts);
        }
    }
}
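To make the query's semantics concrete, here is a rough Python model of `where reading > 0`, `window time(60000)`, `sum(reading) ... group by name`, and `having total > 100`. It is a sketch under stated assumptions, not Ballerina's engine: each event carries an explicit `ts_ms` timestamp (the Ballerina record has none), and, as assumed here, every valid event emits an alert for its own sensor when that sensor's windowed total exceeds the threshold. `AlertQuery` and `on_event` are hypothetical names.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SensorData:
    name: str
    reading: int
    ts_ms: int  # assumption: event time in ms (not part of the Ballerina record)


@dataclass
class Alert:
    name: str
    total: int


class AlertQuery:
    """Sliding 60 s window, per-sensor running sum, threshold check."""

    def __init__(self, window_ms: int = 60_000, threshold: int = 100):
        self.window_ms = window_ms
        self.threshold = threshold
        self.window: deque = deque()                     # valid events in the window
        self.totals: Dict[str, int] = defaultdict(int)   # running sum per sensor name

    def on_event(self, e: SensorData) -> List[Alert]:
        if e.reading <= 0:                               # where reading > 0
            return []
        self.window.append(e)                            # window time(60000)
        self.totals[e.name] += e.reading
        while self.window[0].ts_ms <= e.ts_ms - self.window_ms:
            old = self.window.popleft()                  # expire events older than 60 s
            self.totals[old.name] -= old.reading
        total = self.totals[e.name]                      # sum(reading) group by name
        return [Alert(e.name, total)] if total > self.threshold else []  # having total > 100


q = AlertQuery()
alerts: List[Alert] = []
for ev in [SensorData("s1", 60, 0), SensorData("s1", -5, 1_000),
           SensorData("s1", 50, 2_000), SensorData("s2", 30, 3_000),
           SensorData("s1", 45, 61_000)]:
    alerts += q.on_event(ev)
print(alerts)  # → [Alert(name='s1', total=110)]
```

At 61 s the first reading of `s1` has left the window, so its total drops to 95 and no second alert is raised.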
Some more queries
Joining Two Streams Over Time
// Detect when the raw material surplus (raw material input minus
// production consumption) falls below 5% of the raw material input
forever {
    from productionInputStream window time(10000) as p
    join rawMaterialStream window time(10000) as r
    on r.name == p.name
    select r.name, sum(r.amount) as totalRawMaterial,
           sum(p.amount) as totalConsumed
    group by r.name
    having ((totalRawMaterial - totalConsumed) * 100.0 / totalRawMaterial) < 5
    => (MaterialUsage[] materialUsages) {
        materialUsageStream.publish(materialUsages);
    }
}
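The join condition and the `having` threshold can be checked with a small Python model. It is a naive sketch under stated assumptions: events carry an explicit millisecond timestamp, and `Event`, `windowed_join`, and `surplus_below_threshold` are hypothetical names. It shows which events a 10-second windowed equi-join pairs up and what the threshold computes; it does not reproduce how Ballerina aggregates over the joined events.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    name: str     # material name, used as the join key (r.name == p.name)
    amount: float
    ts_ms: int    # assumption: event time in milliseconds


def windowed_join(production: List[Event], raw: List[Event],
                  window_ms: int) -> List[Tuple[Event, Event]]:
    """Naive time-windowed equi-join: pair events with the same name that
    occur less than window_ms apart (each side is held in a time window)."""
    return [(p, r)
            for p in production
            for r in raw
            if p.name == r.name and abs(p.ts_ms - r.ts_ms) < window_ms]


def surplus_below_threshold(total_raw: float, total_consumed: float,
                            threshold_pct: float = 5.0) -> bool:
    """The HAVING condition: (raw - consumed) * 100 / raw < threshold."""
    if total_raw <= 0:
        return False  # no raw material seen yet; avoid dividing by zero
    return (total_raw - total_consumed) * 100.0 / total_raw < threshold_pct


pairs = windowed_join([Event("steel", 97, 1_000)],
                      [Event("steel", 100, 5_000),    # 4 s apart: joined
                       Event("steel", 50, 20_000),    # 19 s apart: outside window
                       Event("copper", 10, 1_000)],   # different key: not joined
                      10_000)
print(len(pairs))                          # → 1
print(surplus_below_threshold(100, 97))    # 3% surplus  → True
print(surplus_below_threshold(100, 90))    # 10% surplus → False
```

The nested loop is quadratic; a real engine would keep each window indexed by key, but the pairing rule is the same.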
Detecting Patterns Within Streams
// Detect a small purchase transaction followed by a huge purchase transaction
// from the same card within a day
forever {
    from every PurchaseStream where price < 20 as e1
    followed by PurchaseStream where price > 200 && e1.id == id as e2
    within 1 day
    select e1.id as cardId, e1.price as initialPayment, e2.price as finalPayment
    => (Alert[] alerts) {
        alertStream.publish(alerts);
    }
}
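A rough Python model of the `every e1 followed by e2 within 1 day` pattern, for illustration only: each small purchase (below 20) opens its own partial match, the first later purchase above 200 on the same card completes it, and partial matches older than a day are discarded. `Purchase`, `ts_ms`, and `detect_small_then_large` are hypothetical; this is an assumption about the pattern's semantics, not Ballerina's pattern engine.

```python
from dataclasses import dataclass
from typing import Dict, List

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Purchase:
    card_id: str
    price: float
    ts_ms: int  # assumption: event time in milliseconds


def detect_small_then_large(purchases: List[Purchase], small: float = 20.0,
                            large: float = 200.0,
                            within_ms: int = DAY_MS) -> List[Dict]:
    pending: List[Purchase] = []  # small purchases still waiting for a follow-up
    alerts: List[Dict] = []
    for e in sorted(purchases, key=lambda p: p.ts_ms):
        # Drop partial matches whose one-day limit has passed.
        pending = [e1 for e1 in pending if e.ts_ms - e1.ts_ms <= within_ms]
        if e.price > large:
            still_pending = []
            for e1 in pending:
                if e1.card_id == e.card_id:  # same card: the pattern completes
                    alerts.append({"cardId": e1.card_id,
                                   "initialPayment": e1.price,
                                   "finalPayment": e.price})
                else:
                    still_pending.append(e1)
            pending = still_pending
        elif e.price < small:
            pending.append(e)  # `every`: each small purchase starts a new match
    return alerts


alerts = detect_small_then_large([
    Purchase("A", 10, 0),
    Purchase("B", 5, 1_000),
    Purchase("A", 300, 2_000),           # completes card A's match
    Purchase("B", 500, DAY_MS + 5_000),  # too late for card B
])
print(alerts)  # → [{'cardId': 'A', 'initialPayment': 10, 'finalPayment': 300}]
```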
Building Autonomous Services
○ Process incoming messages or locally produced events
○ Process events at the receiving node without sending them to a centralised system
○ Services can monitor themselves through built-in metric streams that produce events locally
○ Perform local optimizations and take actions autonomously
Stream Processing at the Edge
○ Supports microservices architectures
○ Summarizes data at the edge
○ Takes localized decisions when possible
○ Reduces the amount of data transferred to the central node
○ Can run independently
○ Is highly scalable
The Roadmap
○ Allow stream processing queries to use custom Ballerina functions.
○ Build Ballerina stream processing in Ballerina itself.
○ Support joining streams with tables.
○ Improve the query language.
○ Support state recovery.
○ Support high availability.
Q & A
THANK YOU
