Stream and Batch Processing in the Cloud with Data Microservices
Marius Bogoevici and Mark Fisher, Pivotal
● Use Cases
○ Predictive maintenance
○ Fraud detection
○ QoS measurement
○ Log analysis
● High throughput/low latency
○ Growing quantities of data
○ Immediate response is required
● Grouping and ordering of data
○ Partitioning
○ Windowing
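As a rough illustration of the windowing bullet above, the following sketch groups timestamped readings into fixed, non-overlapping (tumbling) windows and averages each one. It is plain Java rather than any Spring API, and the class and method names are hypothetical. It also assumes non-negative timestamps in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: tumbling-window averaging over timestamped values.
// Class and method names are hypothetical, not part of any framework.
final class TumblingWindowSketch {

    // Returns window start time (ms) -> average of the values in that window.
    static Map<Long, Double> averagePerWindow(long[] timestampsMs, double[] values, long windowMs) {
        Map<Long, List<Double>> buckets = new TreeMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            // Integer division snaps each timestamp to the start of its window.
            long windowStart = (timestampsMs[i] / windowMs) * windowMs;
            buckets.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(values[i]);
        }
        Map<Long, Double> averages = new TreeMap<>();
        for (Map.Entry<Long, List<Double>> e : buckets.entrySet()) {
            double sum = 0;
            for (double v : e.getValue()) {
                sum += v;
            }
            averages.put(e.getKey(), sum / e.getValue().size());
        }
        return averages;
    }

    public static void main(String[] args) {
        long[] ts = {0, 400, 999, 1000, 1500, 2100};
        double[] vals = {1.0, 2.0, 3.0, 10.0, 20.0, 7.0};
        // prints {0=2.0, 1000=15.0, 2000=7.0}
        System.out.println(averagePerWindow(ts, vals, 1000));
    }
}
```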
● Use Cases
○ ETL
○ Account Reconciliation
○ Machine Learning (e.g. model updates)
● Periodic activities
● Finite datasets
● Retry, Skip, Stop, Restart
● Dynamic resource allocation
● Growing demand to move batch processing use cases to real time (aka "stream processing")
● Huge quantities of data to be analyzed efficiently
● Scaling requirements
○ Massive storage
○ Massive computing power (memory/CPU)
○ Massive scalability, from a few machines to data center level
● Reliance on platform’s resource management abilities
○ public and private cloud: AWS
○ cluster managers: Apache YARN, Apache Mesos, Kubernetes
○ full application platforms: Cloud Foundry
● Microservice pattern applied to data processing applications
● Typical benefits:
○ scalability, isolation, agility, continuous deployment, operational control
● Tuning process-specific resources
○ Instance count
○ Memory
○ CPU
● Event-driven
Demo
dataflow:> stream create demo --definition "http | file"
[Architecture diagram: Spring Cloud Data Flow]
● Clients: Data Flow Shell, Data Flow UI, REST client (curl, etc.)
● Data Flow Server implementations: Local, Cloud Foundry, Apache YARN, Apache Mesos, Kubernetes
● Modules: Spring Cloud Stream modules, Spring Cloud Task modules
● Underlying projects: Spring Cloud Stream, Spring Cloud Task, Spring Boot, Spring Integration, Spring Batch
Spring Cloud Stream
● Event-driven microservice framework
● Built on Spring stack:
○ Spring Boot: full-stack standalone apps, configuration
○ Spring Integration: messaging primitives and enterprise integration patterns
● Simplify access to middleware
● Common abstractions
○ Middleware binding
○ Consumer groups
○ Partitioning
○ Pluggable Binder API
Spring Cloud Stream in a nutshell
Programming model
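import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.integration.annotation.Transformer;

// Binds the app to the input and output channels declared by Processor.
@EnableBinding(Processor.class)
public class UpperCase {
    // Takes each payload from the input channel and sends the result to the output channel.
    @Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
    public String process(String message) {
        return message.toUpperCase();
    }
}

// The Processor binding interface, as shown on the slide:
package org.springframework.cloud.stream.messaging;

import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.SubscribableChannel;

public interface Processor {
    String INPUT = "input";
    String OUTPUT = "output";
    @Input(Processor.INPUT)
    SubscribableChannel input();
    @Output(Processor.OUTPUT)
    MessageChannel output();
}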
Event-driven model, publish-subscribe semantics
● Published data broadcast to all subscribers
● Reduce data pipeline complexity
● Fits both data streaming and event-driven microservice use cases
Consumer groups
● Borrowed from Kafka, applied across all binders
● Groups of competing consumers within the pub-sub architecture
● Used in scaling and partitioning
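To make the interplay of publish-subscribe and consumer groups concrete, here is a small in-memory model. It is not the Spring Cloud Stream API: the class name, the methods, and the round-robin choice of instance inside a group are all assumptions made for illustration (a real broker decides how competing consumers share messages).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model: every group receives every message,
// but inside a group only one instance receives each message.
final class ConsumerGroupSketch {

    // group name -> one inbox (list of received messages) per instance
    private final Map<String, List<List<String>>> groups = new LinkedHashMap<>();
    // group name -> index of the instance that gets the next message
    private final Map<String, Integer> nextInstance = new LinkedHashMap<>();

    void addGroup(String name, int instanceCount) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < instanceCount; i++) {
            inboxes.add(new ArrayList<>());
        }
        groups.put(name, inboxes);
        nextInstance.put(name, 0);
    }

    void publish(String message) {
        for (Map.Entry<String, List<List<String>>> group : groups.entrySet()) {
            List<List<String>> inboxes = group.getValue();
            int idx = nextInstance.get(group.getKey());
            inboxes.get(idx).add(message); // exactly one instance per group
            // Round-robin is an assumption of this sketch.
            nextInstance.put(group.getKey(), (idx + 1) % inboxes.size());
        }
    }

    List<String> inbox(String group, int instance) {
        return groups.get(group).get(instance);
    }

    public static void main(String[] args) {
        ConsumerGroupSketch topic = new ConsumerGroupSketch();
        topic.addGroup("audit", 1);
        topic.addGroup("billing", 2);
        for (String m : new String[]{"a", "b", "c"}) {
            topic.publish(m);
        }
        System.out.println(topic.inbox("audit", 0));   // prints [a, b, c]
        System.out.println(topic.inbox("billing", 0)); // prints [a, c]
        System.out.println(topic.inbox("billing", 1)); // prints [b]
    }
}
```

Scaling a group from one to two instances halves the load per instance without changing what other groups see, which is the property the slide relies on.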
Partitioning
● Required for stateful processing scenarios involving data groups (e.g. average calculation)
● Outputs can specify a data partitioning strategy: SpEL, own implementation
● Inputs can be bound to a specific partition
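The sketch below shows why partitioning matters for the average-calculation example: if every message with the same key lands on the same partition, one consumer instance can keep the running state for that key. The hash-modulo rule and all names here are assumptions for illustration, not the framework's own partition selector.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: key-based partition selection plus per-key running averages.
final class PartitionSketch {

    // Maps a key to a partition index in [0, partitionCount).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // State held by the single consumer instance bound to one partition.
    static final class AverageState {
        // key -> {sum, count}
        private final Map<String, double[]> sumAndCount = new HashMap<>();

        // Adds a value for the key and returns the updated running average.
        double add(String key, double value) {
            double[] sc = sumAndCount.computeIfAbsent(key, k -> new double[2]);
            sc[0] += value;
            sc[1] += 1;
            return sc[0] / sc[1];
        }
    }

    public static void main(String[] args) {
        int partitions = 3;
        AverageState[] consumers = new AverageState[partitions];
        for (int i = 0; i < partitions; i++) {
            consumers[i] = new AverageState();
        }
        String[] keys = {"sensor-1", "sensor-2", "sensor-1"};
        double[] readings = {10.0, 50.0, 20.0};
        for (int i = 0; i < keys.length; i++) {
            int p = partitionFor(keys[i], partitions);
            double avg = consumers[p].add(keys[i], readings[i]);
            System.out.println(keys[i] + " -> partition " + p + ", running average " + avg);
        }
    }
}
```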
Binder SPI
Binder Implementations
Other implementations: Redis, Gemfire, … your own!
Spring Cloud Task
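import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableTask
public class MyApp {

    @Bean
    public MyTaskApplication myTask() {
        return new MyTaskApplication();
    }

    public static void main(String[] args) {
        // Forward the command-line arguments so the runner below receives them.
        SpringApplication.run(MyApp.class, args);
    }

    // The task body: runs once at startup, then the application exits.
    public static class MyTaskApplication implements CommandLineRunner {
        @Override
        public void run(String... strings) throws Exception {
            System.out.println("Hello World");
        }
    }
}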
● task can be deployed, executed and removed on demand
● result of the process persists beyond the life of the task for future reporting
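The idea that a task's result outlives the task itself can be sketched with a tiny execution log. This is a plain-Java stand-in, not Spring Cloud Task's repository: the class, its in-memory list, and the exit-code convention (0 for success, 1 for failure) are hypothetical and only illustrate recording an outcome that can be reported on later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: records the outcome of short-lived tasks for later reporting.
final class TaskRecordSketch {

    static final class Execution {
        final String taskName;
        final int exitCode;

        Execution(String taskName, int exitCode) {
            this.taskName = taskName;
            this.exitCode = exitCode;
        }
    }

    private final List<Execution> history = new ArrayList<>();

    // Runs the task body once and stores its outcome; returns the exit code.
    int launch(String taskName, Runnable body) {
        int exitCode = 0;
        try {
            body.run();
        } catch (RuntimeException e) {
            exitCode = 1; // the failure is recorded rather than rethrown
        }
        history.add(new Execution(taskName, exitCode));
        return exitCode;
    }

    // Read-only view of everything that has run so far.
    List<Execution> executions() {
        return Collections.unmodifiableList(history);
    }

    public static void main(String[] args) {
        TaskRecordSketch repo = new TaskRecordSketch();
        repo.launch("task1", () -> System.out.println("Hello World"));
        repo.launch("task2", () -> { throw new IllegalStateException("boom"); });
        for (Execution e : repo.executions()) {
            System.out.println(e.taskName + " exit=" + e.exitCode);
        }
    }
}
```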
Spring Cloud Data Flow
● Orchestration Layer for Streams and Tasks
○ DSL
○ Repositories for Stream and Task Definitions
○ REST API
○ Shell
○ UI
● SPI for Deployment and Lifecycle Management
○ Load Balance
○ Scale Up/Down
○ Allocate Resources
○ Check Status
Data Flow Developer Experience
1: Implement Spring Cloud Stream Microservice App:
@EnableBinding(Processor.class)
public class UpperCase {
    @Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
    public String process(String message) {
        return message.toUpperCase();
    }
}
2: Build and Install:
$ mvn clean install
3: Register Module with Data Flow:
dataflow:> module register --name uppercase --type processor --coordinates group:artifact:version
4: Define Stream via DSL:
dataflow:> stream create demo --definition "http --server.port=9000 | uppercase | file --directory=/tmp/devnexus"
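To show how a pipe-style stream definition breaks down into individual apps and their properties, here is a deliberately naive parser. It is not Data Flow's own DSL parser, and the class and field names are hypothetical. It assumes apps are separated by `|` and options take the form `--name=value` with no spaces or quoting inside values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: splits a pipe-style stream definition into app specs.
final class StreamDslSketch {

    static final class AppSpec {
        final String name;
        final Map<String, String> properties;

        AppSpec(String name, Map<String, String> properties) {
            this.name = name;
            this.properties = properties;
        }
    }

    static List<AppSpec> parse(String definition) {
        List<AppSpec> apps = new ArrayList<>();
        for (String segment : definition.split("\\|")) {
            // First token is the app name; the rest are --key=value options.
            String[] tokens = segment.trim().split("\\s+");
            Map<String, String> props = new LinkedHashMap<>();
            for (int i = 1; i < tokens.length; i++) {
                String token = tokens[i];
                int eq = token.indexOf('=');
                if (token.startsWith("--") && eq > 2) {
                    props.put(token.substring(2, eq), token.substring(eq + 1));
                }
            }
            apps.add(new AppSpec(tokens[0], props));
        }
        return apps;
    }

    public static void main(String[] args) {
        String definition = "http --server.port=9000 | uppercase | file --directory=/tmp/devnexus";
        for (AppSpec app : parse(definition)) {
            System.out.println(app.name + " " + app.properties);
        }
    }
}
```

Each parsed entry corresponds to one independently deployed app, with the options passed to it as configuration properties.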
Wire Tap
dataflow:> stream create tap --definition ":demo.http > counter --store=redis"
dataflow:> stream create demo --definition "http --server.port=9000 | uppercase | file --directory=/tmp/devnexus"
Launching Tasks via Data Flow
dataflow:> task create task1 --definition timestamp
dataflow:> task launch task1
dataflow:> task execution list
Deployer SPI
● deploy Spring Cloud Stream apps
● deploy Spring Cloud Task apps
● in both cases, pass Spring Boot Configuration Properties in an appropriate way for the target platform
● support for checking status of individual apps as well as app group (e.g. stream)
https://github.com/spring-cloud/spring-cloud-dataflow-admin-cloudfoundry
https://github.com/spring-cloud/spring-cloud-dataflow-admin-yarn
https://github.com/spring-cloud/spring-cloud-dataflow-admin-mesos
https://github.com/spring-cloud/spring-cloud-dataflow-admin-kubernetes
Links
● http://cloud.spring.io/spring-cloud-stream
● https://github.com/spring-cloud/spring-cloud-stream
● http://cloud.spring.io/spring-cloud-task
● https://github.com/spring-cloud/spring-cloud-task
● http://cloud.spring.io/spring-cloud-dataflow
● https://github.com/spring-cloud/spring-cloud-dataflow