Pulsar Virtual Summit Europe 2021
Supporting the Entire Lifecycle of Streaming Data
Matteo Merli
CTO @ StreamNative
Co-Creator and PMC Chair for Apache Pulsar
PMC Member Apache BookKeeper
Prev: Splunk, Streamlio, Yahoo
Pulsar and the data in motion
Messaging
Message passing between components, applications, and services
Streaming
Analyze events that just happened
Use Cases
Messaging
● OLTP, Integration
● Main challenges:
○ Latency
○ Availability
○ Data durability
● High level features:
○ Routing, DLQ, delays, individual acks
Streaming
● Real-time analytics
● Main challenges:
○ Throughput
○ Ordering
○ Stateful processing
○ Batch + Real-Time
How can Pulsar support both?
Scalable Log Storage
+
Flexible messaging semantics
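A minimal sketch of how one append-only log can serve both styles of consumption. This is a toy in-memory model, not Pulsar's implementation; the names `ToyLog`, `StreamReader`, and `QueueSubscription` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Append-only log; entries are never mutated once written."""
    entries: list = field(default_factory=list)

    def append(self, payload):
        self.entries.append(payload)
        return len(self.entries) - 1  # offset of the new entry


@dataclass
class StreamReader:
    """Streaming semantics: strictly ordered reads driven by one cursor."""
    log: ToyLog
    cursor: int = 0

    def read_next(self):
        if self.cursor >= len(self.log.entries):
            return None  # caught up with the tail of the log
        payload = self.log.entries[self.cursor]
        self.cursor += 1
        return payload


@dataclass
class QueueSubscription:
    """Messaging semantics: each entry is acknowledged individually."""
    log: ToyLog
    acked: set = field(default_factory=set)

    def pending(self):
        # Offsets that are not yet acknowledged stay eligible for redelivery.
        return [i for i in range(len(self.log.entries)) if i not in self.acked]

    def ack(self, offset):
        self.acked.add(offset)


log = ToyLog()
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)

reader = StreamReader(log)
ordered = [reader.read_next() for _ in range(3)]

queue = QueueSubscription(log)
queue.ack(1)  # acknowledge out of order; offsets 0 and 2 remain pending
```

Both consumers read the same stored entries; only the bookkeeping on top of the log differs.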
Why is messaging + streaming so important?
1. They get applied in different stages of handling the same data
2. Integration between different systems is often complex, fragile and inefficient
Online & Offline
Online: Microservices
Offline: Streaming Analytics
Interactions between online & offline
Interactions are complex
1. Who is responsible for getting a feed of events?
2. Where is the data stored?
3. How can we feed updates back to online data stores?
4. What happens when systems are not available?
5. How is the schema of data enforced?
6. What is the security model?
If we could just share a single data platform…
Combining microservices and analytics
How to tackle integration points
1. Different kinds of services can all share the same Pulsar cluster
2. Decouple services through topics
3. Provide isolation & availability guarantees
Using Pulsar as the single data platform
1. Single tooling
2. Supports many different APIs
3. Unified AuthN & AuthZ for access to data
4. Supports end-to-end schema validation
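To illustrate what end-to-end schema validation buys, here is a small sketch in which the topic itself owns the schema and rejects records that do not match it. This is a simplified stand-in, not the Pulsar schema API; `TypedTopic` and `SchemaError` are hypothetical names, and the schema is assumed to be a plain mapping of field names to Python types.

```python
class SchemaError(ValueError):
    """Raised when a record does not match the topic's declared schema."""


class TypedTopic:
    def __init__(self, name, schema):
        self.name = name
        self.schema = schema  # field name -> expected Python type
        self.messages = []

    def validate(self, record):
        # Reject records with missing or unexpected fields.
        if set(record) != set(self.schema):
            raise SchemaError(
                f"fields {sorted(record)} do not match schema {sorted(self.schema)}"
            )
        # Reject records whose values have the wrong type.
        for field_name, expected in self.schema.items():
            if not isinstance(record[field_name], expected):
                raise SchemaError(
                    f"field {field_name!r} must be {expected.__name__}"
                )

    def publish(self, record):
        # Validation happens before the record is stored, so every
        # consumer can rely on the declared shape of the data.
        self.validate(record)
        self.messages.append(record)


topic = TypedTopic("orders", {"order_id": str, "amount": float})
topic.publish({"order_id": "A-1", "amount": 12.5})

try:
    topic.publish({"order_id": "A-2", "amount": "12.5"})  # wrong type
    rejected = False
except SchemaError:
    rejected = True
```

Because the check sits at the shared platform rather than in each service, producers and consumers do not need pairwise agreements about data shape.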
Removing “data ownership”
1. Keep 1 copy of the data
2. Single source of truth
3. No single component is the “owner” of the data
4. Consumer components can get access directly to the source
5. There is no need for additional ad-hoc integrations
The data life-cycle
The data can reside in Pulsar for its entire life-cycle, from its inception through to long-term storage.
The data life-cycle
1. Events are happening — (real-time)
2. Streaming Analytics — (< 1 second)
3. Data replay — (1 hour / days)
4. Long term storage and Batch Processing —
(days / months)
Managing the data life-cycle
1. Store the data only once
2. Make it available to all interested parties
3. Be able to hold the data for an extended time
Managing the data life-cycle
Pulsar provides an infinite stream-storage abstraction:
a. Low latency writes
b. Isolation between tail reads and catch-up reads
c. Long term tiered storage
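The idea of a single stream that spans a fast tier and a long-term tier can be sketched with a toy model. This is an assumption-laden illustration rather than Pulsar's storage layer: `TieredLog`, `segment_size`, and `hot_segments` are hypothetical names, and both tiers are plain dictionaries.

```python
class TieredLog:
    """Toy log split into fixed-size segments across two storage tiers."""

    def __init__(self, segment_size=2, hot_segments=1):
        self.segment_size = segment_size
        self.hot_segments = hot_segments  # assumed to be >= 1
        self.hot = {}   # segment index -> entries (fast tier)
        self.cold = {}  # segment index -> entries (long-term tier)
        self.length = 0

    def append(self, payload):
        seg = self.length // self.segment_size
        self.hot.setdefault(seg, []).append(payload)
        self.length += 1
        self._offload()
        return self.length - 1  # offset of the new entry

    def _offload(self):
        # A new segment only appears once the previous one is full, so
        # anything moved here is already sealed.
        while len(self.hot) > self.hot_segments:
            oldest = min(self.hot)
            self.cold[oldest] = self.hot.pop(oldest)

    def read(self, offset):
        # Readers address entries by offset and never see which tier
        # actually holds the data.
        seg, idx = divmod(offset, self.segment_size)
        tier = self.hot if seg in self.hot else self.cold
        return tier[seg][idx]


log = TieredLog(segment_size=2, hot_segments=1)
for i in range(5):
    log.append(f"e{i}")

replay = [log.read(i) for i in range(5)]
```

A replay from offset 0 reads the older segments from the long-term tier and the newest one from the fast tier, while the caller uses the same `read` call throughout.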
Isolating the different workloads
Every aspect of Pulsar is designed for multi-tenancy and multi-workloads:
● IO Isolation
● Limits on access to resources: throttling, quotas, etc.
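Throttling of this kind is commonly built on a token bucket. The sketch below shows the general technique with one bucket per tenant; it is not taken from Pulsar's code, and `TokenBucket`, `rate_per_sec`, and `burst` are hypothetical names. Time is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Allows up to `burst` requests at once, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now, cost=1.0):
        # Refill in proportion to the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per tenant, so one tenant's burst cannot starve another.
buckets = {
    "tenant-a": TokenBucket(rate_per_sec=2, burst=2),
    "tenant-b": TokenBucket(rate_per_sec=2, burst=2),
}

tenant_a = [buckets["tenant-a"].allow(0.0) for _ in range(3)]
tenant_b = buckets["tenant-b"].allow(0.0)
tenant_a_later = buckets["tenant-a"].allow(0.5)  # one token refilled by now
```

The third request from `tenant-a` is rejected while `tenant-b` is unaffected, which is the isolation property the slide describes.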
Putting it all together
Conclusion
Pulsar is uniquely suited to be the data platform that powers all the data in motion use cases and all their interactions


Apache Pulsar, Supporting the Entire Lifecycle of Streaming Data
