Pulsar Virtual Summit North America 2021
Apache Pulsar:
Why Unified Messaging
and Streaming Is the
Future
Matteo Merli, Sijie Guo
@ Pulsar PMC
Who are we?
● Sijie Guo (@sijieg)
● CEO, StreamNative
● PMC Member of Pulsar/BookKeeper
● Ex Co-Founder, Streamlio
● Ex Twitter
● Matteo Merli (@merlimat)
● CTO, StreamNative
● Co-creator and PMC chair of Pulsar
● Ex Co-Founder, Streamlio
● Ex Yahoo!
StreamNative
Founded by the creators of Apache Pulsar, StreamNative provides a
cloud-native, unified messaging and streaming platform powered by
Apache Pulsar to support multi-cloud and hybrid-cloud strategies
Announcing StreamNative Platform 1.0
✓ Pulsar Transactions
✓ Kafka-on-Pulsar
✓ Function Mesh for serverless streaming
✓ Enterprise-ready security
✓ Pulsar Operators
✓ Seamless StreamNative Cloud experience
Pulsar Trends
Kafka -> Pulsar
Scale
Cloud-Native
Pulsar + Flink
Pulsar at Scale
More companies in Production
Pulsar at Scale
Hit Trillion Messages Per Day
Cloud-Native
Kubernetes Drives Adoption of Pulsar
✓ 80% of Pulsar users deploy Pulsar in a cloud environment
✓ 62% of Pulsar users deploy Pulsar on Kubernetes
✓ 49% noted Pulsar’s Cloud-Native capabilities as one of the
top reasons they chose to adopt Pulsar
Cloud-Native
Built for Kubernetes
VM / Early Cloud Era:
● Single Cloud Provider
● Monolithic Architectures
● Single Tenant Systems
● No Geo-replication
Containers / Modern Cloud Era:
● Containers
● Cloud Native
● Microservices
● Hybrid & MultiCloud
Pulsar + Flink
Unified Stream and Batch
Kafka to Pulsar
More and More Kafka Users Adopt Pulsar
✓ 68% of respondents use Kafka in addition to Pulsar
✓ 34% of respondents use or plan to use Kafka-on-Pulsar
✓ Kafka and Pulsar serve different use cases
✓ Once adopted, Pulsar usage expands across organizations
Pulsar Adoption Use Cases
Adopted Pulsar to replace Kafka in their DSP (Data Streaming Platform):
● 1.5-2x lower capex cost
● 5-50x improvement in latency
● 2-3x lower opex
● 10 PB / day
Adopted Pulsar to power their billing platform, Midas, which processes hundreds of billions of financial transactions daily. Adoption then expanded to Tencent’s Federated Learning Platform and Tencent Gaming.
Use cases required a scalable message queue to replace RabbitMQ for serving mission-critical business applications. Now in the process of expanding use cases to build data streaming services.
Modern Data Needs
Messaging + Streaming
Messaging
● Queueing systems are ideal for work
queues that do not require tasks to
be performed in a particular order—
for example, sending one email
message to many recipients.
● RabbitMQ and Amazon SQS are
examples of popular queue-based
message systems.
Streaming
● Streaming works best in situations
where the order of messages is
important—for example, data
ingestion.
● Kafka and Amazon Kinesis are
examples of messaging systems that
use streaming semantics for
consuming messages.
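The distinction can be sketched in plain Python (a toy simulation, not any client API): a work queue delivers each message to exactly one of several competing consumers, while a stream preserves order and lets every subscriber replay the full sequence.

```python
from collections import deque
from itertools import cycle

def dispatch_queue(messages, consumers):
    """Queueing (messaging) semantics: each message goes to ONE consumer,
    distributed round-robin across competing workers."""
    received = {c: [] for c in consumers}
    targets = cycle(consumers)
    q = deque(messages)
    while q:
        received[next(targets)].append(q.popleft())
    return received

def dispatch_stream(messages, subscribers):
    """Streaming semantics: every subscriber independently replays
    the same ordered log of messages."""
    return {s: list(messages) for s in subscribers}

msgs = ["m0", "m1", "m2", "m3"]
# Work queue: messages are split among workers
assert dispatch_queue(msgs, ["worker-1", "worker-2"]) == {
    "worker-1": ["m0", "m2"], "worker-2": ["m1", "m3"]}
# Stream: both subscribers see all messages, in publish order
assert dispatch_stream(msgs, ["analytics", "ingest"]) == {
    "analytics": msgs, "ingest": msgs}
```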
Data in motion
Typical Architecture
E-Commerce w/o Pulsar
✓ Separate storage
✓ Tiering outside toolset
✓ Separate application and
data domains
✓ Different tech stacks
Why not a single system that supports both messaging and streaming?
E-Commerce with Pulsar
✓ Unified storage for in-motion data
✓ Native tiered storage
✓ Single system to
exchange data
✓ Teams share toolset
Build Apache Pulsar for
unified messaging and
streaming
Step 1: A scalable storage for streams of data
Step 2: Separate serving from storage
[Diagram: producers and consumers talk to an Apache Pulsar serving layer (Broker 0, Broker 1, Broker 2), which stores data in an Apache BookKeeper layer (Bookie 0 through Bookie 4)]
Step 3: Unified API
[Diagram: Producer 1 and Producer 2 publish messages m0–m4 to a Pulsar topic/partition, consumed through four subscription types spanning streaming and messaging semantics:
● Exclusive (Subscription A): a single consumer (A-0) receives all messages
● Failover (Subscription B): Consumer B-1 takes over in case of failure in Consumer B-0
● Shared (Subscription C): messages are distributed across Consumers C-1, C-2, C-3
● Key-Shared (Subscription D): messages with the same key always go to the same consumer]
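The four subscription types can be illustrated with a small dispatch simulation (toy code, not the Pulsar client API; consumer-selection policies are simplified, e.g. Shared is shown as strict round-robin):

```python
import zlib

MESSAGES = [("k1", "m0"), ("k2", "m1"), ("k1", "m2"), ("k3", "m3"), ("k2", "m4")]

def exclusive(messages, consumers):
    # Exclusive: only the single attached consumer receives messages.
    return {consumers[0]: [m for _, m in messages]}

def failover(messages, consumers, failed=None):
    # Failover: the first healthy consumer gets everything; others are standbys.
    active = next(c for c in consumers if c != failed)
    return {active: [m for _, m in messages]}

def shared(messages, consumers):
    # Shared: messages are distributed across consumers (round-robin here).
    out = {c: [] for c in consumers}
    for i, (_, m) in enumerate(messages):
        out[consumers[i % len(consumers)]].append(m)
    return out

def key_shared(messages, consumers):
    # Key-Shared: all messages with the same key go to the same consumer.
    out = {c: [] for c in consumers}
    for k, m in messages:
        out[consumers[zlib.crc32(k.encode()) % len(consumers)]].append(m)
    return out

assert exclusive(MESSAGES, ["A-0", "A-1"]) == {"A-0": ["m0", "m1", "m2", "m3", "m4"]}
assert failover(MESSAGES, ["B-0", "B-1"], failed="B-0") == {"B-1": ["m0", "m1", "m2", "m3", "m4"]}
assert shared(MESSAGES, ["C-1", "C-2"]) == {"C-1": ["m0", "m2", "m4"], "C-2": ["m1", "m3"]}
# Key-Shared preserves per-key ordering: k1's m0 precedes m2 on whichever consumer owns k1.
ks = key_shared(MESSAGES, ["D-1", "D-2", "D-3"])
owner = next(c for c, ms in ks.items() if "m0" in ms)
assert ks[owner].index("m0") < ks[owner].index("m2")
```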
Step 3: Unified API
[Diagram: a Pub/Sub API serving Publisher/Subscriber applications (microservices or event-driven architecture), alongside a Reader and Batch API serving stream processor applications]
Step 4: Schema API
[Diagram: the layered stack from Step 3, with a Schema API now spanning both the Pub/Sub API and the Reader and Batch API]
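A hedged sketch of the idea in plain Python (the real client attaches, e.g., an Avro or JSON schema to producers and consumers; here a trivial type check stands in for the broker's schema validation, and the `ORDER_SCHEMA` record is invented for illustration):

```python
# Toy schema: field name -> required Python type
ORDER_SCHEMA = {"order_id": int, "sku": str, "quantity": int}

def validates(record, schema):
    """Broker-side check: a published record must match the
    schema registered on the topic, field for field."""
    return set(record) == set(schema) and all(
        isinstance(record[f], t) for f, t in schema.items()
    )

good = {"order_id": 42, "sku": "SKU-1", "quantity": 3}
bad = {"order_id": "42", "sku": "SKU-1"}  # wrong type, missing field

assert validates(good, ORDER_SCHEMA)      # accepted for publish
assert not validates(bad, ORDER_SCHEMA)   # rejected by the broker
```

Because the schema lives with the topic, consumers can discover it and stay fully type safe without being hard-coded against one producer.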
Step 5: Functions and IO API
[Diagram: the layered stack from Step 4 extended with a Functions API and Pulsar IO/Connectors (prebuilt and custom connectors)]
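A Pulsar Function is essentially a per-message transformation: consume from an input topic, transform, publish to an output topic. A minimal sketch in plain Python (in the real Python SDK the class extends `pulsar.Function` and `process` receives a runtime `Context`; both are stubbed out here):

```python
class ExclamationFunction:
    """Sketch of a Pulsar Function: the runtime invokes process()
    once for every message arriving on the input topic, and publishes
    the return value to the output topic."""

    def process(self, input, context=None):
        # Simple enrichment/transformation of the message payload
        return input + "!"

fn = ExclamationFunction()
assert fn.process("hello pulsar") == "hello pulsar!"
```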
Step 6: Tiered Storage
[Diagram: the layered stack from Step 5 (Pub/Sub API, Reader and Batch API, Schema API, Functions API, Pulsar IO/Connectors) with Tiered Storage added underneath]
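Tiered storage is configured on the broker; a hedged example of offloading closed ledger segments to S3 (property names as found in Apache Pulsar's broker.conf, but check them against the docs for your version; bucket and region values are placeholders):

```properties
# Offload closed ledger segments to long-term object storage
managedLedgerOffloadDriver=aws-s3
s3ManagedLedgerOffloadBucket=my-pulsar-offload-bucket
s3ManagedLedgerOffloadRegion=us-west-2
# Trigger offload automatically once a topic's ledger data exceeds ~1 GB
managedLedgerOffloadAutoTriggerSizeThresholdBytes=1073741824
```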
Step 7: Protocol Handlers
[Diagram: Apache Pulsar with pluggable protocol handlers: the native Pulsar protocol handler serves Pulsar clients (queue + stream), while Kafka, AMQP, and MQTT protocol handlers serve Kafka, AMQP, and MQTT clients respectively]
Step 8: Transaction API
[Diagram: the full layered stack (Pub/Sub API, Reader and Batch API, Schema API, Functions API, Pulsar IO/Connectors, Tiered Storage) with a Transaction API added alongside the client APIs]
Pulsar 2.8: towards a complete vision of unified messaging and streaming
The future of Pulsar
Towards a self-adjusting
data platform
✓ Tuning data platforms to run at scale is hard
✓ Lots of configurations
✓ Requires in-depth knowledge of internals
✓ Workloads are constantly changing
Topic auto-partitioning
✓ Partitions are an artifact of implementation
✓ It’s not a natural property of the data
✓ Abstract the partitioning away from users
✓ Partitions are automatically split / merged
✓ Rethink how the API should look
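The point can be illustrated with a toy hash router (not Pulsar code): the application only ever names a topic and a key; which partition a key lands on is an internal detail, so the system is free to split or merge partitions without the caller noticing.

```python
import zlib

def route(key, num_partitions):
    """Internal detail: map a message key to one of N partitions by hash."""
    return zlib.crc32(key.encode()) % num_partitions

# The application API stays the same while the system splits partitions:
key = "customer-1138"
before = route(key, 4)   # partition under the old layout
after = route(key, 8)    # partition after an automatic split
assert 0 <= before < 4 and 0 <= after < 8
# The caller never named a partition, so the split is invisible to it.
```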
Self-Adjusting Storage
✓ Ensure optimal utilization of hardware
✓ No configuration
✓ Automatically adjust strategies based on changing conditions:
✓ Disk access
✓ Cache management
✓ Queue sizes
Pulsar Functions
✓ The foundation is now mature — UX is still poor
✓ Simpler tooling to create & manage functions
✓ CI/CD integration — Versioning — A/B testing
✓ Observability & Debuggability
✓ Improve support for Go and Python functions
✓ DSL — Provide higher level constructs to process data
Stream Storage
✓ Evolve the current state of Tiered Storage
✓ Integrate with data lake technologies
Working with the data
community
Editor's Notes
  • #6: Before diving into the “Unified Messaging and Streaming”, let’s take a look at the trends in Pulsar community.
  • #14: To understand what is happening behind the scenes, we need to rewind to the early days of Pulsar. Back in 2012, when we first set out to build Pulsar, we thought there should be a global geo-replicated infrastructure for all the messaging data. We didn’t start with the idea of making our own software, but started by observing the gaps in the existing technologies available at the time and realized how they were insufficient to serve the needs of a data-driven organization.
  • #15: Talking about these 2 different worlds. Messaging - read slide. These are like commands that represent changes that need to be made to the system. An example: we send a message that says “Process this order” or “change user to be deleted”, but we don’t actually perform that change, just notify. Messaging systems are selected when synchronous communication breaks down. In contrast, streaming systems deal with events - the state changes themselves. So instead of sending a message saying this user wants to update their email, we actually perform the update. Events interlinked together may be persisted, replayed or aggregated.
  • #17: Instructor Notes: What we have here is an example of what we might see in a modern organization that has run into both these issues. We have basically 2 different regimes or 2 different worlds - different teams. Historically, these worlds often seem very different, with entirely different tech stacks and entirely different teams. However, as data becomes more critical in informing applications, applications need to make more use of what data teams and data services are producing. Likewise, getting the data out of applications and into the data realm has forced organizations to get better at doing both of these things really well. This can be a real challenge. So on the left we have the application side: applications that interact via messages, deal with the aspects of running your systems, and provide capabilities focused on business concerns. On the right side we have services that deal with the data, in bulk and at scale. Sometimes the right side includes real-time or batch processes such as sending large amounts of data, putting it into data lakes, computing answers about it, sending data to other services, or providing that data to other orgs that need it. These 2 worlds generally use different technologies, different tools and different processes - all leading to more complexity and cost.
  • #18: Read slide. Separate storage/transport systems for messaging, streaming, and big-data; focus on separate ETL processes. Messaging helps decouple apps and provides reliable async communication and work queues in core applications. Streaming allows for “medium-term” storage of streams (~30 days), aggregating streams of data, and real-time processing for near real-time analytics. Batch processing and long-term object storage (S3, HDFS, etc.) allow for processing historical data to learn from the past. “Tiering” of data from messaging -> streaming -> object storage is outside the core toolset and is maintained explicitly. Application and data domains are separated; data is replicated into the data domain. Results from the data domain are loaded (ETL) back into the application domain. Multiple teams with very different technology stacks. ==== To show how Pulsar provides that ability to be transformative, here is a common example of an e-commerce system stack that contains both a streaming set of services and data processing. On the application side we have order services, inventory service and fulfillment. Talk to each service (think Amazon). On the data side we have Spark - some batch processing using Spark; Flink - real-time inventory analysis using Flink. Another use case may be some long-term storage needs versus short term (30 days), then a data warehouse layer. Imagine a person ordering something, then checking inventory and it isn’t there. Do you delete the order or put it on backorder? Once the inventory gets replenished, how do we notify the customers that their order is now coming? So we need to join both sides together.
  • #20: It is very natural to merge both. Talk about how the technologies have evolved in a way that is able to support both. Read slide and add more context: “Unified” storage/transport of messages and streams with access to underlying data. Messaging - decoupled applications with pub/sub: shared subscriptions for work queues, exclusive subscriptions for fanout, and point-to-point messaging with flexibly large numbers of non-partitioned topics. Streaming - ordered, scalable partitioned topics with failover and key-shared subscriptions; pub/sub (broker controlled) or reader API (client controlled) for advanced stream processing, replay, etc. Big-data batch access - underlying segments of topics can be read directly, allowing for scale-out parallelism. Tiered storage is core to Pulsar, no need for external tools. Application and data domains use a single system to exchange data, with converged “messaging” and “streaming”. One or many teams, with a shared toolset. Talk to the diagram: on the left side say how Pulsar can process real-time streams, and on the right how it can do batch processing, offload to tiered storage, read back in parallel batch fashion, and even provide a stream back to other systems for consumption. Order services, inventory service and fulfillment still work from the messaging domain (use cases not too different), but now can support processing at much higher scale; any messages they have are kept in Pulsar as a single source of truth, and these messages can be offloaded via Pulsar to long-term storage. Pulsar also provides the power to enable a unified batch and streaming job that can do batch processing by reading from underlying storage and combine that with real-time streams, all with a single technology.
  • #21: Let's take a retrospective look at how Pulsar has evolved through the years. When we started designing Pulsar as a new platform, we always had this idea of supporting both the Pub-Sub semantics as well as the data streaming pipelines, which at the time were a new and emerging thing. But it would be a lie to say we had everything pre-planned since the beginning. Instead, we spent a lot of time observing how people used these platforms and we tried to fill all the gaps we were seeing, evolving Pulsar with the changing needs of data applications.
  • #22: At the very core of Pulsar there has always been the concept of the "log". A distributed, replicated and immutable ledger where all the events are appended. BookKeeper has proved, throughout the years, to be the best storage solution for streams of data. It scales to a very large number of logs, it offers consistency, durability, low latency and high throughput and, more importantly, very convenient operational tooling. To summarize: using the log as a building block does a lot of the heavy lifting required to build a truly scalable system.
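The note's point that the log does the heavy lifting can be sketched as a toy append-only log with independent readers (a simulation of the concept, not BookKeeper's API):

```python
class Log:
    """Toy append-only log: writers append, readers keep their own offsets."""

    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)
        return len(self.entries) - 1  # entry id; immutable once written

    def read(self, offset):
        # Readers replay from any position, independently of each other
        return self.entries[offset:]

log = Log()
for e in ["created", "paid", "shipped"]:
    log.append(e)

# Two readers at different positions see a consistent, ordered history
assert log.read(0) == ["created", "paid", "shipped"]
assert log.read(2) == ["shipped"]
```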
  • #23: Another architectural choice that came naturally from using BookKeeper has been the separation of the storage from the data serving layer. This comes from BookKeeper because BookKeeper requires a single writer for each log. In our case the Broker acts as that single writer. This multi-layer architecture was exactly what we needed because it allows Pulsar to have: 1. Stateless brokers - topics can be easily moved across brokers without copying any data, for example when expanding the cluster or adjusting topic assignments after changing conditions. 2. Data locality - Because of this broker layer, the data for a single topic or partition does not have to be stored in one single storage node. Instead we can fully utilize the resources of the entire cluster.
  • #24: We just said that the log is the building block of Pulsar... but the log on its own is a very low-level construct. Applications very often need much more sophisticated ways of interacting with the data than just reading through the log of events. Instead, we wanted to capture the right level of semantics needed to support a wide range of pub-sub and streaming use cases. The core idea was to leave the flexibility to consume data from topics in multiple different ways, depending on what the application needs. We ended up with 4 subscription types with different semantics and different properties, each one with its own merits.
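The four subscription types (Exclusive, Failover, Shared, Key_Shared) can be sketched as dispatch policies over the same log. This toy Python dispatcher is an assumption-laden illustration of the semantics, not how the broker implements them:

```python
import itertools

def dispatch(messages, consumers, mode):
    """Toy dispatcher mirroring Pulsar's four subscription types.
    messages are (key, payload) pairs; consumers is a list of names."""
    out = {c: [] for c in consumers}
    if mode in ("exclusive", "failover"):
        # a single active consumer receives everything; failover would
        # promote the next consumer if the active one disconnects
        for m in messages:
            out[consumers[0]].append(m)
    elif mode == "shared":
        rr = itertools.cycle(consumers)      # round-robin work queue
        for m in messages:
            out[next(rr)].append(m)
    elif mode == "key_shared":
        # the same key always lands on the same consumer,
        # preserving per-key ordering across the group
        for key, payload in messages:
            out[consumers[sum(key.encode()) % len(consumers)]].append((key, payload))
    return out

msgs = [("user-a", 1), ("user-b", 2), ("user-a", 3), ("user-c", 4)]
print(dispatch(msgs, ["c1", "c2"], "shared"))
print(dispatch(msgs, ["c1", "c2"], "key_shared"))
```

Shared trades ordering for throughput (a work queue); Key_Shared restores ordering per key while still spreading load.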
  • #25: After the Pub/Sub API, the next addition was the Reader API. You can think of it as the "unmanaged" way to consume data from a topic. While there are many reasons for using a reader, the main users are typically stream processing frameworks, because they tend to have their own checkpointing mechanisms, or, similarly, batch systems that want to do a scan of the historical data.
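The "unmanaged" nature of a reader comes down to who owns the position. A minimal sketch, assuming an illustrative `Reader` class (not the real client API): the client tracks and moves the position itself, with no broker-side acknowledgement involved.

```python
class Reader:
    """'Unmanaged' consumption: the client, not the broker, owns the cursor."""
    def __init__(self, topic, start_position=0):
        self.topic, self.position = topic, start_position

    def read_next(self):
        msg = self.topic[self.position]
        self.position += 1          # advanced by the client, never the broker
        return msg

    def seek(self, position):
        self.position = position    # e.g. rewind to a framework checkpoint

topic = ["e0", "e1", "e2", "e3"]
reader = Reader(topic, start_position=1)
print(reader.read_next())  # 'e1'
reader.seek(0)             # replay from the beginning
print(reader.read_next())  # 'e0'
```

This is exactly what a stream processor with its own checkpoints needs: on recovery it seeks back to the last checkpointed position and replays.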
  • #26: The common theme in the APIs exposed by Pulsar is the support for Schema. Having direct support for Schema inside Pulsar means that brokers can validate the schema of the data being published and that the expectations of consumers are matched as well. But it also means that it becomes very easy to "discover" the schema of the data. The discoverability of the schema means that you can write fully type-safe generic consumers that don't need to be aware of one specific schema.
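The broker-side validation idea can be sketched in a few lines. This is an illustrative stand-in for schema enforcement (real Pulsar schemas are richer: Avro, JSON, Protobuf, versioning), using a plain field-to-type mapping:

```python
def validate(schema, record):
    """Broker-side check: reject publishes that don't match the topic schema."""
    return set(record) == set(schema) and all(
        isinstance(record[field], type_) for field, type_ in schema.items()
    )

order_schema = {"id": int, "item": str}
print(validate(order_schema, {"id": 1, "item": "book"}))    # accepted
print(validate(order_schema, {"id": "oops", "item": "book"}))  # rejected

# Discoverability: a generic consumer can inspect the schema it is handed
# instead of hard-coding field names.
print(sorted(order_schema))  # ['id', 'item']
```

The last line is the discoverability point: a consumer that receives the schema alongside the data can process any topic generically.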
  • #27: Next we looked at what people were trying to do with messaging platforms, and the realization was that there was always some portion of computation involved. Applications very often need to do simple data transformations, enrichment and similar things. Functions were designed to provide the simplicity of the "serverless" model with a very tight integration into the Pulsar platform. One example of how powerful Pulsar Functions are is that we have created a connector framework, Pulsar IO, entirely based on Pulsar Functions. With Pulsar IO, you can choose from a large set of pre-built connectors, both sources and sinks, or build your own custom connectors.
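The function model described here is essentially "consume, transform, publish". A minimal sketch of that runtime loop, with an illustrative enrichment function (names and the `run_function` helper are hypothetical, not the Pulsar Functions SDK):

```python
def run_function(fn, source, sink):
    """Minimal function runtime: consume from source, transform, publish."""
    for msg in source:
        result = fn(msg)
        if result is not None:      # a function may also filter messages out
            sink.append(result)

def enrich(order):
    """Example user function: add a computed field to each event."""
    return {**order, "total": order["qty"] * order["price"]}

source = [{"qty": 2, "price": 5.0}, {"qty": 1, "price": 3.0}]
sink = []
run_function(enrich, source, sink)
print(sink)
```

A source connector in this model is simply a function whose input comes from an external system, and a sink connector one whose output goes to an external system, which is why Pulsar IO can be built entirely on Functions.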
  • #28: After that, the next trend we saw is that more and more users wanted to use the "stream" concept not just as a temporary buffer, as a way to isolate data ingestion from processing. Instead, they increasingly want to keep the stream as a permanent, or at least long-term, "storage of record". Tiered storage was the missing link to enable this. By offloading cold data to cloud storage providers, we can have large-scale data retention at a very effective cost, all while maintaining the stream view of the data and the same APIs.
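The "same APIs over hot and cold data" property can be sketched like this (an illustrative model, not the real offloader: in Pulsar whole segments, not message counts, are offloaded to object storage):

```python
class TieredTopic:
    """Keeps a single stream view while cold data lives in cheaper storage."""
    def __init__(self):
        self.hot, self.cold = [], []    # e.g. bookies vs. cloud object store

    def append(self, event):
        self.hot.append(event)

    def offload(self, keep_last):
        # move everything but the newest `keep_last` events to cold storage
        self.cold.extend(self.hot[:-keep_last])
        self.hot = self.hot[-keep_last:]

    def read_all(self):
        # the reader sees one stream regardless of where data physically lives
        return self.cold + self.hot

t = TieredTopic()
for i in range(5):
    t.append(i)
t.offload(keep_last=2)
print(t.cold, t.hot)   # [0, 1, 2] [3, 4]
print(t.read_all())    # [0, 1, 2, 3, 4]
```

Consumers keep reading one logical stream; only the cost and location of the bytes change.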
  • #29: Another realization was that, because of its nature, messaging is always the integration point between different applications and components. This makes migration from other platforms a bit harder: you often have to coordinate that migration across different teams or organizations. To make it easier, we extended the Pulsar brokers to be able to speak several protocols, in addition to the Pulsar native protocol. With Protocol Handlers, there is a pluggable way to add more ways to interact with the Pulsar service and the same topic data. We started with KoP, Kafka-on-Pulsar, then followed up with AMQP and MQTT. It is a very powerful mechanism for a few reasons: 1. Applications can use existing client libraries with no code or dependency changes 2. You can mix all sorts of different protocols to interact with the same topic 3. It's exposed directly in Pulsar brokers, data is stored only once and there is no "proxy overhead"
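The pluggable-handler idea can be sketched as a registry of decoders in front of one shared topic store. This is a conceptual illustration only; real protocol handlers implement the full wire protocol, not just a decode function:

```python
class Broker:
    """Topics are stored once; protocol handlers are pluggable front-ends."""
    def __init__(self):
        self.topics = {}
        self.handlers = {}

    def register_handler(self, protocol, decode):
        self.handlers[protocol] = decode

    def receive(self, protocol, raw):
        # each handler maps its own wire format onto (topic, payload)
        topic, payload = self.handlers[protocol](raw)
        self.topics.setdefault(topic, []).append(payload)

broker = Broker()
broker.register_handler("pulsar", lambda raw: (raw["topic"], raw["payload"]))
broker.register_handler("kafka", lambda raw: (raw[0], raw[1]))  # KoP-style

# Different client protocols, one shared topic, data stored only once.
broker.receive("pulsar", {"topic": "orders", "payload": "o1"})
broker.receive("kafka", ("orders", "o2"))
print(broker.topics["orders"])  # ['o1', 'o2']
```

Because the handlers live inside the broker, there is no translating proxy in the data path and no second copy of the data.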
  • #30: To really complete the full picture, in Pulsar 2.8 we introduced support for transactions. It's now possible to do very complex interactions and take advantage of the transactional properties, for example publishing messages atomically across multiple topics, or consuming and producing atomically.
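The atomic multi-topic publish mentioned here can be sketched with a staging buffer that only becomes visible on commit. This is a conceptual model (the real implementation uses a transaction coordinator and markers, not client-side buffering):

```python
class Transaction:
    """Stage publishes to several topics; commit makes them visible atomically."""
    def __init__(self, topics):
        self.topics, self.pending = topics, []

    def send(self, topic, msg):
        self.pending.append((topic, msg))   # invisible until commit

    def commit(self):
        for topic, msg in self.pending:
            self.topics[topic].append(msg)
        self.pending = []

    def abort(self):
        self.pending = []                   # nothing ever becomes visible

topics = {"payments": [], "shipments": []}
txn = Transaction(topics)
txn.send("payments", "charge-42")
txn.send("shipments", "ship-42")
txn.commit()
print(topics)  # both messages appear together, or neither would have

txn2 = Transaction(topics)
txn2.send("payments", "charge-43")
txn2.abort()
print(topics["payments"])  # ['charge-42']
```

Consume-then-produce atomicity works the same way: the acknowledgement and the output publish are both staged and committed as one unit.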
  • #31: We can say that Pulsar 2.8 is a big milestone in the journey, completing this vision of a unified messaging and streaming platform. We are very excited and very proud of this release. It is the culmination of months and months of work by a "larger than ever" group of committers and contributors. And while transactions support is the biggest new feature, it is certainly not the only one. We have features like Exclusive Producer support, which I will be talking about tomorrow in a dedicated session; a new API for package management, to improve the way we manage the functions and connectors code artifacts; and, finally, a simplified way to configure memory limits in Pulsar clients.
  • #32: After looking at the past, let's now take a look at some of the items that we want to focus on in the very near future.
  • #33: A problem that we're seeing overall in the data ecosystem is that these platforms can be very difficult to tune and operate when running at a large scale. This is not a problem specific to Pulsar, but it is something that we believe should be addressed. Typically, there are a lot of configuration options and each of them requires in-depth knowledge of the internals of the system. Worse, when integrating multiple systems, like a compute framework, it might be very hard to predict how a change in the configuration will affect the overall stability and performance. Finally, the workloads are increasingly dynamic and constantly changing. It's not possible to have a static configuration that will deliver "optimal" performance in every condition.
  • #34: The first item I want to discuss is partitioning. People are used to seeing partitioning and sharding, but these are really artifacts of how systems are implemented. Partitions are usually not a natural property of the data. Because of that, we want to abstract the partition concept away from the user's sight. Application developers should not be worried about partitions, and operators should not be thinking about how many partitions are needed for a certain use case. Instead, the system should be able to figure it out on its own, internally splitting and merging partitions while maintaining the fundamental ordering guarantees.
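One way to picture "splitting partitions while maintaining ordering" is key-range splitting: a partition owns a range of the key space, and a split divides the range so every key still lives in exactly one partition, keeping its messages in order. This is a speculative sketch of the idea, not a design from the talk:

```python
class AutoPartitionedTopic:
    """Partitions own key ranges; a split halves a range, and each key's
    messages move together, so per-key ordering is preserved."""
    def __init__(self):
        self.partitions = [(0, 256, [])]   # (lo, hi, messages) over key space

    def _hash(self, key):
        return sum(key.encode()) % 256     # toy hash over a 256-slot space

    def route(self, key, msg):
        h = self._hash(key)
        for lo, hi, messages in self.partitions:
            if lo <= h < hi:
                messages.append((key, msg))
                return

    def split(self, index):
        lo, hi, messages = self.partitions.pop(index)
        mid = (lo + hi) // 2
        left = [(k, m) for k, m in messages if self._hash(k) < mid]
        right = [(k, m) for k, m in messages if self._hash(k) >= mid]
        self.partitions[index:index] = [(lo, mid, left), (mid, hi, right)]

topic = AutoPartitionedTopic()
for i in range(4):
    topic.route("user-a", i)
topic.split(0)   # the hot partition is split; "user-a" stays in order
print(topic.partitions)
```

Merging is the inverse operation: adjacent ranges recombine, again without interleaving any single key's messages out of order.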
  • #35: Tuning a storage system can also be a very complex task. In particular, it can be very hard to predict the impact of configuration on the overall performance when we're crossing multiple layers: there is the operating system, the disk device and the disk controller. In a similar way, the idea we have is to make it work with no configuration, so that the storage system is able to automatically adjust its strategies based on the changing conditions of the traffic: all the aspects regarding disk access patterns, what kind of cache eviction strategy to use, and so on.
  • #36: When we introduced Pulsar Functions, we had the idea of making it a frictionless platform for developers to do data processing. Over the past few years, the foundation of the Pulsar Functions runtime has really matured into a solid platform, although the user experience is still not great. While it is very easy for developers to write functions, we should strive to make it much easier to actually deploy and manage functions: for example, having the functions tooling well integrated with CI/CD platforms, supporting versioning, and providing out-of-the-box support for A/B testing. Another aspect is observability and debuggability. The tooling and the platform need to make it super easy for users to discover issues in their own code or to detect performance issues. Finally, we are thinking about a higher-level DSL that can support higher-level constructs to further simplify writing data processing functions.
  • #37: We talked before about Tiered Storage and how it has enabled completely new use cases to be supported by Pulsar. The next step here is to make sure we can integrate with existing data lake technologies, like Delta Lake and Apache Hudi. The vision is to use the Data Lake as the tiered storage backend, so that the same data can be consumed as a stream or with the data lake tooling.
  • #38: As a final note, given the very nature of Pulsar, which sits between different systems and platforms and links all of them together, we want to reaffirm our commitment to work with the larger data community to ensure that Pulsar is supported everywhere, out of the box, as a first-class citizen. We have been partnering with many open source communities, like Trino, Druid, Pinot, Spark and Flink. We will continue to do so, and more, in the future. We believe that this will benefit Pulsar, its users and the overall data ecosystem.