Matteo Merli
fast, durable, flexible pub/sub messaging
Introduction
One-sentence definition of Apache Pulsar:
“A flexible pub-sub system backed by durable log storage”
• Easy-to-use API: supports both queuing and streaming
• Strong storage guarantees: durability, latency, scalability
Pulsar architecture basics
• Brokers: serving nodes
• Bookies (Apache BookKeeper): storage nodes
• Each layer can be scaled independently
• No data locality: data for a single topic/partition is not tied to any particular node
Considerations
• Stateful systems can become unbalanced when traffic changes
• The system must be designed to react quickly, redistributing the load across all nodes
Pulsar broker
• The broker is the only point of interaction for clients
• Brokers acquire ownership of groups of topics and “serve” them
• A broker has no durable state
• A service discovery mechanism lets clients connect to the right broker
Segment centric storage
• Storage for a topic is an infinite “stream” of messages
• The stream is implemented as a sequence of segments
• Each segment is a replicated log: a BookKeeper “ledger”
• Segments are rolled over based on time, size, and after crashes
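The segment model above can be sketched as a toy in-memory log. This is only an illustration, not BookKeeper code: `TopicLog`, `Segment`, and `max_segment_bytes` are hypothetical names, and only size-based and crash-triggered rollover are modeled (time-based rollover is left out).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One segment of a topic; stands in for a BookKeeper ledger."""
    segment_id: int
    entries: List[bytes] = field(default_factory=list)
    closed: bool = False

    def size(self) -> int:
        return sum(len(e) for e in self.entries)


class TopicLog:
    """Toy model: a topic is an ordered sequence of segments."""

    def __init__(self, max_segment_bytes: int):
        # Hypothetical rollover threshold, chosen for this sketch only.
        self.max_segment_bytes = max_segment_bytes
        self.segments: List[Segment] = [Segment(0)]

    def _roll_over(self) -> None:
        # Seal the current segment and open a fresh one.
        self.segments[-1].closed = True
        self.segments.append(Segment(len(self.segments)))

    def append(self, payload: bytes) -> Tuple[int, int]:
        """Append a message and return its (segment id, entry index)."""
        current = self.segments[-1]
        too_big = current.size() + len(payload) > self.max_segment_bytes
        if current.entries and too_big:
            self._roll_over()
            current = self.segments[-1]
        current.entries.append(payload)
        return (current.segment_id, len(current.entries) - 1)

    def recover_after_crash(self) -> None:
        # The open segment is sealed rather than reused, so new writes
        # always go to a new segment.
        self._roll_over()

    def read_all(self) -> List[bytes]:
        # Readers see one continuous stream across all segments.
        return [e for s in self.segments for e in s.entries]
```

With `max_segment_bytes=10`, appending five 4-byte messages spreads them over three segments, while `read_all()` still returns them as a single ordered stream.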
Broker failure recovery
• The topic is reassigned to an available broker based on load
• The new broker can reconstruct the previous state consistently
• No data needs to be copied
• Failover is handled transparently by the client library
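A minimal sketch of load-based reassignment, assuming load is approximated by the number of topics a broker serves (the real load metric is not described in these slides). The function name `reassign_topics` and the broker and topic names are hypothetical. Only the ownership mapping changes; no message data moves, since brokers hold no durable state.

```python
from typing import Dict, List


def reassign_topics(ownership: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Move topics owned by `failed` to surviving brokers, least loaded first.

    `ownership` maps broker name -> topics it serves. Topic count is used
    as the load measure, which is an assumption of this sketch.
    """
    survivors = {b: list(t) for b, t in ownership.items() if b != failed}
    if not survivors:
        raise RuntimeError("no broker available")
    for topic in ownership.get(failed, []):
        # Pick the broker with the fewest topics; break ties by name.
        target = min(survivors, key=lambda b: (len(survivors[b]), b))
        survivors[target].append(topic)
    return survivors


ownership = {
    "broker-1": ["t1", "t2"],
    "broker-2": ["t3"],
    "broker-3": ["t4", "t5", "t6"],
}
after = reassign_topics(ownership, failed="broker-3")
```

In this example the three topics of `broker-3` are split between the two remaining brokers so that both end up serving three topics.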
Bookie failure recovery
• After a write failure, BookKeeper immediately switches writes to a new bookie, within the same segment
• As long as any 3 bookies remain in the cluster, writes can continue
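The write path above can be illustrated with a toy writer that swaps a failed bookie for a healthy one without closing the segment. This is a simplified sketch, not the BookKeeper client: `SegmentWriter`, `mark_failed`, and the bookie names are hypothetical, and every entry is assumed to be written to all bookies in the current ensemble.

```python
from typing import Dict, List, Set


class SegmentWriter:
    """Toy writer for one segment, replicated to `replicas` bookies."""

    def __init__(self, cluster: List[str], replicas: int = 3):
        self.cluster = cluster
        self.failed: Set[str] = set()
        self.ensemble: List[str] = cluster[:replicas]
        # entry id -> bookies that store that entry
        self.placement: Dict[int, List[str]] = {}
        self.next_entry = 0

    def mark_failed(self, bookie: str) -> None:
        self.failed.add(bookie)

    def _replace(self, bad: str) -> None:
        # Any healthy bookie that is not already in the ensemble will do.
        for candidate in self.cluster:
            if candidate not in self.failed and candidate not in self.ensemble:
                self.ensemble[self.ensemble.index(bad)] = candidate
                return
        raise RuntimeError("not enough healthy bookies to keep writing")

    def write(self) -> int:
        """Write the next entry of the same segment and return its id."""
        for bookie in list(self.ensemble):
            if bookie in self.failed:
                self._replace(bookie)
        entry_id = self.next_entry
        self.placement[entry_id] = list(self.ensemble)
        self.next_entry += 1
        return entry_id


writer = SegmentWriter(["b1", "b2", "b3", "b4"])
writer.write()             # entry 0 goes to b1, b2, b3
writer.mark_failed("b2")
writer.write()             # entry 1 goes to b1, b4, b3, same segment
```

Entries written before the failure stay where they are; only later entries use the new ensemble. If fewer than three healthy bookies remain, `write()` raises instead of writing with reduced replication.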
• In the background, BookKeeper starts a many-to-many recovery process to restore the configured replication factor
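One way to picture why this recovery is many-to-many: each under-replicated segment can be copied from a different surviving bookie to a different target. The planner below is a rough sketch under that assumption. `plan_recovery` and its selection rules (least-loaded target, least-used source) are hypothetical and are not taken from BookKeeper.

```python
from typing import Dict, List, Tuple


def plan_recovery(
    placement: Dict[str, List[str]],
    healthy: List[str],
    replication_factor: int,
) -> List[Tuple[str, str, str]]:
    """Return (segment, source_bookie, target_bookie) copy tasks."""
    # Replicas already stored per healthy bookie, used to spread targets.
    load = {b: 0 for b in healthy}
    for replicas in placement.values():
        for b in replicas:
            if b in load:
                load[b] += 1
    # Copies served per bookie, used to spread the read side.
    reads = {b: 0 for b in healthy}

    tasks = []
    for segment in sorted(placement):
        survivors = [b for b in placement[segment] if b in load]
        if not survivors:
            continue  # nothing left to copy from in this toy model
        holders = list(survivors)
        while len(holders) < replication_factor:
            candidates = [b for b in healthy if b not in holders]
            if not candidates:
                break
            target = min(candidates, key=lambda b: (load[b], b))
            source = min(survivors, key=lambda b: (reads[b], b))
            tasks.append((segment, source, target))
            holders.append(target)
            load[target] += 1
            reads[source] += 1
    return tasks


placement = {
    "seg-0": ["b1", "b2", "b3"],
    "seg-1": ["b1", "b3", "b4"],
    "seg-2": ["b2", "b4", "b5"],
}
# Bookie b1 has failed; the other four are healthy.
tasks = plan_recovery(placement, ["b2", "b3", "b4", "b5"], replication_factor=3)
```

Here the two affected segments are repaired from two different sources to two different targets, so no single bookie carries the whole repair.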
Seamless cluster expansion
Why should I care?
“Segment centric” vs “Partition centric”
Comparison with Apache Kafka
• In Kafka, partitions are assigned to brokers “permanently”
• A single partition is stored entirely on a single node
• Retention is limited by the storage capacity of a single node
• Failure recovery and capacity expansion require “rebalancing”
• Rebalancing has a big impact on the system, affecting regular traffic
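The retention point can be made concrete with a small calculation. The numbers and function names below are hypothetical, and both results are idealized upper bounds for a single topic that has the whole cluster to itself.

```python
from typing import List


def partition_centric_max(node_capacities: List[int]) -> int:
    """A partition is stored whole on each replica node, so the smallest
    node bounds how much of it can be retained."""
    return min(node_capacities)


def segment_centric_max(node_capacities: List[int], replication_factor: int) -> int:
    """Segments can be placed on any node, so the bound is the total
    cluster capacity divided by the number of copies kept."""
    return sum(node_capacities) // replication_factor


# Six nodes of 4 TB each, three copies of every message (made-up figures).
capacities_tb = [4, 4, 4, 4, 4, 4]
per_partition_tb = partition_centric_max(capacities_tb)
per_topic_tb = segment_centric_max(capacities_tb, replication_factor=3)
```

Under these assumptions a partition-centric layout can retain at most 4 TB per partition, while a segment-centric layout can retain up to 8 TB, and adding a node raises the second bound but not the first.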
Recap
Advantages of segment-centric architecture:
• Unbounded log storage
• Instant scaling without data rebalancing
• Fast replica repair
• High write and read availability via maximized data placement options
Q & A
Thank You
http://pulsar.incubator.apache.org
Apache pulsar - storage architecture