Ricardo Paiva
First impressions of Apache Pulsar features from
someone who has never used it. :)
Apache Pulsar
First Overview
Motivation
Kafka is an amazing tool, with incredible throughput and resilience, but it has some drawbacks and lacks a few features:
- A partition's capacity is limited by the storage of the smallest broker hosting it
- Ops: adding or removing a broker requires rebalancing the cluster
- No long-term storage
- Only the pub/sub client pattern (no work queue)
- No namespace or tenancy management
- No multi-cluster replication
Key concepts
Tiered Storage
- Uses Apache jclouds to offload older topic data to long-term cloud storage
Multi-tenancy and namespaces
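Tenants contain namespaces, and topics live inside namespaces, so a fully qualified Pulsar topic name has the form `persistent://tenant/namespace/topic` (or `non-persistent://tenant/namespace/topic`). The sketch below only builds and parses such names to make the hierarchy concrete; `TopicName` and its methods are hypothetical helpers, not part of a Pulsar client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicName:
    """Hypothetical helper: the tenant/namespace/topic hierarchy of a topic name."""
    tenant: str
    namespace: str
    topic: str
    persistent: bool = True

    def __str__(self) -> str:
        scheme = "persistent" if self.persistent else "non-persistent"
        return f"{scheme}://{self.tenant}/{self.namespace}/{self.topic}"

    @classmethod
    def parse(cls, name: str) -> "TopicName":
        # Split "scheme://tenant/namespace/topic" into its parts.
        scheme, sep, rest = name.partition("://")
        if not sep or scheme not in ("persistent", "non-persistent"):
            raise ValueError(f"unsupported topic name: {name!r}")
        parts = rest.split("/")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected tenant/namespace/topic, got {rest!r}")
        return cls(parts[0], parts[1], parts[2], scheme == "persistent")


orders = TopicName("acme", "billing", "orders")
print(orders)  # persistent://acme/billing/orders
```

Policies such as retention or geo-replication are attached at the tenant or namespace level, which is why the tenant and namespace appear in every topic name.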
Pulsar Components
- Brokers
- Bookies
- Producer
- Consumer
- ZooKeeper
Schema Registry
- Schemas are stored in BookKeeper, but another schema storage can be plugged in
- Can be uploaded when a typed Producer is created, or via the REST API
- Versioned
- Defined at the topic level
- Format types:
  - String (for UTF-8-encoded strings)
  - JSON
  - Protobuf
  - Avro
- Only works with the Java client
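A toy, in-memory model of the points above: schemas attached to a topic, versioned, and limited to the listed format types. `ToySchema` and `ToySchemaRegistry` are hypothetical names, numbering versions from 0 is an assumption, and the real registry also enforces compatibility rules between versions, which this sketch leaves out.

```python
from dataclasses import dataclass

SUPPORTED_TYPES = {"STRING", "JSON", "PROTOBUF", "AVRO"}


@dataclass(frozen=True)
class ToySchema:
    schema_type: str
    definition: str  # e.g. an Avro or JSON schema document; empty for STRING


class ToySchemaRegistry:
    """Hypothetical per-topic, versioned schema store (no compatibility checks)."""

    def __init__(self) -> None:
        self._versions: dict[str, list[ToySchema]] = {}

    def upload(self, topic: str, schema: ToySchema) -> int:
        if schema.schema_type not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported schema type: {schema.schema_type}")
        history = self._versions.setdefault(topic, [])
        if schema in history:
            # Identical schema already registered: reuse its version.
            return history.index(schema)
        history.append(schema)
        return len(history) - 1

    def latest(self, topic: str) -> ToySchema:
        return self._versions[topic][-1]


registry = ToySchemaRegistry()
users = "persistent://acme/app/users"
user_v1 = ToySchema("AVRO", '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}')
print(registry.upload(users, user_v1))  # 0
```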
Subscription modes
Message Acknowledgment
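Pulsar consumers can acknowledge messages individually or cumulatively (a cumulative ack covers the given message and everything before it). The toy cursor below, loosely modeled on a mark-delete position plus a set of individually acknowledged messages, shows the difference; `ToySubscriptionCursor` is hypothetical and tracks integer message ids only. It also hints at why Shared mode rules out cumulative acks: when messages are spread across consumers, one consumer's cumulative ack would also acknowledge messages that other consumers are still processing.

```python
class ToySubscriptionCursor:
    """Hypothetical cursor over message ids 0..message_count-1."""

    def __init__(self, message_count: int) -> None:
        self.message_count = message_count
        self.mark_delete = -1  # every id <= mark_delete is acknowledged
        self.individually_acked: set[int] = set()

    def ack(self, msg_id: int) -> None:
        """Individual acknowledgment: only this message."""
        self.individually_acked.add(msg_id)
        self._advance()

    def ack_cumulative(self, msg_id: int) -> None:
        """Cumulative acknowledgment: this message and everything before it."""
        self.mark_delete = max(self.mark_delete, msg_id)
        self._advance()

    def _advance(self) -> None:
        # Fold contiguous individual acks into the mark-delete position.
        while self.mark_delete + 1 in self.individually_acked:
            self.mark_delete += 1
            self.individually_acked.discard(self.mark_delete)
        # Drop individual acks already covered by the mark-delete position.
        self.individually_acked = {i for i in self.individually_acked if i > self.mark_delete}

    def unacked(self) -> list[int]:
        return [i for i in range(self.mark_delete + 1, self.message_count)
                if i not in self.individually_acked]


cursor = ToySubscriptionCursor(6)
cursor.ack(2)
print(cursor.unacked())  # [0, 1, 3, 4, 5]
cursor.ack_cumulative(3)
print(cursor.unacked())  # [4, 5]
```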
Retention
- Message retention
  - Applies to messages that have already been acknowledged and would otherwise be deleted
  - It is a time limit applied to the topic, whereas TTL is tied to a subscription
- TTL
  - Applies to messages that have not been acknowledged (consumed)
  - It is a time limit on consumption within a subscription: once it expires, the messages are acknowledged automatically
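The difference between retention and TTL can be sketched with a single subscription and time-based limits only. This is a simplified model under stated assumptions: it works per message rather than per storage segment, measures the retention window from the moment a message is acknowledged, and ignores size-based limits. `Message`, `apply_ttl`, and `apply_retention` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: int
    publish_time: float  # seconds
    acked_at: Optional[float] = None  # None means not yet acknowledged


def apply_ttl(messages: list[Message], now: float, ttl: float) -> None:
    """TTL: unacknowledged messages older than `ttl` are acknowledged automatically."""
    for m in messages:
        if m.acked_at is None and now - m.publish_time > ttl:
            m.acked_at = now


def apply_retention(messages: list[Message], now: float, retention: float) -> list[Message]:
    """Retention: acknowledged messages are kept for `retention` seconds, then dropped."""
    return [m for m in messages if m.acked_at is None or now - m.acked_at <= retention]


messages = [Message(0, 0.0, acked_at=5.0), Message(1, 0.0), Message(2, 90.0)]
apply_ttl(messages, now=100.0, ttl=60.0)  # message 1 expires and is auto-acked
kept = apply_retention(messages, now=100.0, retention=30.0)
print([m.msg_id for m in kept])  # [1, 2]
```

Message 0 was acknowledged long ago and falls out of the retention window; message 1 was never consumed, so TTL acknowledges it and retention then keeps it for a while; message 2 is still waiting for a consumer.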
Exclusive
Failover
Shared (work queue)
- Message ordering is not guaranteed.
- You cannot use cumulative acknowledgment in Shared mode.
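A toy dispatcher contrasting the three subscription modes. Assumptions: consumers are plain strings, Failover promotes the earliest-attached remaining consumer (a simplification of how Pulsar picks the active consumer), and Shared uses plain round-robin. `ToySubscription` is hypothetical.

```python
from itertools import cycle


class ToySubscription:
    """Hypothetical model of Exclusive, Failover and Shared dispatch."""

    def __init__(self, mode: str) -> None:
        if mode not in ("exclusive", "failover", "shared"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.consumers: list[str] = []

    def attach(self, consumer: str) -> None:
        if self.mode == "exclusive" and self.consumers:
            raise RuntimeError("exclusive subscription already has a consumer")
        self.consumers.append(consumer)

    def detach(self, consumer: str) -> None:
        self.consumers.remove(consumer)

    def dispatch(self, messages: list[str]) -> dict[str, list[str]]:
        out: dict[str, list[str]] = {c: [] for c in self.consumers}
        if not self.consumers:
            return out
        if self.mode == "shared":
            # Work queue: spread messages round-robin, so ordering is not preserved.
            for consumer, msg in zip(cycle(self.consumers), messages):
                out[consumer].append(msg)
        else:
            # Exclusive and Failover: a single active consumer receives everything.
            out[self.consumers[0]].extend(messages)
        return out


shared = ToySubscription("shared")
shared.attach("a")
shared.attach("b")
print(shared.dispatch(["m1", "m2", "m3"]))  # {'a': ['m1', 'm3'], 'b': ['m2']}
```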
Internals
- Bookie storage
- Cold storage
- SQL with Presto
Other features
Geo-replication (synchronous)
- Requires a global ZooKeeper installation
- Region-aware placement policy
- Higher latency
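Why synchronous geo-replication costs latency: BookKeeper acknowledges a write once an ack quorum of bookies has persisted it, so when a region-aware placement spreads those bookies across regions, a cross-region round trip sits on every write path. A back-of-the-envelope sketch, assuming one replica per region; the latency figures are made up for illustration.

```python
def quorum_ack_latency(replica_latencies_ms: list[float], ack_quorum: int) -> float:
    """Time until `ack_quorum` replicas have confirmed a write (BookKeeper-style ack quorum)."""
    if not 1 <= ack_quorum <= len(replica_latencies_ms):
        raise ValueError("ack_quorum must be between 1 and the number of replicas")
    # The write is acknowledged when the ack_quorum-th fastest replica responds.
    return sorted(replica_latencies_ms)[ack_quorum - 1]


# Hypothetical per-replica write latencies (ms), one replica per region.
cross_region = {"us-east": 2.0, "us-west": 70.0, "eu-west": 85.0}
# Hypothetical latencies with all replicas inside one region (rack-aware placement).
single_region = [2.0, 2.5, 3.0]

print(quorum_ack_latency(list(cross_region.values()), ack_quorum=2))  # 70.0
print(quorum_ack_latency(single_region, ack_quorum=2))                # 2.5
```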
Geo-replication (asynchronous)
- Rack-aware placement policy
- Messages are first persisted to the local cluster and then replicated asynchronously to the remote clusters
- Enabled on a per-tenant basis
- Types:
  - Master-slave replication
  - Active-active bidirectional replication
  - Full-mesh replication between multiple data centers
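A toy model of asynchronous, full-mesh replication: a publish is persisted and acknowledged in the local cluster, and a separate replication step later forwards it to every other cluster. Each message is tagged with its origin cluster, and only locally originated messages are forwarded, so replicated messages are not sent around again. `Cluster` and `ToyGeoReplication` are hypothetical, and replication here is a single batch call rather than a background process.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    log: list[tuple[str, str]] = field(default_factory=list)  # (origin cluster, payload)
    outbox: deque = field(default_factory=deque)              # local messages not yet replicated


class ToyGeoReplication:
    """Hypothetical asynchronous full-mesh replication between clusters."""

    def __init__(self, names: list[str]) -> None:
        self.clusters = {n: Cluster(n) for n in names}

    def publish(self, cluster: str, payload: str) -> None:
        # Persist locally and acknowledge right away; replication happens later.
        c = self.clusters[cluster]
        c.log.append((cluster, payload))
        c.outbox.append(payload)

    def replicate(self) -> None:
        # Forward each cluster's locally originated messages to every other cluster.
        for source in self.clusters.values():
            while source.outbox:
                payload = source.outbox.popleft()
                for target in self.clusters.values():
                    if target is not source:
                        target.log.append((source.name, payload))


geo = ToyGeoReplication(["us", "eu", "ap"])
geo.publish("us", "a")
geo.publish("eu", "b")
geo.replicate()
print(geo.clusters["ap"].log)  # [('us', 'a'), ('eu', 'b')]
```

Until `replicate()` runs, remote clusters simply lag behind, which is the trade-off that keeps publish latency local.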
Deduplication
- Per-producer, per-topic sequence numbers are used to detect duplicates
- Each broker that owns a topic keeps an in-memory hashmap of the latest sequence number per topic/producer
- The broker periodically snapshots the latest sequence numbers to a cursor, which lets another broker reconstruct the map after a failover
https://jack-vanlightly.com/blog/2018/10/25/testing-producer-deduplication-in-apache-kafka-and-apache-pulsar
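A toy version of the deduplication scheme described above: a per-producer map of the highest sequence number seen, periodic snapshots, and recovery by another broker. How the recovering broker fills the gap between the last snapshot and the end of the log is an assumption here (it replays the newer log entries); `ToyTopic` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical broker-side producer deduplication for one topic."""
    log: list[tuple[str, int, str]] = field(default_factory=list)  # (producer, seq, payload)
    last_seq: dict[str, int] = field(default_factory=dict)          # in-memory map
    snapshot: tuple[int, dict[str, int]] = field(default_factory=lambda: (0, {}))

    def publish(self, producer: str, seq: int, payload: str) -> bool:
        """Append unless `seq` is not higher than the last one seen for this producer."""
        if seq <= self.last_seq.get(producer, -1):
            return False  # duplicate: drop it
        self.log.append((producer, seq, payload))
        self.last_seq[producer] = seq
        return True

    def take_snapshot(self) -> None:
        # Persist (log position, copy of the map); Pulsar keeps this on a cursor.
        self.snapshot = (len(self.log), dict(self.last_seq))

    @classmethod
    def recover(cls, log: list[tuple[str, int, str]],
                snapshot: tuple[int, dict[str, int]]) -> "ToyTopic":
        """A new owner broker loads the snapshot, then replays entries written after it."""
        position, saved = snapshot
        topic = cls(log=list(log), last_seq=dict(saved), snapshot=snapshot)
        for producer, seq, _ in log[position:]:
            topic.last_seq[producer] = max(seq, topic.last_seq.get(producer, -1))
        return topic


t = ToyTopic()
t.publish("p1", 0, "a")
t.publish("p1", 1, "b")
t.take_snapshot()
t.publish("p1", 2, "c")
recovered = ToyTopic.recover(t.log, t.snapshot)
print(recovered.last_seq)  # {'p1': 2}
```

Without the replay step, the recovered map would still say `p1 -> 1`, and a producer retry of sequence 2 would be written twice.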
Pulsar Functions
- Lightweight compute framework for Pulsar
- Can run inside or outside the cluster
- State storage is handled by BookKeeper
- "Serverless" idea
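Pulsar's Python SDK lets a function subclass `pulsar.Function` and implement `process(self, input, context)`, with counters available on the context. The sketch below is shaped like a word-count function but runs without Pulsar: `FakeContext` is a hypothetical local stand-in that keeps counters in a dict, whereas in a real deployment that state would live in BookKeeper-backed storage.

```python
from collections import defaultdict


class FakeContext:
    """Hypothetical local stand-in for a Pulsar Functions context (state kept in a dict)."""

    def __init__(self) -> None:
        self._counters = defaultdict(int)

    def incr_counter(self, key: str, amount: int) -> None:
        self._counters[key] += amount

    def get_counter(self, key: str) -> int:
        return self._counters[key]


class WordCountFunction:
    """Counts words across every message it processes (would subclass pulsar.Function)."""

    def process(self, input: str, context) -> None:
        for word in input.split():
            context.incr_counter(word, 1)


ctx = FakeContext()
fn = WordCountFunction()
for msg in ["to be or not to be", "be quick"]:
    fn.process(msg, ctx)
print(ctx.get_counter("be"))  # 3
```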

Editor's Notes

  • #2: Do a quick round of introductions and a short agenda (first Kafka basics, then the design choices that made it a great tool at our scale)