Apache Pulsar: First Overview

Ricardo Paiva
First impressions of Apache Pulsar's features from someone who has never used it. :)

Motivation
Kafka is an amazing tool, with incredible throughput and resilience, but it has some
drawbacks and lacks a few features:
- The capacity of a partition is limited by the smallest node
- Ops: adding or removing a broker requires rebalancing the cluster
- No long-term storage
- Only a pub/sub client pattern (no work queue)
- No namespace or tenancy management
- No multi-cluster replication

Tiered Storage
Uses Apache jclouds to offload older topic data to cloud storage (e.g. Amazon S3, Google Cloud Storage)

Multi-tenancy and Namespaces
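Tenancy is visible directly in Pulsar topic names, which take the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). A minimal Python sketch of splitting such a name into its parts; `parse_topic` and `TopicName` are hypothetical helpers, not part of the Pulsar client:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TopicName:
    """Hypothetical container for the parts of a fully qualified topic name."""
    domain: str     # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str

def parse_topic(name: str) -> TopicName:
    """Split '<domain>://<tenant>/<namespace>/<topic>' into its parts."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain in {name!r}")
    # maxsplit=2 keeps any further '/' inside the topic part
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    tenant, namespace, topic = parts
    return TopicName(domain, tenant, namespace, topic)

t = parse_topic("persistent://acme/billing/invoices")
print(t.tenant, t.namespace, t.topic)  # acme billing invoices
```

Policies such as retention or replication are then managed per tenant and namespace, so two teams can share one cluster without sharing configuration.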

Schema Registry
- Schemas are stored in BookKeeper by default, but another schema storage can be plugged in
- A schema can be uploaded when a typed producer is created, or via the REST API
- Versioned
- Defined at the topic level
- Format types:
  - String (UTF-8-encoded strings)
  - JSON
  - Protobuf
  - Avro
- Only works with the Java client
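Since schemas are versioned per topic, the registry can be pictured as a map from topic to an ordered list of schema versions. A minimal in-memory Python sketch under that assumption; `SchemaRegistry` is hypothetical, versions start at 0 here by choice, and both the BookKeeper storage and schema compatibility checks are left out:

```python
from collections import defaultdict

class SchemaRegistry:
    """Hypothetical in-memory registry: topic -> ordered list of schema versions."""

    def __init__(self):
        self._versions = defaultdict(list)

    def upload(self, topic: str, schema_type: str, definition: str) -> int:
        """Register a schema for a topic and return its version number."""
        versions = self._versions[topic]
        entry = (schema_type, definition)
        if entry in versions:
            # Re-uploading an identical schema reuses its existing version.
            return versions.index(entry)
        versions.append(entry)
        return len(versions) - 1

    def get(self, topic: str, version: int) -> tuple:
        return self._versions[topic][version]

    def latest(self, topic: str) -> tuple:
        return self._versions[topic][-1]

reg = SchemaRegistry()
v0 = reg.upload("persistent://public/default/users", "JSON", '{"name": "string"}')
v1 = reg.upload("persistent://public/default/users", "JSON", '{"name": "string", "age": "int"}')
print(v0, v1)  # 0 1
```

Keeping every version, rather than overwriting, is what lets consumers decode older messages written with an earlier schema.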

Retention
- Message retention
  - Applies to messages that have been acknowledged and marked for deletion
  - It is a time limit applied to a topic
- TTL
  - Applies to messages that have not been consumed (acknowledged)
  - It is a time limit on consumption within a subscription
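The two policies act on opposite sides of the acknowledgment: TTL turns old unacknowledged messages into acknowledged ones, and retention decides how long acknowledged messages are still kept. A simplified Python model of that interaction; `Message` and `surviving_ids` are hypothetical, ages are measured per message from publish time, and real retention also has a size limit and is enforced per ledger rather than per message:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    msg_id: int
    publish_time: float  # seconds
    acked: bool          # acknowledged by the subscription?

def surviving_ids(messages, now, retention_s, ttl_s):
    """Return ids of messages still stored after applying TTL, then retention."""
    kept = []
    for m in messages:
        age = now - m.publish_time
        # TTL: an unacknowledged message older than ttl_s is acknowledged automatically.
        acked = m.acked or age > ttl_s
        # Retention: an acknowledged message is kept only while younger than retention_s.
        if acked and age > retention_s:
            continue
        kept.append(m.msg_id)
    return kept

msgs = [
    Message(1, publish_time=10, acked=True),   # acked, past retention -> deleted
    Message(2, publish_time=50, acked=True),   # acked, within retention -> kept
    Message(3, publish_time=20, acked=False),  # expired by TTL, then past retention -> deleted
    Message(4, publish_time=80, acked=False),  # still in the backlog -> kept
]
print(surviving_ids(msgs, now=100, retention_s=60, ttl_s=30))  # [2, 4]
```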

Shared (work queue)
- Message ordering is not guaranteed.
- Cumulative acknowledgment cannot be used in shared mode.
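Both restrictions follow from how a shared subscription spreads messages over consumers. A simplified Python sketch assuming strict round-robin delivery (the broker's real dispatch is more involved); `dispatch_shared` and `others_covered_by_cumulative_ack` are hypothetical helpers:

```python
from itertools import cycle

def dispatch_shared(messages, consumers):
    """Spread messages across consumers round-robin (simplified shared subscription)."""
    assignment = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        assignment[consumer].append(msg)
    return assignment

def others_covered_by_cumulative_ack(assignment, consumer, up_to):
    """Messages <= up_to that a cumulative ack from `consumer` would cover
    even though they were delivered to other consumers."""
    return sorted(
        m
        for c, msgs in assignment.items()
        if c != consumer
        for m in msgs
        if m <= up_to
    )

a = dispatch_shared(range(6), ["c1", "c2"])
print(a)  # {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
print(others_covered_by_cumulative_ack(a, "c1", 4))  # [1, 3]
```

Since c1 and c2 run independently, message 2 can be processed before message 1, and a cumulative ack of 4 from c1 would also acknowledge 1 and 3, which c2 may not have processed yet.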

Geo Replication (Sync)
- Requires a global ZooKeeper installation
- Region-aware placement policy
- Higher latency

Geo Replication (Async)
- Rack-aware placement policy
- Messages are first persisted to the local cluster and then replicated asynchronously to the remote clusters
- Enabled on a per-tenant basis
- Types:
  - Master-slave replication
  - Active-active bidirectional replication
  - Full-mesh replication between multiple data centers
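The async flow can be sketched in a few lines: a publish returns as soon as the local cluster has the message, and remote copies happen in a later step. A simplified full-mesh model in Python; `AsyncReplicator` and the cluster names are hypothetical, and the queueing, retry, and ordering details of a real replicator are ignored:

```python
class AsyncReplicator:
    """Hypothetical full-mesh async replication between named clusters."""

    def __init__(self, cluster_names):
        self.logs = {name: [] for name in cluster_names}
        self.pending = []  # (source_cluster, message) not yet copied to remotes

    def publish(self, cluster, message):
        """Persist locally and return at once; remote copies happen later."""
        self.logs[cluster].append(message)
        self.pending.append((cluster, message))

    def replicate(self):
        """Copy every pending message to every other cluster."""
        for source, message in self.pending:
            for name, log in self.logs.items():
                if name != source:
                    log.append(message)
        self.pending.clear()

r = AsyncReplicator(["us-east", "eu-west", "ap-south"])
r.publish("us-east", "m1")
print(r.logs["eu-west"])  # [] (not replicated yet)
r.replicate()
print(r.logs["eu-west"])  # ['m1']
```

The gap between `publish` and `replicate` is why async replication keeps latency low: the producer never waits for a remote data center.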

Deduplication
- Per-producer/topic sequence numbers are used to detect duplicates
- Each broker that owns a topic keeps an in-memory hashmap of the latest sequence number per topic/producer
- The broker periodically snapshots the latest sequence number to a cursor, which allows another broker to reconstruct the map after a fail-over
https://jack-vanlightly.com/blog/2018/10/25/testing-producer-deduplication-in-apache-kafka-and-apache-pulsar
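The hashmap-plus-snapshot scheme can be sketched as follows, assuming each producer's sequence ids start at 0 and only increase. `DedupState` is hypothetical, not the broker's actual class; a real recovery would also replay entries written after the last snapshot, which this sketch omits:

```python
class DedupState:
    """Hypothetical broker-side dedup map: (topic, producer) -> highest sequence id."""

    def __init__(self, snapshot=None):
        # A new owner broker after fail-over starts from the last snapshot.
        self.last_seq = dict(snapshot or {})
        self.persisted = []

    def publish(self, topic, producer, seq_id, payload):
        """Persist the message unless its sequence id was already seen."""
        key = (topic, producer)
        if seq_id <= self.last_seq.get(key, -1):
            return False  # duplicate (e.g. a producer retry): drop it
        self.persisted.append(payload)
        self.last_seq[key] = seq_id
        return True

    def snapshot(self):
        """Copy of the map, standing in for the periodic cursor snapshot."""
        return dict(self.last_seq)

broker = DedupState()
print(broker.publish("orders", "p1", 0, "a"))  # True
print(broker.publish("orders", "p1", 0, "a"))  # False (retry of seq 0)
print(broker.publish("orders", "p1", 1, "b"))  # True
failover = DedupState(broker.snapshot())
print(failover.publish("orders", "p1", 1, "b"))  # False
```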

Pulsar Functions
- Lightweight compute framework for Pulsar
- Can run inside or outside the cluster
- State storage is handled by BookKeeper
- Based on the "serverless" idea
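In Python, a Pulsar Function can be written in a "native" style: a plain `process` function that receives one message and returns the value to publish to the output topic, if one is configured. A minimal sketch of that style; the exclamation-mark logic is only an illustration, and deployment (choosing input and output topics) is not shown:

```python
def process(input):
    # Invoked once per message from the input topic(s); the returned value
    # is published to the function's output topic.
    return "{}!".format(input)

# Local check, no cluster needed:
print(process("hello"))  # hello!
```

Because the function is plain Python with no Pulsar imports, it can be unit-tested locally before being deployed.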
