Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or
distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
Pulsar Storage on BookKeeper
Seamless Evolution
June 17, 2020
Joe Francis joef@verizonmedia.com
Rajan Dhabalia rdhabalia@verizonmedia.com
Speakers
2
Joe Francis
Director, Verizon Media
Rajan Dhabalia
Principal Software Engineer, Verizon Media
Agenda
● Pulsar in Verizon Media
● Benchmarking for production use
● Pulsar IO Isolation
● BookKeeper with different storage devices
● Case-study: Kafka use case on Pulsar
● Future
3
Verizon Media & Pulsar
● Developed as a hosted pub-sub service within Yahoo/VMG
○ open-sourced in 2016
● Global deployment
○ 6 data centers (Asia, Europe, US)
○ full-mesh replication
● Mission-critical use cases
○ Serving applications
○ Lower-latency bus for other low-latency services
○ Write availability
4
● Most benchmark numbers do not test production scenarios
○ Messaging systems work well when
■ data fits in memory
■ no disk I/O in critical path (write or read)
● Pulsar was designed to work well under real-world workloads
○ Lagging consumers, replay
■ Backlog reads from disk will occur
○ Disks and brokers crash/fail
■ Pulsar ack guarantee: data is synced to disk on 2+ hosts
○ Latencies remain unaffected by load variations
■ backlog reads (I/O isolation)
■ failures (instantaneous recovery)
● Cost matters
○ Compute ($) vs Storage ($$)
● Benchmark for production use!
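The point about benchmarks that fit in memory can be made concrete with a toy model: a broker keeps only the most recent entries in RAM, so tailing consumers hit the cache while a lagging consumer replaying its backlog falls through to the bookies. This is a minimal sketch assuming a simple FIFO cache of fixed capacity; `BrokerCache` and the `read_from_bookie` callback are hypothetical names, not Pulsar's managed-ledger cache API.

```python
from collections import OrderedDict


class BrokerCache:
    """Hypothetical fixed-capacity cache holding the tail of a topic."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # entry_id -> payload, oldest first
        self.hits = 0
        self.misses = 0

    def publish(self, entry_id: int, payload) -> None:
        self.entries[entry_id] = payload
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry

    def read(self, entry_id: int, read_from_bookie):
        if entry_id in self.entries:
            self.hits += 1
            return self.entries[entry_id]
        self.misses += 1
        # Cold read: in Pulsar this becomes disk I/O on a bookie's ledger device
        return read_from_bookie(entry_id)


cache = BrokerCache(capacity=100)
for i in range(1000):
    cache.publish(i, f"msg-{i}")
for i in range(900, 1000):  # tailing consumer: reads the newest entries
    cache.read(i, read_from_bookie=lambda e: f"disk-{e}")
for i in range(0, 100):  # lagging consumer: replays the oldest backlog
    cache.read(i, read_from_bookie=lambda e: f"disk-{e}")
print(cache.hits, cache.misses)  # 100 100
```

A benchmark that only exercises the tailing loop never touches disk on the read path, which is exactly the gap this slide warns about.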
5
Benchmarking for production
6
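Data paths
[Diagram: Producer application → Broker (cache: RAM) → two Bookies (each with RAM, Journal, Data); acks flow from the bookies back to the broker and from the broker to the producer. Consumer applications read through the broker; cold reads are served from the bookies' Data devices.]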
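The benchmarking slide states Pulsar's ack guarantee: data is synced to disk on 2+ hosts before the producer is acked. The sketch below models that rule with BookKeeper's write-quorum/ack-quorum vocabulary. The `Bookie` class and `replicate` function are hypothetical, the values (ensemble of 3, write quorum 3, ack quorum 2) are illustrative, and the round-robin striping approximates, as we understand it, how BookKeeper spreads entries across an ensemble. In Pulsar the corresponding knobs are broker settings (`managedLedgerDefaultEnsembleSize`, `managedLedgerDefaultWriteQuorum`, `managedLedgerDefaultAckQuorum`).

```python
from dataclasses import dataclass, field


@dataclass
class Bookie:
    """Hypothetical in-memory stand-in for a bookie (not a real BookKeeper API)."""
    name: str
    healthy: bool = True
    journal: list = field(default_factory=list)

    def add_entry(self, entry_id: int, payload: bytes) -> bool:
        if not self.healthy:
            return False  # a crashed bookie never acks
        # Stands in for "append to journal + fsync" before acking
        self.journal.append((entry_id, payload))
        return True


def replicate(ensemble, entry_id, payload, write_quorum=3, ack_quorum=2):
    """Write one entry to `write_quorum` bookies, striped round-robin over the
    ensemble; report success once at least `ack_quorum` bookies have acked.
    Writes are issued sequentially here for simplicity."""
    n = len(ensemble)
    targets = [ensemble[(entry_id + i) % n] for i in range(write_quorum)]
    acks = sum(1 for bookie in targets if bookie.add_entry(entry_id, payload))
    return acks >= ack_quorum


ensemble = [Bookie("b1"), Bookie("b2"), Bookie("b3")]
print(replicate(ensemble, 0, b"msg-0"))  # True: all three bookies ack
ensemble[1].healthy = False
print(replicate(ensemble, 1, b"msg-1"))  # True: two acks still meet the ack quorum
ensemble[2].healthy = False
print(replicate(ensemble, 2, b"msg-2"))  # False: one ack is not enough
```

A real BookKeeper client sends to all bookies in parallel and, on failure, replaces the failed bookie through an ensemble change; both are omitted here.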
BookKeeper IO Isolation
7
Pulsar Journey
8
First Generation Storage - HDD
9
- HDD
  - Fast, low-latency sequential writes with a battery-backed RAID controller
  - Much longer random seek times
  - Economical
- Journal Device
  - Fast sequential writes
- Ledger Device
  - Sequential writes to a single entry-log data file shared by multiple streams
  - Most of the IOPS are consumed by
    - Backlog draining (cold reads)
    - Reads and writes on index files
- JOURNAL-Device: HDD with RAID10
- DATA-Device: HDD with RAID10
- Index: interleaved index files
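The ledger-device bullets describe an interleaved entry log: entries from many ledgers are appended to one shared file, so writes stay sequential, but reading back a single ledger turns into scattered seeks, which is where HDD IOPS go. This is a minimal in-memory sketch; `EntryLog` and its 20-byte record header are hypothetical and do not match BookKeeper's on-disk format.

```python
import io
import struct

HEADER = struct.Struct(">qqi")  # ledger_id, entry_id, payload length (20 bytes)


class EntryLog:
    """Hypothetical append-only log shared by all ledgers (streams)."""

    def __init__(self):
        self.buf = io.BytesIO()  # stands in for the entry-log file on the ledger device
        self.index = {}          # (ledger_id, entry_id) -> byte offset of the record

    def append(self, ledger_id: int, entry_id: int, payload: bytes) -> int:
        offset = self.buf.seek(0, io.SEEK_END)  # always write at the tail: sequential I/O
        self.buf.write(HEADER.pack(ledger_id, entry_id, len(payload)) + payload)
        self.index[(ledger_id, entry_id)] = offset
        return offset

    def read(self, ledger_id: int, entry_id: int) -> bytes:
        offset = self.index[(ledger_id, entry_id)]
        self.buf.seek(offset)  # a cold read jumps to an arbitrary offset: random I/O
        lid, eid, size = HEADER.unpack(self.buf.read(HEADER.size))
        if (lid, eid) != (ledger_id, entry_id):
            raise ValueError("index points at the wrong record")
        return self.buf.read(size)


log = EntryLog()
for e in range(3):  # two streams written in interleaved order
    log.append(1, e, f"L1-E{e}".encode())
    log.append(2, e, f"L2-E{e}".encode())
print([log.index[(1, e)] for e in range(3)])  # [0, 50, 100]: ledger 1 is not contiguous
print(log.read(1, 2))  # b'L1-E2'
```

Reading ledger 1 in order skips over every ledger-2 record; with thousands of interleaved topics on an HDD, each such jump costs a seek.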
Optimizing Random I/O for Indexing
10
- Index on interleaved files
  - One index file per topic
  - Random I/O while updating the index
  - Scaling the number of topics increases random I/O and open file handles
- Index on RocksDB
  - LSM-based embedded key-value store
  - Runs as a library inside the bookie process; no additional operational effort
  - Less write amplification and better compression
  - Drastically reduces random IOPS for indexing
  - Small footprint (< 10 GB); mostly in RAM
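The RocksDB bullets rely on an LSM store keeping keys in sorted byte order. Below is a minimal sketch of an entry-location index in that style: keys are `(ledgerId, entryId)` packed as fixed-width big-endian integers so that byte order equals numeric order and a ledger's entries sit next to each other for range scans, and values pack an entry-log id and a byte offset into one 64-bit integer. BookKeeper's `DbLedgerStorage` keeps a comparable location index in RocksDB, but the exact encoding here is an assumption, and a sorted Python list (`SortedIndex`, hypothetical) stands in for RocksDB so the sketch needs no third-party binding.

```python
import bisect
import struct


def encode_key(ledger_id: int, entry_id: int) -> bytes:
    # Fixed-width big-endian: lexicographic byte order == numeric (ledger, entry) order
    return struct.pack(">QQ", ledger_id, entry_id)


def encode_location(entry_log_id: int, offset: int) -> int:
    # Entry-log id in the high 32 bits, byte offset in the low 32 bits
    return (entry_log_id << 32) | offset


def decode_location(location: int) -> tuple:
    return location >> 32, location & 0xFFFFFFFF


class SortedIndex:
    """Stand-in for an LSM store such as RocksDB: keys kept in sorted byte order."""

    def __init__(self):
        self.keys, self.values = [], []

    def put(self, key: bytes, value: int) -> None:
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # overwrite an existing key
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key: bytes):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan_ledger(self, ledger_id: int) -> list:
        # All keys of one ledger form a contiguous range: [key(L, 0), key(L + 1, 0))
        lo = bisect.bisect_left(self.keys, encode_key(ledger_id, 0))
        hi = bisect.bisect_left(self.keys, encode_key(ledger_id + 1, 0))
        return [struct.unpack(">QQ", k)[1] for k in self.keys[lo:hi]]


idx = SortedIndex()
idx.put(encode_key(7, 2), encode_location(3, 4096))
idx.put(encode_key(300, 0), encode_location(3, 8192))
idx.put(encode_key(7, 10), encode_location(4, 0))
print(idx.scan_ledger(7))                         # [2, 10]
print(decode_location(idx.get(encode_key(7, 2))))  # (3, 4096)
```

Because the store is sorted and compacted in the background, index updates become sequential writes instead of random writes into one file per topic.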
Second Generation: SSD/NVMe
11
SSD/NVMe
- SSDs provide better performance than HDDs for both sequential and random I/O
- NVMe supports large command queues (64K) with parallel I/O
Journal Device
- A bookie can use multiple journal directories to exploit parallel writes on NVMe
- Achieves 3x the Pulsar throughput of HDD, with low latency
Ledger Device
- Significantly faster random reads than HDD
- Faster backlog draining when serving cold reads for many topics
- JOURNAL-Device: NVMe/SSD
- DATA-Device: NVMe/SSD
- Index: RocksDB
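The journal bullet says a bookie can spread its journal across several directories to use NVMe's parallel queues. This is a minimal sketch assuming each ledger is pinned to one journal by `ledger_id % num_journals` (so per-ledger ordering is preserved) and each journal is flushed and fsynced on its own thread; `journal_for`, `flush_batch`, and the `journal.txn` file name are hypothetical. In BookKeeper the directories themselves are listed in the bookie's `journalDirectories` setting.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def journal_for(ledger_id: int, num_journals: int) -> int:
    # Stable mapping: every entry of a ledger goes to the same journal,
    # which preserves per-ledger write order
    return ledger_id % num_journals


def write_journal(path: str, records: list) -> int:
    # One journal = one independent sequential append stream, made durable by fsync
    with open(path, "ab") as f:
        for record in records:
            f.write(record)
        f.flush()
        os.fsync(f.fileno())
    return len(records)


def flush_batch(journal_dirs: list, entries: list) -> list:
    """entries: (ledger_id, payload) pairs. Returns the record count written per journal."""
    batches = [[] for _ in journal_dirs]
    for ledger_id, payload in entries:
        batches[journal_for(ledger_id, len(journal_dirs))].append(payload)
    # Each journal is flushed on its own thread, so writes can reach the device in parallel
    with ThreadPoolExecutor(max_workers=len(journal_dirs)) as pool:
        futures = [
            pool.submit(write_journal, os.path.join(d, "journal.txn"), batch)
            for d, batch in zip(journal_dirs, batches)
        ]
        return [fut.result() for fut in futures]


with tempfile.TemporaryDirectory() as root:
    dirs = [os.path.join(root, f"journal-{i}") for i in range(4)]
    for d in dirs:
        os.makedirs(d)
    entries = [(ledger, f"ledger-{ledger}".encode()) for ledger in range(10)]
    print(flush_batch(dirs, entries))  # [3, 3, 2, 2]
```

On an HDD extra journal threads mostly compete for one disk head; on NVMe each stream can occupy its own hardware queue, which is why the multi-journal layout pays off there.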
Storage Device: Sequential Vs Random IO
12
Storage Device: Performance Vs Cost
13
Storage Evolution & Pulsar Adaptation: PMEM
14
PMEM
● Highest-performing block storage device
● Ultra-fast, very high throughput with consistently low latency
● Expensive; well suited as a small device for write-intensive use cases
Journal Device
● The WAL/journal is a proven design in databases
○ transactional storage and recovery
○ high throughput
● Write-optimized, append-only structure
● Requires little storage; holds only short-lived transactional data
● Using PMEM for the journal device
○ adds < 5% to the cost of each bookie
○ increases Pulsar throughput 5x, with low publish latency
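The journal slide frames the bookie journal as a database-style WAL. The usual technique that makes a WAL fast is group commit: many entries share one sequential write and one `fsync`, and each entry is acked only after that `fsync` returns, so every message is still durable before it is acknowledged. This is a minimal sketch; the `Journal` class, its length-prefixed record format, and the `max_batch` threshold are hypothetical (BookKeeper has its own grouping settings, such as `journalMaxGroupWaitMSec`).

```python
import os
import tempfile


class Journal:
    """Hypothetical write-ahead journal with group commit."""

    def __init__(self, path: str, max_batch: int = 64):
        self.f = open(path, "ab")
        self.max_batch = max_batch
        self.pending = []  # (payload, on_durable) pairs waiting for the next fsync
        self.fsyncs = 0

    def add(self, payload: bytes, on_durable) -> None:
        self.pending.append((payload, on_durable))
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # One sequential append of length-prefixed records for the whole group
        self.f.write(b"".join(len(p).to_bytes(4, "big") + p for p, _ in self.pending))
        self.f.flush()
        os.fsync(self.f.fileno())  # the group is durable once this returns
        self.fsyncs += 1
        for _, on_durable in self.pending:
            on_durable()  # ack each entry only after the fsync
        self.pending.clear()

    def close(self) -> None:
        self.flush()
        self.f.close()


acked = []
with tempfile.TemporaryDirectory() as d:
    journal = Journal(os.path.join(d, "journal"), max_batch=100)
    for i in range(250):
        journal.add(b"entry-%d" % i, lambda i=i: acked.append(i))
    journal.close()
print(len(acked), journal.fsyncs)  # 250 3
```

A production journal would also flush on a short timer so a lone entry is not stuck waiting for a full batch. The per-`fsync` cost is set by the device, which is why a faster journal device such as PMEM raises throughput at a fixed latency target.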
Pulsar Performance with Different BK Journal Devices
15
Performance configuration
● fsync enabled on every published message
● Publish throughput measured while draining a backlog
● SLA: 5 ms (99th-percentile latency)
○ HDD: 120 MB/s
○ SSD: 200 MB/s
○ NVMe: 350 MB/s
○ PMEM: 600 MB/s
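Dividing each device's throughput by the HDD baseline checks the multipliers quoted elsewhere in the deck (about 3x for NVMe, 5x for PMEM). The figures are copied from the slide; the ratios do not depend on the unit.

```python
# Publish throughput within the 5 ms p99 SLA, as listed on the slide
throughput = {"HDD": 120, "SSD": 200, "NVMe": 350, "PMEM": 600}


def speedup_vs(baseline: str, table: dict) -> dict:
    """Throughput of each device relative to `baseline`, rounded to 2 decimals."""
    base = table[baseline]
    return {device: round(value / base, 2) for device, value in table.items()}


print(speedup_vs("HDD", throughput))
# {'HDD': 1.0, 'SSD': 1.67, 'NVMe': 2.92, 'PMEM': 5.0}
```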
● Cost and Throughput
■ Using PMEM for the journal adds < 5% to the cost per host but reduces overall cost and cluster footprint
■ Achieves 5x more throughput with 99th-percentile write latency < 5 ms
● Cluster footprint
■ Kafka cluster: 33 Kafka brokers
■ Pulsar cluster: 10 bookies and 16 brokers
● A Pulsar broker is a stateless component and costs 1/4 as much as a bookie
■ Overall, the Pulsar cluster uses ½ the resources of the Kafka cluster
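The ½ claim can be checked with the slide's own numbers under one added assumption that the slides do not state: a Kafka broker host costs about the same as a bookie host. Counting cost in bookie units, with a Pulsar broker at 1/4 of a bookie:

```python
from fractions import Fraction

BROKER_COST = Fraction(1, 4)     # slide: a Pulsar broker costs 1/4 of a bookie
KAFKA_BROKER_COST = Fraction(1)  # ASSUMPTION: a Kafka broker host costs about one bookie


def pulsar_cost(bookies: int, brokers: int) -> Fraction:
    """Cluster cost in bookie units."""
    return bookies + brokers * BROKER_COST


pulsar = pulsar_cost(bookies=10, brokers=16)
kafka = 33 * KAFKA_BROKER_COST
ratio = pulsar / kafka
print(pulsar, kafka, ratio, round(float(ratio), 2))  # 14 33 14/33 0.42
```

Under that assumption the Pulsar cluster costs 14 bookie units against 33, a little under half, consistent with the slide.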
Case-study: Migrate Kafka Use Case to Pulsar
16
Case-study: Migrate Kafka Use Case to Pulsar
17
Use cases compared (Apache Pulsar vs. Apache Kafka):
● Throughput with low latency
● Cost
● Geo-replication
● Queuing
● Committing messages
Future
● Use the PMDK API to access persistent memory
○ bypass the file system
○ better throughput
● Tiered Storage for historical data use cases
○ relaxed latency requirements
○ cheaper cost
○ Use cases
■ ML model training
■ audit, forensics
18
Thank you
Joe Francis joef@verizonmedia.com
Rajan Dhabalia rdhabalia@verizonmedia.com
