Building a Messaging Solutions for OVHcloud
with Apache Pulsar
Pierre Zemb
Technical Leader
Pulsar Summit 2020
$ whoami
● Pierre Zemb (@PierreZ)
● Technical Leader
● Working around distributed systems
● Apache contributor
○ HBase, Flink, Pulsar
● Involved in local dev communities
Schedule
1. What is OVHcloud?
2. The need for a messaging solution
3. The choice of Apache Pulsar
4. Overview of our infrastructure
5. Overview of our management layer
6. The quest to support Apache Kafka
7. Our ideas for the future
OVHcloud, a Global Cloud Provider
● 30 data centers globally
● Our own high-quality global network, committed to the highest security standards
● NSX and vRack: secure your platform with micro-segmentation of private L2 that spans global data centers
● SSL Gateway Service: up to 10,000 concurrent connections. Optional Anycast DNS service.
● Highest compliance and certification standards
● Anti-DDoS: highly resilient Layer 4-7 DDoS protection built into the network
Providing a platform
Compute
Providing a platform
Compute messaging?
Let’s build a messaging solution! Been there...
● OVHcloud started a beta called “Queue as a service” in 2015
● Based on Apache Kafka
● Multi-tenant cluster
● Beta closed in 2018
● Massively used internally
What we learned from Queue as a Service
From users:
● Users want not only Kafka, but queuing as well:
○ RabbitMQ
○ MQTT
○ ...
● They want to support old versions of Kafka’s protocol
● Data encryption?
What we learned from Apache Kafka
From us:
● No built-in {multi-tenancy, geo-replication}
● Creating a topic is not cost-free
● infinite retention isn't possible
● no tiered storage
● Operations are not very convenient
○ we cannot "just" scale storage
○ a consumer reading old data can slow down the whole broker
What we learned from Apache Kafka
Disclaimer:
● We ♥ Apache Kafka
● We have far more messages in Kafka than in Pulsar within OVHcloud
● For certain use cases, we need an alternative
Let’s build a messaging solution!
● Messaging system (Pulsar, Kafka, RabbitMQ, ...): what we choose as an infrastructure provider
● Messaging solution: what we are exposing to customers
Let’s build a messaging solution!
Requirements for the foundation of a messaging solution:
● has multi-tenancy
● can be used for queuing and streaming
● can be easily extended
● has lower operational cost at scale
Apache Pulsar’s TL;DR
❏ What Pulsar Provides
✓ Multi-Tenancy
✓ Security
✓ TLS Encryption
✓ Authentication, Authorization
✓ Geo-replication
✓ Queuing and streaming semantics
✓ Tiered storage
✓ Schema
✓ Integrations with big data ecosystem (Flink / Spark / Presto)
Let's deploy Apache Pulsar!
🚀
Our deployment
TODO drawing: remove producer
Add pulsar-proxy above pulsar-broker
Add haproxy above pulsar-proxy
Bookkeeper's tuning
● Enabled:
○ Z Garbage Collector, also known as ZGC
○ Prometheus exporter
● Configured:
○ multiple journalDirectory entries to better exploit SSD throughput
○ one ledgerDirectory per HDD
Pulsar's configuration
● Started with a 3/3/2 layout:
○ 3 bookies to use when creating a ledger (ensemble)
○ 3 copies to store for each message (writeQuorum)
○ 2 guaranteed copies (ackQuorum)
● Now running 4/2/2 layouts
○ Increases striped writes
Lesson learned:
avoid having the ensemble size equal to the number of bookies
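To make the ensemble/writeQuorum numbers concrete, here is a minimal Python sketch of round-robin striping, the way BookKeeper spreads entries over an ensemble. The bookie names and both helper functions are hypothetical, and the second helper encodes only one plausible reading of the lesson above (a failed bookie can only be swapped out if a spare bookie exists outside the ensemble); it is not taken from the talk.

```python
def write_set(entry_id, ensemble, write_quorum):
    """Return the bookies that store one entry under round-robin striping.

    The entry starts at position entry_id % len(ensemble) and is written to
    the next write_quorum bookies, wrapping around the ensemble.
    """
    start = entry_id % len(ensemble)
    return [ensemble[(start + i) % len(ensemble)] for i in range(write_quorum)]


def can_replace_failed_bookie(total_bookies, ensemble_size):
    """Assumption: replacing a failed ensemble member needs a spare bookie."""
    return total_bookies > ensemble_size


if __name__ == "__main__":
    layout_3_3 = ["bk1", "bk2", "bk3"]          # hypothetical bookie names
    layout_4_2 = ["bk1", "bk2", "bk3", "bk4"]
    for entry_id in range(4):
        print(entry_id,
              write_set(entry_id, layout_3_3, 3),   # every bookie gets every entry
              write_set(entry_id, layout_4_2, 2))   # entries are striped over pairs
```

With a 3/3 layout every bookie receives every entry, so there is no striping; with 4/2 each entry lands on two of the four bookies, which spreads the write load.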
Some benchmark!
Sending a small string as value, as fast as we can, from 8 VMs to two partitions
1.8 million msg/s per partition
Some benchmark!
Bookkeeper outage
Lesson learned: learn Bookkeeper's CLI
Meet Bookkeeper's friend: the Auditor
Let's manage Apache Pulsar!
🚀
Our management layer
Management
µservice
● create topic
● create tokens
● set retention
● ...
Sync
Our management layer
● Written in Go
● Cluster-aware
● Push topic's configuration to clusters
● Pull topic's usage from clusters
● Generate valid JWT's token
Lessons learned:
Pulling topic usage is costly; clusters should instead report usage to the management layer (a PIP?)
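As an illustration of the token generation step, here is a minimal Python sketch that signs and verifies a JWT with HMAC-SHA256 using only the standard library. The talk does not say which algorithm, claims or key handling the management layer uses (and the real service is written in Go), so the `HS256` choice, the role name and the secret are assumptions; the only Pulsar-specific detail relied on is that token authentication reads the role from the `sub` claim by default.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign(signing_input: str, secret: bytes) -> str:
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return b64url(digest)


def make_token(role: str, secret: bytes) -> str:
    """Build an HS256 JWT whose subject is the role (hypothetical role name)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    payload = b64url(json.dumps({"sub": role}, separators=(",", ":")).encode())
    signing_input = header + "." + payload
    return signing_input + "." + sign(signing_input, secret)


def verify_token(token: str, secret: bytes) -> dict:
    """Check the signature and return the decoded claims."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, sign(signing_input, secret)):
        raise ValueError("invalid signature")
    payload = signing_input.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    token = make_token("customer-42", b"not-a-real-secret")
    print(token)
    print(verify_token(token, b"not-a-real-secret"))
```

A production service would more likely use a maintained JWT library and asymmetric keys; the sketch only shows what "a valid JWT" consists of.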
How we handled tenancy
Geo-replication, tiered storage, retention and other policies are set at the namespace level
We ended up mapping topics to namespaces, which results in using one namespace per topic
We closed the admin API to our users to enforce this behavior
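The one-namespace-per-topic mapping can be sketched as follows in Python. Pulsar topic names have the form `persistent://tenant/namespace/topic`; the naming rule used here (the namespace simply reuses the topic name) and the tenant and topic names are assumptions for illustration, not the scheme OVHcloud actually uses.

```python
def namespace_for(topic: str) -> str:
    """Hypothetical rule: each topic gets a namespace named after it."""
    return topic


def full_topic_name(tenant: str, topic: str, persistent: bool = True) -> str:
    """Build the fully qualified Pulsar topic name for a customer topic."""
    domain = "persistent" if persistent else "non-persistent"
    return f"{domain}://{tenant}/{namespace_for(topic)}/{topic}"


if __name__ == "__main__":
    # Retention, geo-replication, etc. set on namespace "orders" now apply
    # to exactly one topic.
    print(full_topic_name("acme", "orders"))
```

Because every namespace holds a single topic, a namespace-level policy effectively becomes a per-topic policy, which is the behavior the closed admin API enforces.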
How we will improve tenancy
https://github.com/apache/pulsar/wiki/PIP-39:-Namespace-Change-Events
Lesson learned:
closing the admin API is costly, as we need to rewrite all calls only to forward them
PIP-39 + cluster usage report = reopening the admin API
Opening ioStream beta!
Now we have our messaging system
Let's start Kafka-proxy!
Kafka-proxy, OVHcloud version
We first implemented KoP as a proxy PoC in Rust:
● Rust async was available in the nightly compiler when we started
● We wanted no GC on proxy layers
● Rust has awesome libraries at TCP-level
Our goal was to convert TCP frames from Kafka to Pulsar
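To give an idea of what working on Kafka TCP frames involves, here is a minimal Python sketch that encodes and parses the header of a Kafka request: a 4-byte size prefix, then `api_key`, `api_version`, `correlation_id` and a nullable `client_id` string, all big-endian. It is not the OVHcloud proxy (which was written in Rust); the function names and the sample values are hypothetical, and only the classic, non-flexible request header is handled.

```python
import struct


def encode_request(api_key, api_version, correlation_id, client_id, body=b""):
    """Build a size-prefixed Kafka request frame (classic request header)."""
    cid = client_id.encode("utf-8")
    header = struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid
    payload = header + body
    return struct.pack(">i", len(payload)) + payload


def parse_request_header(frame: bytes) -> dict:
    """Decode the size prefix and request header, leaving the body untouched."""
    (size,) = struct.unpack_from(">i", frame, 0)
    api_key, api_version, correlation_id, cid_len = struct.unpack_from(">hhih", frame, 4)
    offset = 4 + 10  # size prefix + fixed header fields
    client_id = None  # a length of -1 encodes a null client_id
    if cid_len >= 0:
        client_id = frame[offset:offset + cid_len].decode("utf-8")
        offset += cid_len
    return {
        "size": size,
        "api_key": api_key,
        "api_version": api_version,
        "correlation_id": correlation_id,
        "client_id": client_id,
        "body": frame[offset:4 + size],
    }


if __name__ == "__main__":
    frame = encode_request(3, 1, 7, "demo-client")  # api_key 3 is Metadata
    print(parse_request_header(frame))
```

A proxy would dispatch on `api_key` and `api_version` to decide how to translate the body, and must echo `correlation_id` back in the response so the client can match it to the request.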
Kafka-proxy, OVHcloud version
● Working at TCP layer enables performance
● nice PoC to discover both protocols
● Rust is blazing fast
● Proxify production is easy
● We could bump old versions of Kafka frames for old Kafka clients
● Rewrite everything
● Some things were hard to proxify:
○ Group coordinator
○ Offsets management
● Difficult to open-source (different language)
And then we saw this 😍
Apache Pulsar's protocol handler
https://www.ovh.com/blog/announcing-kafka-on-pulsar-bring-native-kafka-protocol-support-to-apache-pulsar/
Apache Pulsar’s TL;DR
❏ What Pulsar Provides
✓ Multi-Tenancy
✓ Security
✓ TLS Encryption
✓ Authentication, Authorization
✓ Geo-replication
✓ Queuing and streaming semantics
✓ Tiered storage
✓ Schema
✓ Integrations with big data ecosystem (Flink / Spark / Presto)
✓ Additional ecosystems
✓ Kafka
Thanks!
Do you have questions?
● Slides: https://pierrezemb.fr
● Twitter: PierreZ
● Github: PierreZ
Bonus 😍
Our deployment
Bookkeeper
STOR-2
● Intel Xeon-D 1541
● 32GB DDR4 ECC
● 4x HDD 12TB
● 2x SSD 240GB
Pulsar
ADVANCE-4
● AMD Epyc 7351P
● 128GB DDR4 ECC
● 2x SSD NVMe RAID
Our ideas for the future
● Reopen the admin API
○ Will allow users to easily use features like
■ schema, tiered-storage, geo-replication, ...
■ order topics from code, ...
● Upgrade cluster
● Deploy Presto, KoP and WebSockets
● More protocols!
● Add encryption on Bookkeeper's layer
● Create "managed-topics"
○ a special namespace with topics populated by OVHcloud
○ See events/logs from other products

