Open keynote_carolyn&matteo&sijie
Welcome to the first-ever Pulsar Summit,
hosted by:
&
A Big Thanks to Our Sponsors
Platinum
Gold
Community
Media
A Big Thanks to the Program Committee
Sijie Guo Matteo Merli Jia Zhai Jesse Anderson Nozomi Kurihara
Jerry Peng Ben Lorica Dave Fisher Yuvaraj Loganathan
A Big Thanks to the Speakers
Apache Pulsar in 2020
Pulsar Community & Ecosystem
● Major Product Releases & Updates
● Monthly Webinars
● Weekly Trainings
○ TGIP Every Friday at 1pm PT
● Case Studies, White Papers & Use Cases
Get Involved!
#1. Join the Pulsar Slack channel - Apache-Pulsar.slack.com
○ #PulsarSummit - connect with fellow attendees in real-time
○ #Job-Board - post and search Pulsar-related jobs
#2. Join Pulsar Summit Newsletter List
○ Learn about upcoming webinars, product releases, case studies and more
#3. Follow @apache_pulsar on Twitter
#4. Take the Post-Summit Survey!
Apache Pulsar in 2020
First-Ever Apache Pulsar Summit
● 36 speakers
● 35+ sessions
● 550+ attendee sign-ups
● 300+ companies represented
Conference Overview & Logistics
In-Person Summit
San Francisco, CA
March 2020
Virtual Summit
June 2020
Conference Overview & Logistics
2 Days / 3 Tracks / 3 Zoom Links
Time slots (TRACK 1 / TRACK 2 / TRACK 3): 8:30-9:10, 9:20-10:00, 10:20-10:50, 11:00-11:40
Keynote: Adoption, Use Cases & The Future of Pulsar (TRACK 1)
Keynote: Why Splunk Chose Pulsar, by Karthik Ramasamy (TRACK 1)
Sessions:
- Scaling Customer Engagement...
- Kafka on Pulsar...
- Pulsar Storage on BookKeeper...
- Getting Pulsar Spinning...
- Event Propagation across…
- Pulsar Functions Deep Dive...
Pulsar Summit Track Moderators
Track 1: Carolyn King
Marketing, StreamNative
Track 2: Jun Wang
Marketing & Events
Track 3: Rosalie Bartlett
Sr Community Mgr, Verizon Media
Contact us at events@streamnative.io with any questions!
KEYNOTE SESSION:
Messaging and Event Streaming
Adoption, Use Cases and the Future of Pulsar
Matteo Merli / Splunk
Sijie Guo / StreamNative
Who are we?
- Sijie Guo (@sijieg)
- Co-Founder, StreamNative
- PMC Member of Pulsar/BookKeeper
- Ex Co-Founder, Streamlio
- Matteo Merli (@merlimat)
- Sr. Principal Engineer, Splunk
- Co-creator and PMC chair of Pulsar
- Ex Co-Founder, Streamlio
Splunk
Splunk provides operational intelligence software that monitors,
reports, and analyzes real-time machine data.
Splunk acquired Streamlio in Nov 2019 as part of an expanded
investment in the data streaming space.
Splunk is using Pulsar in multiple product lines and is deeply
committed to further developing Pulsar and fostering its community.
StreamNative
Founded by the developers of Apache Pulsar and Apache BookKeeper,
StreamNative enables companies to access enterprise data as real-time
event streams.
- Contributing and helping the Pulsar/BK community grow
- Helping people resolve business problems using Pulsar
- Providing managed Pulsar services and enterprise support
Agenda
- How Organizations are Using Pulsar Today / Sijie Guo
- What is Driving Pulsar Adoption / Sijie Guo
- The Future of Apache Pulsar / Matteo Merli
What is Apache Pulsar?
Pulsar is a cloud-native messaging
and event streaming platform
Pulsar’s Global Adoption
Splunk
The Data-to-Everything™
Platform
- Industry: IT
- Adoption: 6+ months
- Market Cap: $29B
#1 Splunk Data Stream Processor
based on Apache Pulsar
#2 Streaming and Batch connectors
#3 Pulsar as a Service for Splunk cloud
products
Case Study Highlights
Narvar
Intelligent Customer
Experience Platform
- Industry: Retail
- Adoption: 1.5 years
- Scale: 50k txns/second
- Mission Critical Applications
#1 Real-time transactional messaging
#2 Data integration with a data lake
#3 Complex event processing
#4 Heavy Pulsar Functions user
Case Study Highlights
Instructure
Educational Technology
Company
- Industry: Education
- Adoption: 1+ years
- Scale: 8 AWS regions, 50k
msgs/sec in the busiest region
#1 Low-cost
#2 Easy-to-manage long-term retention
#3 Unified messaging model
Case Study Highlights
Clever Cloud
PaaS Company
- Industry: Cloud Computing
- Adoption: 1+ years
- Use Cases: log ingestion pipeline
and Function as a Service
#1 Multi-tenant queue system
#2 Proxy Architecture
#3 Presto and S3 integration
Case Study Highlights
Tencent
The WeChat Company
- Industry: Internet
- Use case: Financial & Log pipeline
- Adoption: 2+ years
- Scale: tens of billions of financial txns every day
#1 Powering Tencent Billing Platform
#2 Data transfer layer for federated
machine learning platform
#3 Replace Kafka in the logging pipeline of Tencent Games
Case Study Highlights
Huya Live
Live Streaming Service
- Market Cap: $4B
- Use case: Log collection
- Adoption: 1+ years
- Scale: 15 million msgs/sec
#1 Replace Kafka in its log pipeline
#2 Instant scalability
#3 Multi Tenancy
Case Study Highlights
Yum China
American Fortune 500
fast-food company
- Revenue: $8B
- Use case: Notification & Order Processing
- Adoption: 6+ months
#1 Replace RabbitMQ
#2 Instant scalability
#3 Multi Tenancy
Case Study Highlights
Global Adoption
What is driving Pulsar Adoption?
Insights from the Pulsar User Survey 2020
What value does Pulsar bring to your organization?
#1 Increased Agility
#2 Unlocks New Use Cases for the Business
#3 Reduced Costs
#4 Improved Customer Experience
What are the top 3 highlights for Pulsar?
#1 Architecture Design
#2 Scalability
#3 Resiliency
Top Use Cases
#1 Asynchronous Applications
#2 Building Core Business Applications
#3 ETL / Data Pipelines
Most-Used Features
#1 Pub/Sub
#2 Multi-Tenancy
#3 Functions
#4 Tiered Storage
#5 Connectors
42%
of survey respondents consider Pulsar to replace two or more messaging systems,
because Pulsar is a unified messaging and event streaming platform.
Apache Pulsar: 18-mo Growth Rate
7x
growth
Apache Pulsar today
The Apache Pulsar community has shaped Pulsar into a cloud-native messaging and event streaming platform:
Pub/Sub, Store, Process
The future of Apache Pulsar
History
✓ 2012 — Pulsar inception at Yahoo
✓ Mandate: scalable, multi-tenant messaging service
✓ 2016 — Open-Sourced by Yahoo
✓ 2017 — Project migrated to Apache Software Foundation
✓ 2018
✓ Promoted to Top-Level Apache project
✓ Apache Pulsar 2.0 released
✓ 2019 — Adoption increase
Pulsar today
● Huge growth for the project
○ Scope and features
○ Community
● We have been able to build a lot without changing the core of the system
● The path has been linear:
○ Focus on providing the best infra for messaging and event streaming
○ Build on the architectural strength of Pulsar
○ Listen to the users, make their lives easier
Pulsar evolution
1. Pub-Sub implemented over distributed log-storage
2. Schema
3. Pulsar Functions
4. Pulsar IO
5. Pulsar SQL
6. Tiered Storage
Messaging
Publish and consume events at scale from anywhere, using any protocol and any language
Protocol Handler (*oP)
- KoP: Kafka-on-Pulsar
- AoP: AMQP-on-Pulsar
- MoP: MQTT-on-Pulsar
New Features
- Transaction support: a lot of progress, coming in 2.7
- REST API to produce / consume
- Read-only brokers:
  - High fanout
  - Scale brokers on demand without affecting ownership
- Exclusive Producer:
  - Single writer to provide fencing and leader election for applications
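The Exclusive Producer item above can be illustrated with a toy model. The sketch below is a hypothetical, in-memory stand-in (the `ToyTopic` class, its `acquire`/`publish`/`release` methods and the epoch counter are invented for illustration and are not the Pulsar client API): only one producer may hold the topic at a time, and every ownership change bumps an epoch, so a former owner that tries to write with its old epoch is fenced off. Whichever application instance acquires the topic first effectively acts as the leader.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ProducerFencedError(Exception):
    """Raised when a producer that no longer holds exclusive access tries to write."""


@dataclass
class ToyTopic:
    # Hypothetical in-memory topic that models exclusive-producer semantics only.
    messages: List[str] = field(default_factory=list)
    owner: Optional[str] = None  # producer currently holding exclusive access
    epoch: int = 0               # bumped on every ownership change; used for fencing

    def acquire(self, producer: str) -> Optional[int]:
        """Try to become the exclusive producer; return the epoch, or None if already taken."""
        if self.owner is not None and self.owner != producer:
            return None
        if self.owner != producer:
            self.owner = producer
            self.epoch += 1
        return self.epoch

    def release(self, producer: str) -> None:
        # Only the current owner can give up exclusive access.
        if self.owner == producer:
            self.owner = None

    def publish(self, producer: str, epoch: int, payload: str) -> None:
        # A writer that is not the owner, or whose epoch is stale, is fenced off.
        if self.owner != producer or epoch != self.epoch:
            raise ProducerFencedError(f"{producer} (epoch {epoch}) is fenced")
        self.messages.append(payload)
```

For example, if instance `app-a` acquires the topic and later releases it, instance `app-b` acquires it with a higher epoch, and any late write from `app-a` carrying its old epoch raises `ProducerFencedError` instead of interleaving with the new leader's writes.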
Partitions auto-scaling
- Partitions are an artifact of scaling
- System complexity should be hidden
- Pulsar should be able to automatically manage partitions:
  - Increase / decrease based on load
  - Retain ordering
- Remove duality of partitioned/non-partitioned topics
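Retaining ordering is what makes automatic partition changes hard. A quick sketch, assuming keys are routed with a hash-modulo scheme (`zlib.crc32` is only a stand-in for whatever hash a client actually uses, and the function names are illustrative), finds the keys that land on a different partition when the partition count changes. Any such key could still have older events unread on its old partition while newer events arrive on the new one.

```python
import zlib
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Route a key with hash-modulo; zlib.crc32 is a stand-in hash for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys: List[str], old_n: int, new_n: int) -> Dict[str, Tuple[int, int]]:
    """Return {key: (old_partition, new_partition)} for keys whose partition changes."""
    moves: Dict[str, Tuple[int, int]] = {}
    for key in keys:
        before, after = partition_for(key, old_n), partition_for(key, new_n)
        if before != after:
            moves[key] = (before, after)
    return moves


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    print(f"4 -> 5 partitions: {len(moved_keys(keys, 4, 5))} of {len(keys)} keys move")
    print(f"4 -> 8 partitions: {len(moved_keys(keys, 4, 8))} of {len(keys)} keys move")
```

For a well-spread hash, going from 4 to 5 partitions moves roughly 80% of keys, because a key stays put only when `h % 4 == h % 5`. Doubling from 4 to 8 behaves differently: a key on partition `p` ends up on either `p` or `p + 4`, so each partition splits cleanly in two. That property is one reason split-style resizing is easier to reconcile with per-key ordering; the sketch does not describe how Pulsar will implement auto-scaling.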
Pluggable metadata store
- Pulsar uses ZooKeeper as a metadata store and coordination service
- Work is ongoing to abstract the metadata access layer
- Soon it will be possible to choose from different backend implementations
- In the future, we aim to provide out-of-the-box metadata support in Pulsar
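Abstracting the metadata access layer means broker logic is written against an interface, with ZooKeeper or another coordination service plugged in behind it. The sketch below is hypothetical and much simpler than anything Pulsar ships: `MetadataStore`, `InMemoryMetadataStore` and `claim_ownership` are invented names. It assumes the backend offers versioned, conditional (compare-and-set) writes, the primitive that coordination tasks such as claiming topic ownership typically rely on.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class BadVersionError(Exception):
    """Raised when a conditional write sees a different version than expected."""


class MetadataStore(ABC):
    """Hypothetical minimal interface; ZooKeeper- or etcd-backed classes would implement it too."""

    @abstractmethod
    def get(self, path: str) -> Optional[Tuple[bytes, int]]:
        """Return (value, version), or None if the path does not exist."""

    @abstractmethod
    def put(self, path: str, value: bytes, expected_version: Optional[int] = None) -> int:
        """Write value and return the new version. If expected_version is given, the write
        fails unless it matches; -1 means the path must not exist yet."""


class InMemoryMetadataStore(MetadataStore):
    """Backend suitable for tests or single-node setups."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, int]] = {}

    def get(self, path: str) -> Optional[Tuple[bytes, int]]:
        return self._data.get(path)

    def put(self, path: str, value: bytes, expected_version: Optional[int] = None) -> int:
        current = self._data.get(path)
        current_version = current[1] if current else -1
        if expected_version is not None and expected_version != current_version:
            raise BadVersionError(f"{path}: expected {expected_version}, found {current_version}")
        new_version = current_version + 1
        self._data[path] = (value, new_version)
        return new_version


def claim_ownership(store: MetadataStore, topic: str, broker: str) -> bool:
    """Broker-side logic written only against the interface: create the owner node if absent."""
    try:
        store.put(f"/ownership/{topic}", broker.encode("utf-8"), expected_version=-1)
        return True
    except BadVersionError:
        return False
```

Swapping in a different backend would mean writing another `MetadataStore` subclass; `claim_ownership` itself would not change.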
Storage
Continue to push the boundaries for truly scalable stream storage
Performance - Operability - Cost
Storage
- A lot of behind-the-scenes work has been happening on performance
- Continuously ensure that Pulsar + BookKeeper are the most effective platform to store data in every environment:
  - Cloud / Multi-Cloud
  - On-Prem
- Efficiently support wildly different requirements:
  - Strong consistency and durability
  - Low cost and huge throughput
Storage evolution
1. Distributed log-storage
2. Schema — Structured storage
3. Tiered Storage — Infinite stream capacity
4. Topic Compaction — Table & Stream duality
5. Key-Value — Functions state access
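Topic compaction is what provides the table & stream duality: the full topic is the stream of updates, and the compacted view keeps only the latest value per key, which is a table. A minimal sketch over an in-memory list of `(key, value)` events; treating `None` as a delete marker is an assumption of this sketch, and the function names are illustrative.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[str, Optional[str]]  # (key, value); a None value acts as a delete marker here


def compact(stream: Iterable[Event]) -> List[Event]:
    """Keep only the latest event per key, in the order those latest events appeared."""
    latest: Dict[str, Tuple[int, Optional[str]]] = {}
    for offset, (key, value) in enumerate(stream):
        latest[key] = (offset, value)  # later events overwrite earlier ones
    survivors = sorted(
        (offset, key, value) for key, (offset, value) in latest.items() if value is not None
    )
    return [(key, value) for _, key, value in survivors]


def as_table(stream: Iterable[Event]) -> Dict[str, str]:
    """Materialize the same stream as a table: key -> current value."""
    return dict(compact(stream))
```

Reading the original stream gives every intermediate update; reading the compacted result gives the same answer as the table, with far fewer events to replay.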
Storage - Columnar Offloader
State Store
Key-Value store used by Pulsar Functions
- Maturing Global State
  - Hot replicas for super-fast failovers
  - Monitoring
  - Extensive testing
- Change data capture of state updates
- Local read access to cached values
  - Efficiently support read-intensive data accesses
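Two of the state-store items, change data capture of state updates and local read access to cached values, fit together naturally: if every update is also recorded in a change log, a reader can keep a local cache current by tailing that log. The classes below are a hypothetical toy model (`ToyStateStore`, `CachedReader` and their methods are invented names), not the Pulsar Functions state API; cached reads are fast but may lag until the next `refresh()`.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[int, str, Optional[str]]  # (sequence number, key, new value or None for delete)


class ToyStateStore:
    """Hypothetical key-value state store that records every update in a change log."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.changelog: List[Change] = []

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self.changelog.append((len(self.changelog), key, value))

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
        self.changelog.append((len(self.changelog), key, None))

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class CachedReader:
    """Local read cache kept up to date by tailing the store's change log."""

    def __init__(self, store: ToyStateStore) -> None:
        self._store = store
        self._cache: Dict[str, str] = {}
        self._next_seq = 0  # first change-log entry not yet applied

    def refresh(self) -> int:
        """Apply changes recorded since the last refresh; return how many were applied."""
        new_changes = self._store.changelog[self._next_seq:]
        for seq, key, value in new_changes:
            if value is None:
                self._cache.pop(key, None)
            else:
                self._cache[key] = value
            self._next_seq = seq + 1
        return len(new_changes)

    def get(self, key: str) -> Optional[str]:
        # Served locally without touching the store; may be stale until refresh().
        return self._cache.get(key)
```

The same change log that keeps the cache warm could also be exported as an event stream of state updates, which is the change-data-capture idea.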
Processing
Built-in processing + integration with existing platforms
Processing
Process event streams in real-time at scale
- Pulsar Functions ⟶ lightweight / serverless compute
- Pulsar-Flink / Pulsar-Spark ⟶ batch and stream processing
- Pulsar SQL ⟶ interactive queries with Presto
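Pulsar Functions keep per-message logic lightweight. The Pulsar documentation describes a "native" Python style in which a function is simply a plain `process(input)` that returns its output. The body below (parsing JSON and dropping debug-level events) is an invented example, and `run_locally` is a hypothetical harness that only imitates how the runtime feeds messages through the function; deployment to a cluster is not shown.

```python
import json
from typing import Iterable, List, Optional


def process(input: str) -> Optional[str]:
    """Per-message logic: parse a JSON event, drop debug-level ones, tag the rest.
    In the local harness below, returning None means "emit nothing" for this input."""
    event = json.loads(input)
    if event.get("level") == "debug":
        return None
    event["processed"] = True
    return json.dumps(event, sort_keys=True)


def run_locally(messages: Iterable[str]) -> List[str]:
    """Hypothetical stand-in for the Functions runtime: call process() once per message."""
    outputs: List[str] = []
    for message in messages:
        result = process(message)
        if result is not None:
            outputs.append(result)
    return outputs
```

Keeping the logic in a plain function like this also makes it easy to unit-test without a running cluster.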
Pulsar-Flink
- FLIP-72: Introduce Pulsar Connector
- Batch reader: for batch processing
  - Segment reader
  - Bypassing brokers
  - Read segments from Apache BookKeeper and Tiered Storage
- Sub-stream reader: for scale-out stream processing
  - Key_Shared subscription & readers
  - Read from brokers
  - Scale the processing parallelism beyond the number of partitions
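The sub-stream reader's ability to scale parallelism beyond the number of partitions rests on Key_Shared-style dispatch: several consumers share one partition, the key-hash space is split into ranges, and each consumer owns a range. Every key therefore goes to exactly one consumer, which preserves per-key order. A sketch, assuming a 65,536-slot hash space split into equal contiguous ranges; `zlib.crc32` stands in for the broker's real hash function, and the function names are illustrative.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

HASH_SLOTS = 65536  # size of the key-hash space assumed in this sketch


def key_slot(key: str) -> int:
    # zlib.crc32 is a stand-in; the broker's actual hash function may differ.
    return zlib.crc32(key.encode("utf-8")) % HASH_SLOTS


def assign_ranges(consumers: List[str]) -> List[Tuple[int, int, str]]:
    """Split [0, HASH_SLOTS) into contiguous, near-equal ranges, one per consumer."""
    n = len(consumers)
    bounds = [HASH_SLOTS * i // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1], consumers[i]) for i in range(n)]


def dispatch(messages: List[Tuple[str, str]], consumers: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Deliver one partition's (key, payload) messages to consumers by key-hash range.
    Each key maps to a single consumer, which sees that key's messages in log order."""
    ranges = assign_ranges(consumers)
    delivered: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, payload in messages:
        slot = key_slot(key)
        owner = next(c for lo, hi, c in ranges if lo <= slot < hi)
        delivered[owner].append((key, payload))
    return dict(delivered)
```

Adding a fifth consumer to the same partition only changes how the hash space is divided; no repartitioning of the topic is needed.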
Event storage API
- Provide multiple access layers to the data
  - Stream Reader: read events in partition-based order
  - Sub-stream Reader: read events in key-based order
  - Segment Readers: read segments from Apache BookKeeper and Tiered Storage
- Integrations
  - Pulsar-Flink
  - Pulsar-Spark
  - Pulsar-Presto
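For batch processing, the segment-reader path works because a topic is stored as a sequence of sealed, immutable segments that can be read straight from BookKeeper or tiered storage without going through brokers. The sketch below assumes that segment IDs increase in topic order and uses dictionaries as stand-ins for the two storage tiers; `Segment`, `read_segment` and `batch_read` are hypothetical names. Segments are fetched in parallel, and `ThreadPoolExecutor.map` returns results in input order, so concatenating them restores the original event order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List

Storage = Dict[str, Dict[int, List[str]]]  # tier name -> segment id -> events (stand-in)


@dataclass(frozen=True)
class Segment:
    # Hypothetical descriptor of one sealed segment of a topic.
    segment_id: int
    location: str  # illustrative tier labels, e.g. "bookkeeper" or "tiered"


def read_segment(segment: Segment, storage: Storage) -> List[str]:
    """Stand-in for reading a whole segment directly from its storage tier, bypassing brokers."""
    return storage[segment.location][segment.segment_id]


def batch_read(segments: List[Segment], storage: Storage, workers: int = 4) -> List[str]:
    """Read sealed segments in parallel and concatenate them in segment order."""
    ordered = sorted(segments, key=lambda s: s.segment_id)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(lambda s: read_segment(s, storage), ordered))
    return [event for chunk in chunks for event in chunk]
```

Because sealed segments never change, they can be read concurrently without coordinating with the brokers that serve live traffic.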
Pulsar Functions
- Pluggable Language Runtime
- Function registry
  - Share and reuse functions
- Operability
  - Functions Versioning
  - Upgrade / Rollback
  - A/B testing
- Connectors (Batch / Stream connectors)
Function Mesh
- PIP-66: Function Mesh
  - Compose multiple sources, sinks and functions together
  - YAML Config or DSL
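Function Mesh composes sources, functions and sinks into one pipeline described by a single declarative spec. The sketch below only illustrates the composition idea: the spec is a plain dict standing in for a YAML definition, its field names and the component registries are invented, and stages are chained by direct function calls, whereas in a real deployment they would be connected through Pulsar topics.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical registries of components that a mesh spec can refer to by name.
SOURCES: Dict[str, Callable[[], Iterable[str]]] = {
    "numbers": lambda: iter(["1", "2", "3", "4"]),
}
FUNCTIONS: Dict[str, Callable[[str], Optional[str]]] = {
    "square": lambda x: str(int(x) ** 2),
    "drop-small": lambda x: x if int(x) > 4 else None,
}


def run_mesh(spec: dict, sink: List[str]) -> None:
    """Wire source -> functions (in declared order) -> sink; a None result drops the message."""
    source = SOURCES[spec["source"]]
    stages = [FUNCTIONS[name] for name in spec["functions"]]
    for message in source():
        value: Optional[str] = message
        for stage in stages:
            value = stage(value)
            if value is None:
                break
        if value is not None:
            sink.append(value)


# A dict standing in for a declarative (YAML-style) mesh definition; field names are illustrative.
MESH_SPEC = {"source": "numbers", "functions": ["square", "drop-small"]}
```

Running `run_mesh(MESH_SPEC, out)` with an empty list `out` squares each number and keeps only results greater than 4; changing the pipeline means editing the spec, not the wiring code.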
Management Tools
- Pulsar Manager
  - Support Schema, Functions & Connectors
  - Integrate with BookKeeper Visual Manager
- PulsarCtl - Go-based CLI admin tool
- Pulsar Helm Chart
- Kubernetes Operator
- Tenant & Topic level configuration policies
- Broker interceptors
  - Pluggable provider for tracing and extending broker capabilities
- System Events/Topics - Change Event Streams