Pulsar Virtual Summit Europe 2021
Interactive Analytics on Pulsar with Pulsar SQL
Axel Sirota
AI and Cloud Consultant
@AxelSirota
Who am I?
[QR codes: my Pluralsight courses, my O’Reilly trainings]
– Microsoft Certified Trainer
– Author, Instructor and Editor at Pluralsight, O’Reilly Media, and Develop Intelligence
– AI and Cloud Consultant
Catalogue
• A Simple Scenario
• Inspecting and Debugging Topics with Pulsar SQL
• Interactive Analytics
Catalogue
• A Simple Scenario
[Diagram labels: Application Instance; Pulsar Deployment; File Source; Ingress topic; Pulsar Function; Processed topic. Sample record: Anna,28,$50]
Some issues appear…
1. You check the status of the Pulsar Function and there are some exceptions
2. And you haven’t set a log topic for each Pulsar Function (at least it happened to us)
3. You don’t want downtime to debug locally
What can you do?
Catalogue
• Inspecting and Debugging Topics with Pulsar SQL
Introducing… Pulsar SQL
Pulsar SQL enhances the Pulsar Presto connector to query topics interactively
One can check every message that passed through the topic easily and safely
It is lightweight and simple, supports highly concurrent access, and can reuse existing Presto clusters
[Diagram labels: Pulsar Broker; BookKeeper (Bookie 1, Bookie 2, Bookie 3); Presto with the Presto Connector]
Configuration file
Specify where the ZooKeepers and brokers are
Put in conf/presto/catalog/pulsar.properties
connector.name=pulsar
pulsar.broker-service-url=https://my-pulsar-deployment.com
pulsar.zookeeper-uri=my-pulsar-deployment.com:2181
Two commands and magic
Start the worker inside the Presto cluster
->./bin/pulsar sql-worker start
Running in 6896
Start the console
->./bin/pulsar sql
presto>
So simple, yet so powerful!
The Full Architecture
But… why should we use it?
What can you do?
1. Validate schemas in a readable SQL format
2. Easily debug bad messages that make Pulsar Functions fail unexpectedly
3. Leverage SQL tools and queries for analytics
Catalogue
• Interactive Analytics
Equivalence
Pulsar               | Presto
---------------------+-----------
Namespaces           | Schemas
Topics               | Tables
Fields               | Columns
Unserialized message | __value__
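The mapping above decides how a topic is addressed from the Presto console. A minimal Python sketch of that naming rule, assuming the catalog is called pulsar (as in the connector configuration) and that the Presto schema is the tenant/namespace pair, as in pulsar."public/default"."voo". The helper name presto_table is hypothetical and not part of Pulsar or Presto:

```python
def presto_table(tenant: str, namespace: str, topic: str, catalog: str = "pulsar") -> str:
    """Build the fully qualified Presto table name for a Pulsar topic."""

    def quote(identifier: str) -> str:
        # Presto quotes identifiers with double quotes; embedded quotes are doubled.
        return '"' + identifier.replace('"', '""') + '"'

    # Schema = tenant/namespace, table = topic (assumption based on the slide's example).
    return f"{catalog}.{quote(tenant + '/' + namespace)}.{quote(topic)}"


print(presto_table("public", "default", "voo"))
```

The quoting matters because the schema name contains a slash, which Presto would not accept as a bare identifier.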
presto> show columns from pulsar."public/default"."voo";
Column | Type | Extra | Comment
-------------------+-----------+-------+-----------------------------------------------------------------------------
__value__ | varchar | | The value of the message with primitive type schema
__partition__ | integer | | The partition number which the message belongs to
__event_time__ | timestamp | | Application defined timestamp in milliseconds of when the event occurred
__publish_time__ | timestamp | | The timestamp in milliseconds of when event as published
__message_id__ | varchar | | The message ID of the message used to generate this row
__sequence_id__ | bigint | | The sequence ID of the message used to generate this row
__producer_name__ | varchar | | The name of the producer that publish the message used to generate this row
__key__ | varchar | | The partition key for the topic
__properties__ | varchar | | User defined properties
(9 rows)
metrics topic without Schema in public/pulsar-summit
Messages: 2021-09-13, 12; 2021-09-14, 9; 2021-09-15, 15
SELECT * from pulsar."public/pulsar-summit".metrics
__value__
2021-09-13,12
2021-09-14,9
2021-09-15,15
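Without a schema, each row carries only the raw text in __value__, so any column-level filtering has to start by splitting that text. A small Python sketch of that step on the sample values, assuming every message has the form "date,metric". The names MetricRow and parse_value are hypothetical:

```python
from datetime import date
from typing import NamedTuple


class MetricRow(NamedTuple):
    day: date
    metric: int


def parse_value(raw: str) -> MetricRow:
    """Split one raw __value__ string of the form 'YYYY-MM-DD,N' into typed fields."""
    day_text, metric_text = raw.split(",", 1)
    return MetricRow(date.fromisoformat(day_text.strip()), int(metric_text.strip()))


# The three raw values returned by the query on the schemaless topic.
raw_values = ["2021-09-13,12", "2021-09-14,9", "2021-09-15,15"]
rows = [parse_value(value) for value in raw_values]

for row in rows:
    print(row.day.isoformat(), row.metric)
```

With a schema on the topic this parsing is unnecessary, because Date and Metric already arrive as separate columns.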
metrics topic with Schema in public/pulsar-summit (Date, Metric)
Messages: 2021-09-13, 12; 2021-09-14, 9; 2021-09-15, 15
SELECT * from pulsar."public/pulsar-summit".metrics
Date       | Metric
-----------+-------
2021-09-13 | 12
2021-09-14 | 9
2021-09-15 | 15
metrics topic with Schema in public/pulsar-summit (Date, Metric)
Messages: 2021-09-13, 12; 2021-09-14, 9; 2021-09-15, 15; 2021-10-15, 120
SELECT count(1) from pulsar."public/pulsar-summit".metrics where Metric > 10
Count
3
metrics topic with Schema in public/pulsar-summit (Date, Metric)
Messages: 2021-09-13, 12; 2021-09-14, 9; 2021-09-15, 15; 2021-10-15, 120
SELECT month(Date) as month, SUM(Metric) as agg_metric
from pulsar."public/pulsar-summit".metrics
group by 1 order by 2 DESC
Month | agg_metric
------+-----------
10    | 120
9     | 36
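The count and the monthly aggregation over the four sample messages can be checked locally without a Pulsar cluster. The sketch below is only a stand-in: it loads the messages into an in-memory SQLite table from Python instead of querying Presto, and it uses strftime('%m', ...) because SQLite has no month() function.

```python
import sqlite3

# The four sample messages from the slides, as (Date, Metric) pairs.
messages = [
    ("2021-09-13", 12),
    ("2021-09-14", 9),
    ("2021-09-15", 15),
    ("2021-10-15", 120),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (Date TEXT, Metric INTEGER)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)", messages)

# Number of messages whose Metric is above 10.
(count_over_10,) = conn.execute(
    "SELECT count(1) FROM metrics WHERE Metric > 10"
).fetchone()

# Sum of Metric per month, largest total first.
monthly = conn.execute(
    """
    SELECT CAST(strftime('%m', Date) AS INTEGER) AS month, SUM(Metric) AS agg_metric
    FROM metrics
    GROUP BY month
    ORDER BY agg_metric DESC
    """
).fetchall()
conn.close()

print(count_over_10)
print(monthly)
```

Both results agree with the slides: three messages exceed 10, and October (120) ranks above September (12 + 9 + 15 = 36).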
If you need to…
1. Interactively debug topics without opening subscriptions
2. Audit who sent each message, when, where, what it sent, and how long it took
3. Do analytics on the messages flowing through Pulsar
Then Pulsar SQL is what you are looking for!
And all of this without affecting production performance
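The auditing use case relies on the metadata columns listed in the show columns output. As a hedged illustration, this Python sketch only builds the text of such a query and does not run it. The helper audit_query is hypothetical, and the __value__ column applies to topics with a primitive-type schema:

```python
def audit_query(table: str, limit: int = 10) -> str:
    """Return a Presto query listing who published what, and when, newest first."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Metadata columns taken from the `show columns` output in the slides.
    columns = [
        "__producer_name__",
        "__publish_time__",
        "__event_time__",
        "__message_id__",
        "__value__",
    ]
    return (
        f"SELECT {', '.join(columns)} "
        f"FROM {table} "
        f"ORDER BY __publish_time__ DESC "
        f"LIMIT {limit}"
    )


print(audit_query('pulsar."public/default"."voo"'))
```

The resulting statement can be pasted into the presto> console started with ./bin/pulsar sql.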
Thanks!!
Questions?
Axel Sirota
AI and Cloud Consultant
@AxelSirota
