Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Timothy Spann
Developer Advocate, StreamNative
Apache Pulsar
Development 101
with Python
Tim Spann
Developer Advocate
StreamNative
FLiP(N) Stack = Flink, Pulsar and NiFi Stack
Streaming Systems & Data Architecture Expert
Experience
15+ years of experience with streaming
technologies including Pulsar, Flink, Spark, NiFi, Big
Data, Cloud, MXNet, IoT, Python and more.
Today, he helps to grow the Pulsar community
sharing rich technical knowledge and experience at
both global conferences and through individual
conversations.
https://streamnative.io/pulsar-python/
Example Sensor Device
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Python Application for ADS-B Data
Diagram
Python App REST CALL
LOGGING
ANALYTICS
SEND TO
PULSAR
https://github.com/tspannhw/FLiP-Py-ADS-B
Apache Pulsar Training
● Instructor-led courses
○ Pulsar Fundamentals
○ Pulsar Developers
○ Pulsar Operations
● On-demand learning with labs
● 300+ engineers, admins and architects trained!
StreamNative Academy
Now Available
FREE On-Demand
Pulsar Training
Academy.StreamNative.io
What is Apache Pulsar?
● Unified Messaging Platform
● Guaranteed Message Delivery
● Resiliency
● Infinite Scalability
Pulsar Cluster
● “Bookies” (store messages)
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata Store (ZK, RocksDB, etcd, …)
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
Pulsar’s Publish-Subscribe model
Broker
Subscription
Consumer 1
Consumer 2
Consumer 3
Topic
Producer 1
Producer 2
● Producers send messages.
● A topic is an ordered, named channel that
producers use to transmit messages to
subscribed consumers.
● Messages belong to a topic and contain an
arbitrary payload.
● Brokers handle connections and route
messages between producers and
consumers.
● Subscriptions are named configuration
rules that determine how messages are
delivered to consumers.
● Consumers receive messages.
Subscription Modes
Different subscription modes have
different semantics:
Exclusive/Failover - guaranteed
order, single active consumer
Shared - multiple active consumers,
no order
Key_Shared - multiple active
consumers, order for given key
Producer 1
Producer 2
Pulsar Topic
Subscription D
Consumer D-1
Consumer D-2
Key-Shared
<K1,V10> <K1,V11> <K1,V12> <K2,V20> <K2,V21> <K2,V22>
Subscription C
Consumer C-1
Consumer C-2
Shared
<K1,V10> <K2,V21> <K1,V12> <K2,V20> <K1,V11> <K2,V22>
Subscription A Consumer A
Exclusive
Subscription B
Consumer B-1
Consumer B-2
In case of failure in
Consumer B-1
Failover
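The difference between Shared and Key_Shared delivery can be illustrated with a toy dispatcher over the key/value pairs from the slide. This is only a sketch: the round-robin and hash-modulo rules are simplifying assumptions, not the broker's actual dispatch algorithm, and `dispatch_shared` and `dispatch_key_shared` are hypothetical names.

```python
import zlib
from collections import defaultdict

# Messages as (key, value) pairs, as in the subscription diagram.
MESSAGES = [("K1", 10), ("K1", 11), ("K1", 12), ("K2", 20), ("K2", 21), ("K2", 22)]


def dispatch_shared(messages, consumers):
    """Toy Shared mode: hand messages out round-robin, ignoring keys."""
    assigned = defaultdict(list)
    for i, msg in enumerate(messages):
        assigned[consumers[i % len(consumers)]].append(msg)
    return dict(assigned)


def dispatch_key_shared(messages, consumers):
    """Toy Key_Shared mode: a stable hash of the key picks the consumer,
    so every message for a given key reaches the same consumer, in order."""
    assigned = defaultdict(list)
    for key, value in messages:
        slot = zlib.crc32(key.encode("utf-8")) % len(consumers)
        assigned[consumers[slot]].append((key, value))
    return dict(assigned)


shared = dispatch_shared(MESSAGES, ["C-1", "C-2"])
key_shared = dispatch_key_shared(MESSAGES, ["D-1", "D-2"])
print("Shared:", shared)
print("Key_Shared:", key_shared)
```

In the Shared result a single key is spread across both consumers, so per-key order is lost. In the Key_Shared result each key stays with one consumer.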
Messages - the Basic Unit of Pulsar
Component - Description
Value / data payload - The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key - Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
Properties - An optional key/value map of user-defined properties.
Producer name - The name of the producer that produces the message. If you do not specify a producer name, a default name is used.
Sequence ID - Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
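A minimal sketch of how the value, key, and properties map onto a send call. It assumes `producer` behaves like the object returned by `client.create_producer(...)`, whose `send` accepts `partition_key` and `properties` keyword arguments. `send_reading` and `RecordingProducer` are hypothetical names, and the stand-in producer exists only so the sketch runs without a broker.

```python
import json


def send_reading(producer, reading, device_id):
    """Send one reading, filling in the optional message components."""
    payload = json.dumps(reading).encode("utf-8")  # value: raw bytes
    return producer.send(
        payload,
        partition_key=device_id,           # key: used for partitioning and compaction
        properties={"device": device_id},  # user-defined key/value map
    )


class RecordingProducer:
    """Hypothetical stand-in for a real producer; it only records calls."""

    def __init__(self):
        self.sent = []

    def send(self, content, properties=None, partition_key=None):
        self.sent.append((content, properties, partition_key))


fake = RecordingProducer()
send_reading(fake, {"temp_c": 21.5}, "rpi4-iot")
print(fake.sent)
```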
Connectivity
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT), RoP (RocketMQ)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
● Consume messages from one
or more Pulsar topics.
● Apply user-supplied
processing logic to each
message.
● Publish the results of the
computation to another topic.
● Support multiple
programming languages (Java,
Python, Go)
● Can leverage 3rd-party
libraries
Pulsar Functions
#!/usr/bin/env python
from pulsar import Function
import json

class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        logger = context.get_logger()
        logger.info("Message Content: {0}".format(input))
        msg_id = context.get_message_id()
        row = { }
        row['id'] = str(msg_id)
        json_string = json.dumps(row)
        return json_string
Entire Function
Pulsar
Functions
Function Mesh
Pulsar Functions, along with Pulsar IO/Connectors, provide a powerful API for ingesting,
transforming, and outputting data.
Function Mesh, another StreamNative project, makes it easier for developers to create entire
applications built from sources, functions, and sinks all through a declarative API.
Function Execution
MQTT
On Pulsar
(MoP)
Kafka On
Pulsar (KoP)
Spark + Pulsar
https://pulsar.apache.org/docs/en/adaptors-spark/
val dfPulsar = spark.readStream.format("pulsar")
  .option("service.url", "pulsar://pulsar1:6650")
  .option("admin.url", "http://pulsar1:8080")
  .option("topic", "persistent://public/default/airquality")
  .load()

val pQuery = dfPulsar.selectExpr("*")
  .writeStream.format("console")
  .option("truncate", false)
  .start()
Spark version 3.2.0
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.11)
● Unified computing engine
● Batch processing is a special case of stream processing
● Stateful processing
● Massive Scalability
● Flink SQL for queries, inserts against Pulsar Topics
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
Apache Flink?
SQL
select aqi, parameterName, dateObserved, hourObserved, latitude,
longitude, localTimeZone, stateCode, reportingArea
from airquality

select max(aqi) as MaxAQI, parameterName, reportingArea
from airquality
group by parameterName, reportingArea

select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI,
count(aqi) as RowCount, parameterName, reportingArea
from airquality
group by parameterName, reportingArea
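To see what shape the last aggregate query returns, it can be run against a few rows in an in-memory SQLite table. This is a sketch only: SQLite is a batch engine rather than continuous SQL, the rows are made-up sample data, and the `ORDER BY` is added here just to make the printed order deterministic.

```python
import sqlite3

# Hypothetical sample rows: (aqi, parameterName, reportingArea).
ROWS = [
    (40, "O3", "Area A"),
    (60, "O3", "Area A"),
    (25, "PM2.5", "Area A"),
    (35, "PM2.5", "Area B"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airquality (aqi INTEGER, parameterName TEXT, reportingArea TEXT)"
)
conn.executemany("INSERT INTO airquality VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT max(aqi) AS MaxAQI, min(aqi) AS MinAQI, avg(aqi) AS AvgAQI,
       count(aqi) AS RowCount, parameterName, reportingArea
FROM airquality
GROUP BY parameterName, reportingArea
ORDER BY parameterName, reportingArea
"""

results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
conn.close()
```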
● Buffer
● Batch
● Route
● Filter
● Aggregate
● Enrich
● Replicate
● Dedupe
● Decouple
● Distribute
Schema Registry
● Registry holds schema-1, schema-2, schema-3 (value=Avro/Protobuf/JSON)
● Producers (local cache for schemas)
○ Send (register) schema (if not in local cache)
○ Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID
● Consumers (local cache for schemas)
○ Get schema by ID (if not in local cache)
○ Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID
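The register-once, look-up-once flow can be sketched with plain Python objects. Everything here is a toy model under stated assumptions: `ToySchemaRegistry`, `ToyProducer`, and `ToyConsumer` are hypothetical names, schemas are plain dicts, and JSON stands in for Avro/Protobuf serialization. It is not how the Pulsar client implements this internally.

```python
import json


class ToySchemaRegistry:
    """In-memory stand-in for a schema registry (illustration only)."""

    def __init__(self):
        self._by_id = {}
        self._ids = {}
        self.lookups = 0

    def register(self, schema):
        canonical = json.dumps(schema, sort_keys=True)
        if canonical not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[canonical] = schema_id
            self._by_id[schema_id] = schema
        return self._ids[canonical]

    def get(self, schema_id):
        self.lookups += 1
        return self._by_id[schema_id]


class ToyProducer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # canonical schema -> schema ID

    def send(self, schema, record):
        canonical = json.dumps(schema, sort_keys=True)
        if canonical not in self.cache:  # register only on a cache miss
            self.cache[canonical] = self.registry.register(schema)
        return {"schema_id": self.cache[canonical], "data": json.dumps(record)}


class ToyConsumer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # schema ID -> schema

    def read(self, message):
        schema_id = message["schema_id"]
        if schema_id not in self.cache:  # fetch by ID only on a cache miss
            self.cache[schema_id] = self.registry.get(schema_id)
        schema = self.cache[schema_id]
        record = json.loads(message["data"])
        return {name: record[name] for name in schema["fields"]}


registry = ToySchemaRegistry()
producer, consumer = ToyProducer(registry), ToyConsumer(registry)
schema_1 = {"name": "thermal", "fields": ["uuid"]}
messages = [producer.send(schema_1, {"uuid": f"id-{i}"}) for i in range(3)]
records = [consumer.read(m) for m in messages]
print(records, "registry lookups:", registry.lookups)
```

Three messages are sent and read, but the consumer contacts the registry only once, because later messages hit its local cache.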
Streaming FLiP-Py Apps
[Architecture diagram. Labels: StreamNative Hub; StreamNative Cloud; Unified Batch and Stream COMPUTING (Batch + Stream); Unified Batch and Stream STORAGE (Queuing + Streaming); Offload; Tiered Storage; Protocols: Pulsar, KoP, MoP, WebSocket; Pulsar Sink; Streaming Edge Gateway; CDC; Apps]
Pulsar Functions
● Lightweight computation
similar to AWS Lambda.
● Specifically designed to use
Apache Pulsar as a message
bus.
● Function runtime can be
located within Pulsar Broker.
● Python Functions
A serverless event streaming
framework
● Consume messages from one or
more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Publish the results of the
computation to another topic.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries
to support the execution of ML
models on the edge.
Pulsar Functions
Python 3 Coding
Code Along With Tim
<<DEMO>>
Run a Local Standalone Bare Metal
wget https://archive.apache.org/dist/pulsar/pulsar-2.9.1/apache-pulsar-2.9.1-bin.tar.gz
tar xvfz apache-pulsar-2.9.1-bin.tar.gz
cd apache-pulsar-2.9.1
bin/pulsar standalone
(For Pulsar SQL Support)
bin/pulsar sql-worker start
https://pulsar.apache.org/docs/en/standalone/
<or> Run in StreamNative Cloud
Scan the QR code to earn
$200 in cloud credit
Building Tenant, Namespace, Topics
bin/pulsar-admin tenants create conference
bin/pulsar-admin namespaces create conference/pythonweb
bin/pulsar-admin tenants list
bin/pulsar-admin namespaces list conference
bin/pulsar-admin topics create persistent://conference/pythonweb/first
bin/pulsar-admin topics list conference/pythonweb
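The topic names used in these commands follow the pattern `persistent://tenant/namespace/topic`. A small helper can build and check such names before they reach the client. This is a sketch: `TopicName` and `parse_topic` are hypothetical helpers, not part of the Pulsar client, and they handle only the four-part form shown on this slide.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str     # for example "persistent"
    tenant: str
    namespace: str
    topic: str

    def __str__(self):
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(name):
    """Split 'domain://tenant/namespace/topic' into its four parts."""
    domain, sep, rest = name.partition("://")
    parts = rest.split("/")
    if not sep or len(parts) != 3 or not all(parts):
        raise ValueError(f"expected domain://tenant/namespace/topic, got {name!r}")
    return TopicName(domain, *parts)


first = parse_topic("persistent://conference/pythonweb/first")
print(first.tenant, first.namespace, first.topic)
print(str(TopicName("persistent", "conference", "pythonweb", "second")))
```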
Install Python 3 Pulsar Client
pip3 install 'pulsar-client[all]==2.9.1'
# Depending on Platform May Need to Build C++ Client
For Python on Pulsar on Pi https://github.com/tspannhw/PulsarOnRaspberryPi
https://pulsar.apache.org/docs/en/client-libraries-python/
Building a Python 3 Producer
import pulsar
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('persistent://conference/pythonweb/first')
producer.send(('Simple Text Message').encode('utf-8'))
client.close()
Building a Python 3 Cloud Producer with OAuth2
python3 prod.py -su pulsar+ssl://name1.name2.snio.cloud:6651 \
  -t persistent://public/default/pyth \
  --auth-params '{"issuer_url":"https://auth.streamnative.cloud", "private_key":"my.json", "audience":"urn:sn:pulsar:name:myclustr"}'

import argparse
from pulsar import Client, AuthenticationOauth2

# Command-line options matching the invocation above
parse = argparse.ArgumentParser(prog='prod.py')
parse.add_argument('-su', '--service-url', dest='service_url', type=str, required=True)
parse.add_argument('-t', '--topic', dest='topic', type=str, required=True)
parse.add_argument('--auth-params', dest='auth_params', type=str, required=True)
args = parse.parse_args()

# The OAuth2 parameters are passed through as a JSON string
client = Client(args.service_url,
                authentication=AuthenticationOauth2(args.auth_params))
https://github.com/streamnative/examples/blob/master/cloud/python/OAuth2Producer.py
https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
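Hand-quoting the `--auth-params` JSON on a command line is easy to get wrong, so one option is to build the string in Python. This is a sketch: `build_oauth2_params` is a hypothetical helper, and the values below are placeholders rather than real endpoints. The three keys are the ones shown on the slide.

```python
import json


def build_oauth2_params(issuer_url, private_key, audience):
    """Return the JSON string form of the OAuth2 parameters."""
    return json.dumps(
        {"issuer_url": issuer_url, "private_key": private_key, "audience": audience}
    )


# Placeholder values for illustration only.
auth_params = build_oauth2_params(
    issuer_url="https://auth.example.invalid",
    private_key="my-key.json",
    audience="urn:sn:pulsar:example:cluster",
)
print(auth_params)
```

The resulting string could then be passed to `AuthenticationOauth2(auth_params)` in the same way the slide passes `args.auth_params`.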
Example Avro Schema Usage
import pulsar
from pulsar.schema import AvroSchema, Record, String

# Record class that defines the Avro schema
class thermal(Record):
    uuid = String()

client = pulsar.Client('pulsar://pulsar1:6650')
thermalschema = AvroSchema(thermal)
producer = client.create_producer(topic='persistent://public/default/pi-thermal-avro',
                                  schema=thermalschema, properties={"producer-name": "thrm"})
uniqueid = "unique-name"
thermalRec = thermal()
thermalRec.uuid = uniqueid
producer.send(thermalRec, partition_key=uniqueid)
client.close()
https://github.com/tspannhw/FLiP-Pi-Thermal
Example Json Schema Usage
import pulsar
from pulsar.schema import JsonSchema, Record, String

# Record class that defines the JSON schema
class weather(Record):
    uuid = String()

client = pulsar.Client('pulsar://pulsar1:6650')
wschema = JsonSchema(weather)
producer = client.create_producer(topic='persistent://public/default/weather',
                                  schema=wschema, properties={"producer-name": "wthr"})
uniqueid = "unique-name"
weatherRec = weather()
weatherRec.uuid = uniqueid
producer.send(weatherRec, partition_key=uniqueid)
client.close()
https://github.com/tspannhw/FLiP-Pi-Weather
Building a Python3 Consumer
import pulsar

client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('persistent://conference/pythonweb/first',
                            subscription_name='my-sub')

try:
    while True:
        msg = consumer.receive()
        print("Received message: '%s'" % msg.data())
        consumer.acknowledge(msg)
except KeyboardInterrupt:
    pass  # stop on Ctrl-C
finally:
    client.close()
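A bounded variant of the receive loop, sketched with acknowledgement on success and negative acknowledgement on failure. It assumes `consumer` offers `receive(timeout_millis=...)`, `acknowledge`, and `negative_acknowledge`, as the Python client's consumer does, and that a receive timeout surfaces as an exception. `drain`, `FakeConsumer`, and `FakeMessage` are hypothetical names, and the fakes exist only so the sketch runs without a broker.

```python
def drain(consumer, handler, max_messages, timeout_millis=1000):
    """Process up to max_messages; ack on success, nack if the handler fails."""
    processed = 0
    for _ in range(max_messages):
        try:
            msg = consumer.receive(timeout_millis=timeout_millis)
        except Exception:
            break  # assumption: no message arrived within the timeout
        try:
            handler(msg.data())
        except Exception:
            consumer.negative_acknowledge(msg)  # ask for redelivery
        else:
            consumer.acknowledge(msg)
            processed += 1
    return processed


class FakeMessage:
    def __init__(self, payload):
        self._payload = payload

    def data(self):
        return self._payload


class FakeConsumer:
    """Hypothetical stand-in that serves queued payloads, then times out."""

    def __init__(self, payloads):
        self._queue = [FakeMessage(p) for p in payloads]
        self.acked, self.nacked = [], []

    def receive(self, timeout_millis=None):
        if not self._queue:
            raise TimeoutError("no message within timeout")
        return self._queue.pop(0)

    def acknowledge(self, msg):
        self.acked.append(msg.data())

    def negative_acknowledge(self, msg):
        self.nacked.append(msg.data())


def handler(data):
    if data == b"bad":
        raise ValueError("cannot process")


fake = FakeConsumer([b"one", b"bad", b"two"])
count = drain(fake, handler, max_messages=10)
print(count, fake.acked, fake.nacked)
```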
MQTT from Python
pip3 install paho-mqtt
import json
import paho.mqtt.client as mqtt

# paho-mqtt 1.x signature; 2.x also requires a callback API version argument
client = mqtt.Client("rpi4-iot")
row = { }
row['gasKO'] = str(readings)  # readings: sensor value collected elsewhere
json_string = json.dumps(row)
json_string = json_string.strip()
client.connect("pulsar-server.com", 1883, 180)
client.publish("persistent://public/default/mqtt-2",
               payload=json_string, qos=0, retain=True)
https://www.slideshare.net/bunkertor/data-minutes-2-apache-pulsar-with-mqtt-for-edge-computing-lightning-2022
Web Sockets from Python
pip3 install websocket-client
import websocket, base64, json
topic = 'ws://server:8080/ws/v2/producer/persistent/public/default/webtopic1'
ws = websocket.create_connection(topic)
message = "Hello Python Web Conference"
message_bytes = message.encode('ascii')
base64_bytes = base64.b64encode(message_bytes)
base64_message = base64_bytes.decode('ascii')
ws.send(json.dumps({'payload': base64_message,
                    'properties': {'device': 'jetson2gb', 'protocol': 'websockets'},
                    'context': 5}))
response = json.loads(ws.recv())
https://pulsar.apache.org/docs/en/client-libraries-websocket/
https://github.com/tspannhw/FLiP-IoT/blob/main/wspulsar.py
https://github.com/tspannhw/FLiP-IoT/blob/main/wsreader.py
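The WebSocket API carries the payload as base64 text inside a JSON frame, so the encode and decode steps can be isolated and checked without a server. This is a sketch: `encode_ws_message` and `decode_ws_message` are hypothetical helper names, and UTF-8 is used here instead of the slide's ASCII so non-ASCII text also survives.

```python
import base64
import json


def encode_ws_message(text, properties=None):
    """Build the JSON frame a WebSocket producer would send."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({"payload": payload, "properties": properties or {}})


def decode_ws_message(frame):
    """Recover the text and properties from a JSON frame."""
    body = json.loads(frame)
    text = base64.b64decode(body["payload"]).decode("utf-8")
    return text, body.get("properties", {})


frame = encode_ws_message("Hello Python Web Conference", {"protocol": "websockets"})
text, props = decode_ws_message(frame)
print(frame)
print(text, props)
```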
Kafka from Python
pip3 install kafka-python
import json
from kafka import KafkaProducer
from kafka.errors import KafkaError

row = { }
row['gasKO'] = str(readings)  # readings: sensor value collected elsewhere
json_string = json.dumps(row)
json_string = json_string.strip()
producer = KafkaProducer(bootstrap_servers='pulsar1:9092', retries=3)
producer.send('topic-kafka-1', json_string.encode('utf-8'))
producer.flush()
https://github.com/streamnative/kop
https://docs.streamnative.io/platform/v1.0.0/concepts/kop-concepts
Pulsar IO Functions in Python
https://github.com/tspannhw/pulsar-pychat-function
Pulsar IO Functions in Python
bin/pulsar-admin functions create --auto-ack true \
  --py py/src/sentiment.py \
  --classname "sentiment.Chat" \
  --inputs "persistent://public/default/chat" \
  --log-topic "persistent://public/default/logs" \
  --name Chat \
  --output "persistent://public/default/chatresult"
https://github.com/tspannhw/pulsar-pychat-function
Pulsar IO Functions in Python
from pulsar import Function
import json

class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        logger = context.get_logger()
        msg_id = context.get_message_id()
        fields = json.loads(input)
https://github.com/tspannhw/pulsar-pychat-function
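A function like this can be exercised locally before deploying it. The sketch below makes several assumptions: the body after `json.loads` (tagging the parsed fields with the message ID and returning JSON) is an illustrative guess, not the author's actual code. `FakeContext` is a hypothetical stand-in that provides only the two context methods used here, and the fallback `Function` base class is there only so the file runs when the Pulsar client is not installed.

```python
import json
import logging

try:
    from pulsar import Function
except ImportError:
    class Function:  # local stand-in when pulsar-client is not installed
        pass


class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        logger = context.get_logger()
        msg_id = context.get_message_id()
        fields = json.loads(input)
        logger.info("Processing message %s", msg_id)
        fields["id"] = str(msg_id)  # illustrative enrichment step
        return json.dumps(fields)   # returned value goes to the output topic


class FakeContext:
    """Hypothetical test double for the function context."""

    def get_logger(self):
        return logging.getLogger("chat-test")

    def get_message_id(self):
        return "msg-1"


result = Chat().process('{"comment": "hello"}', FakeContext())
print(result)
```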
Python For Pulsar on Pi
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
● https://github.com/tspannhw/PythonPulsarExamples
● https://github.com/tspannhw/pulsar-pychat-function
● https://github.com/tspannhw/FLiP-PulsarDevPython101
Let’s Keep in Touch
Tim Spann
Developer Advocate
@PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
https://streamnative.io/pulsar-python/
Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Tim Spann
Thank you!
tim@streamnative.io
@PaaSDev
github.com/tspannhw


Apache Pulsar Development 101 with Python

  • 1. Pulsar Summit San Francisco Hotel Nikko August 18 2022 Timothy Spann Developer Advocate, StreamNative Apache Pulsar Development 101 with Python
  • 2. Tim Spann Developer Advocate StreamNative FLiP(N) Stack = Flink, Pulsar and NiFi Stack Streaming Systems & Data Architecture Expert Experience 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations. https://streamnative.io/pulsar-python/
  • 4. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 5. Python Application for ADS-B Data Diagram Python App REST CALL LOGGING ANALYTICS SEND TO PULSAR https://github.com/tspannhw/FLiP-Py-ADS-B
  • 6. Apache Pulsar Training ● Instructor-led courses ○ Pulsar Fundamentals ○ Pulsar Developers ○ Pulsar Operations ● On-demand learning with labs ● 300+ engineers, admins and architects trained! StreamNative Academy Now Available FREE On-Demand Pulsar Training Academy.StreamNative.io
  • 7. What is Apache Pulsar? Unified Messaging Platform Guaranteed Message Delivery Resiliency Infinite Scalability
  • 8. ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Pulsar Cluster Metadata Store (ZK, RocksDB, etcd, …)
  • 9. Pulsar’s Publish-Subscribe model Broker Subscription Consumer 1 Consumer 2 Consumer 3 Topic Producer 1 Producer 2 ● Producers send messages. ● Topics are an ordered, named channel that producers use to transmit messages to subscribed consumers. ● Messages belong to a topic and contain an arbitrary payload. ● Brokers handle connections and routes messages between producers / consumers. ● Subscriptions are named configuration rules that determine how messages are delivered to consumers. ● Consumers receive messages.
  • 10. Subscription Modes Different subscription modes have different semantics: Exclusive/Failover - guaranteed order, single active consumer Shared - multiple active consumers, no order Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1, V 10 > < K 1, V 11 > < K 1, V 12 > < K 2 ,V 2 0 > < K 2 ,V 2 1> < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1, V 10 > < K 2, V 21 > < K 1, V 12 > < K 2 ,V 2 0 > < K 1, V 11 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover
  • 11. Messages - the Basic Unit of Pulsar Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
  • 12. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT), RoP (RocketMQ) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) hub.streamnative.io
  • 13. ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries Pulsar Functions
  • 14. #!/usr/bin/env python from pulsar import Function import json class Chat(Function): def __init__(self): pass def process(self, input, context): logger = context.get_logger() logger.info("Message Content: {0}".format(input)) msg_id = context.get_message_id() row = { } row['id'] = str(msg_id) json_string = json.dumps(row) return json_string Entire Function Pulsar Functions
  • 15. Function Mesh Pulsar Functions, along with Pulsar IO/Connectors, provide a powerful API for ingesting, transforming, and outputting data. Function Mesh, another StreamNative project, makes it easier for developers to create entire applications built from sources, functions, and sinks all through a declarative API.
  • 19. Spark + Pulsar https://pulsar.apache.org/docs/en/adaptors-spark/ val dfPulsar = spark.readStream.format(" pulsar") .option(" service.url", "pulsar://pulsar1:6650") .option(" admin.url", "http://pulsar1:8080 ") .option(" topic", "persistent://public/default/airquality").load() val pQuery = dfPulsar.selectExpr("*") .writeStream.format(" console") .option("truncate", false).start() ____ __ / __/__ ___ _____/ /__ _ / _ / _ `/ __/ '_/ /___/ .__/_,_/_/ /_/_ version 3.2.0 /_/ Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.11)
  • 20. ● Unified computing engine ● Batch processing is a special case of stream processing ● Stateful processing ● Massive Scalability ● Flink SQL for queries, inserts against Pulsar Topics ● Streaming Analytics ● Continuous SQL ● Continuous ETL ● Complex Event Processing ● Standard SQL Powered by Apache Calcite Apache Flink?
  • 21. SQL select aqi, parameterName, dateObserved, hourObserved, latitude, longitude, localTimeZone, stateCode, reportingArea from airquality select max(aqi) as MaxAQI, parameterName, reportingArea from airquality group by parameterName, reportingArea select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI, count(aqi) as RowCount, parameterName, reportingArea from airquality group by parameterName, reportingArea
  • 22. ● Buffer ● Batch ● Route ● Filter ● Aggregate ● Enrich ● Replicate ● Dedupe ● Decouple ● Distribute
  • 23. Schema Registry Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  • 24. Streaming FLiP-Py Apps StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Tiered Storage Pulsar --- KoP --- MoP --- Websocket Pulsar Sink Streaming Edge Gateway Protocols CDC Apps
  • 25. Pulsar Functions: a serverless event streaming framework ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within the Pulsar broker. ● Python Functions
  • 26. Pulsar Functions ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go). ● Can leverage third-party libraries to support the execution of ML models on the edge.
  • 27. Python 3 Coding Code Along With Tim <<DEMO>>
  • 28. Run a Local Standalone Bare Metal
      wget https://archive.apache.org/dist/pulsar/pulsar-2.9.1/apache-pulsar-2.9.1-bin.tar.gz
      tar xvfz apache-pulsar-2.9.1-bin.tar.gz
      cd apache-pulsar-2.9.1
      bin/pulsar standalone
      # For Pulsar SQL support
      bin/pulsar sql-worker start
      https://pulsar.apache.org/docs/en/standalone/
  • 29. <or> Run in StreamNative Cloud Scan the QR code to earn $200 in cloud credit
  • 30. Building Tenant, Namespace, Topics
      bin/pulsar-admin tenants create conference
      bin/pulsar-admin namespaces create conference/pythonweb
      bin/pulsar-admin tenants list
      bin/pulsar-admin namespaces list conference
      bin/pulsar-admin topics create persistent://conference/pythonweb/first
      bin/pulsar-admin topics list conference/pythonweb
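The commands above build up the pieces of a fully qualified topic name: domain, tenant, namespace and topic. As a small illustration, the sketch below composes and splits such names in Python. `topic_name` and `parse_topic` are hypothetical helpers written for this example, not part of the Pulsar client.

```python
def topic_name(tenant, namespace, topic, persistent=True):
    """Build a fully qualified topic name: domain://tenant/namespace/topic."""
    domain = "persistent" if persistent else "non-persistent"
    return f"{domain}://{tenant}/{namespace}/{topic}"


def parse_topic(name):
    """Split a fully qualified topic name into its four parts."""
    domain, _, rest = name.partition("://")
    tenant, namespace, topic = rest.split("/", 2)
    return {"domain": domain, "tenant": tenant, "namespace": namespace, "topic": topic}


# Same tenant, namespace and topic as in the pulsar-admin commands above.
first = topic_name("conference", "pythonweb", "first")
parts = parse_topic(first)
print(first)
print(parts)
```

Note that `pulsar-admin` refers to the namespace as `conference/pythonweb`, that is, tenant and namespace together.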
  • 31. Install Python 3 Pulsar Client
      pip3 install 'pulsar-client[all]==2.9.1'
      # Depending on the platform, you may need to build the C++ client
      For Python on Pulsar on Pi: https://github.com/tspannhw/PulsarOnRaspberryPi
      https://pulsar.apache.org/docs/en/client-libraries-python/
  • 32. Building a Python 3 Producer
      import pulsar
      client = pulsar.Client('pulsar://localhost:6650')
      producer = client.create_producer('persistent://conference/pythonweb/first')
      producer.send('Simple Text Message'.encode('utf-8'))
      client.close()
  • 33. Building a Python 3 Cloud Producer OAuth
      python3 prod.py -su pulsar+ssl://name1.name2.snio.cloud:6651 -t persistent://public/default/pyth --auth-params '{"issuer_url":"https://auth.streamnative.cloud", "private_key":"my.json", "audience":"urn:sn:pulsar:name:myclustr"}'
      import argparse
      from pulsar import Client, AuthenticationOauth2
      parse = argparse.ArgumentParser(prog='prod.py')
      parse.add_argument('-su', '--service-url', dest='service_url', type=str, required=True)
      # -t and --auth-params match the command line above
      parse.add_argument('-t', '--topic', dest='topic', type=str, required=True)
      parse.add_argument('--auth-params', dest='auth_params', type=str, required=True)
      args = parse.parse_args()
      client = Client(args.service_url, authentication=AuthenticationOauth2(args.auth_params))
      https://github.com/streamnative/examples/blob/master/cloud/python/OAuth2Producer.py
      https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
  • 34. Example Avro Schema Usage
      import pulsar
      from pulsar.schema import *
      from pulsar.schema import AvroSchema
      class thermal(Record):
          uuid = String()
      client = pulsar.Client('pulsar://pulsar1:6650')
      thermalschema = AvroSchema(thermal)
      producer = client.create_producer(topic='persistent://public/default/pi-thermal-avro',
                                        schema=thermalschema, properties={"producer-name": "thrm"})
      thermalRec = thermal()
      thermalRec.uuid = "unique-name"
      # Use the record's uuid as the partition key
      producer.send(thermalRec, partition_key=thermalRec.uuid)
      https://github.com/tspannhw/FLiP-Pi-Thermal
  • 35. Example JSON Schema Usage
      import pulsar
      from pulsar.schema import *
      from pulsar.schema import JsonSchema
      class weather(Record):
          uuid = String()
      client = pulsar.Client('pulsar://pulsar1:6650')
      wschema = JsonSchema(weather)
      producer = client.create_producer(topic='persistent://public/default/weather',
                                        schema=wschema, properties={"producer-name": "wthr"})
      weatherRec = weather()
      weatherRec.uuid = "unique-name"
      # Use the record's uuid as the partition key
      producer.send(weatherRec, partition_key=weatherRec.uuid)
      https://github.com/tspannhw/FLiP-Pi-Weather
  • 36. Building a Python3 Consumer
      import pulsar
      client = pulsar.Client('pulsar://localhost:6650')
      consumer = client.subscribe('persistent://conference/pythonweb/first', subscription_name='my-sub')
      while True:
          msg = consumer.receive()
          print("Received message: '%s'" % msg.data())
          consumer.acknowledge(msg)
      client.close()
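The consumer loop above acknowledges every message unconditionally. A common variation, sketched here under assumptions, is to acknowledge only when processing succeeds and to request redelivery otherwise, using the Python client's `negative_acknowledge`. The `handle` helper and the `FakeMessage` and `FakeConsumer` classes are hypothetical stand-ins so the sketch runs without a broker; with a real consumer you would pass the objects returned by `client.subscribe(...)` and `consumer.receive()`.

```python
import json


def handle(consumer, msg, process):
    """Acknowledge msg if process() succeeds, otherwise ask for redelivery."""
    try:
        process(msg.data())
    except Exception:
        consumer.negative_acknowledge(msg)
        return False
    consumer.acknowledge(msg)
    return True


class FakeMessage:
    """Stand-in for a Pulsar message (illustration only)."""

    def __init__(self, payload):
        self._payload = payload

    def data(self):
        return self._payload


class FakeConsumer:
    """Stand-in for a Pulsar consumer that records acks and nacks."""

    def __init__(self):
        self.acked = []
        self.nacked = []

    def acknowledge(self, msg):
        self.acked.append(msg)

    def negative_acknowledge(self, msg):
        self.nacked.append(msg)


consumer = FakeConsumer()
ok = handle(consumer, FakeMessage(b'{"uuid": "unique-name"}'), json.loads)
bad = handle(consumer, FakeMessage(b"not json"), json.loads)
print(ok, bad, len(consumer.acked), len(consumer.nacked))
```

Whether to redeliver or to route bad messages elsewhere (for example a dead letter topic) is a design choice that depends on the application.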
  • 37. MQTT from Python
      pip3 install paho-mqtt
      import json
      import paho.mqtt.client as mqtt
      client = mqtt.Client("rpi4-iot")
      row = { }
      row['gasKO'] = str(readings)  # readings comes from the sensor code (not shown)
      json_string = json.dumps(row)
      json_string = json_string.strip()
      client.connect("pulsar-server.com", 1883, 180)
      client.publish("persistent://public/default/mqtt-2", payload=json_string, qos=0, retain=True)
      https://www.slideshare.net/bunkertor/data-minutes-2-apache-pulsar-with-mqtt-for-edge-computing-lightning-2022
  • 38. Web Sockets from Python
      pip3 install websocket-client
      import websocket, base64, json
      topic = 'ws://server:8080/ws/v2/producer/persistent/public/default/webtopic1'
      ws = websocket.create_connection(topic)
      message = "Hello Python Web Conference"
      message_bytes = message.encode('ascii')
      base64_bytes = base64.b64encode(message_bytes)
      base64_message = base64_bytes.decode('ascii')
      ws.send(json.dumps({'payload' : base64_message, 'properties': {'device' : 'jetson2gb', 'protocol' : 'websockets'}, 'context' : 5}))
      response = json.loads(ws.recv())
      https://pulsar.apache.org/docs/en/client-libraries-websocket/
      https://github.com/tspannhw/FLiP-IoT/blob/main/wspulsar.py
      https://github.com/tspannhw/FLiP-IoT/blob/main/wsreader.py
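The WebSocket producer above sends a JSON frame whose `payload` field is the Base64-encoded message. The encoding step can be isolated and checked without a server, as in this sketch. `build_frame` and `read_payload` are hypothetical helper names, and UTF-8 is used instead of ASCII here so that non-ASCII text also survives the round trip (an assumption about what you want to send).

```python
import base64
import json


def build_frame(text, properties=None, context=None):
    """Build the JSON frame for a WebSocket producer: Base64 payload plus optional fields."""
    frame = {"payload": base64.b64encode(text.encode("utf-8")).decode("ascii")}
    if properties:
        frame["properties"] = properties
    if context is not None:
        frame["context"] = context
    return json.dumps(frame)


def read_payload(frame_json):
    """Decode the Base64 payload of a frame back into text."""
    return base64.b64decode(json.loads(frame_json)["payload"]).decode("utf-8")


frame = build_frame(
    "Hello Python Web Conference",
    properties={"device": "jetson2gb", "protocol": "websockets"},
    context=5,
)
print(frame)
print(read_payload(frame))
```

With a live connection the frame would be passed to `ws.send(frame)` exactly as on the slide.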
  • 39. Kafka from Python
      pip3 install kafka-python
      import json
      from kafka import KafkaProducer
      from kafka.errors import KafkaError
      row = { }
      row['gasKO'] = str(readings)  # readings comes from the sensor code (not shown)
      json_string = json.dumps(row)
      json_string = json_string.strip()
      producer = KafkaProducer(bootstrap_servers='pulsar1:9092', retries=3)
      producer.send('topic-kafka-1', json_string.encode('utf-8'))
      producer.flush()
      https://github.com/streamnative/kop
      https://docs.streamnative.io/platform/v1.0.0/concepts/kop-concepts
  • 40. Pulsar IO Functions in Python https://github.com/tspannhw/pulsar-pychat-function
  • 41. Pulsar IO Functions in Python
      bin/pulsar-admin functions create \
        --auto-ack true \
        --py py/src/sentiment.py \
        --classname "sentiment.Chat" \
        --inputs "persistent://public/default/chat" \
        --log-topic "persistent://public/default/logs" \
        --name Chat \
        --output "persistent://public/default/chatresult"
      https://github.com/tspannhw/pulsar-pychat-function
  • 42. Pulsar IO Functions in Python
      from pulsar import Function
      import json
      class Chat(Function):
          def __init__(self):
              pass
          def process(self, input, context):
              logger = context.get_logger()
              msg_id = context.get_message_id()
              fields = json.loads(input)
      https://github.com/tspannhw/pulsar-pychat-function
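The slide above shows only the start of the function. The sketch below is one way a complete function of that shape could look, written so it can be exercised locally before deploying. It is not the code from the linked repository: the `comment` field, the `length` result and the `StubContext` class are assumptions made for illustration, and the fallback `Function` class exists only so the sketch runs where `pulsar-client` is not installed.

```python
import json
import logging

try:
    from pulsar import Function  # provided by pulsar-client and the Functions runtime
except ImportError:
    class Function:  # fallback so the sketch runs without pulsar-client
        pass


class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        logger = context.get_logger()
        msg_id = context.get_message_id()
        fields = json.loads(input)
        logger.info("processing message %s", msg_id)
        # Placeholder logic (assumption): the real function scores sentiment.
        fields["length"] = len(fields.get("comment", ""))
        # The returned value is published to the function's output topic.
        return json.dumps(fields)


class StubContext:
    """Minimal stand-in for the function context, for local testing only."""

    def get_logger(self):
        return logging.getLogger("chat")

    def get_message_id(self):
        return "stub-message-id"


result = json.loads(Chat().process('{"comment": "hello"}', StubContext()))
print(result)
```

Testing `process` this way keeps the business logic independent of a running cluster; only `get_logger` and `get_message_id` are stubbed because those are the context calls the slide uses.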
  • 43. Python For Pulsar on Pi ● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden ● https://github.com/tspannhw/FLiP-Pi-Thermal ● https://github.com/tspannhw/FLiP-Pi-Weather ● https://github.com/tspannhw/FLiP-RP400 ● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal ● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar ● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus ● https://github.com/tspannhw/PythonPulsarExamples ● https://github.com/tspannhw/pulsar-pychat-function ● https://github.com/tspannhw/FLiP-PulsarDevPython101
  • 44. Let’s Keep in Touch Tim Spann Developer Advocate @PaaSDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw https://streamnative.io/pulsar-python/
  • 45. Pulsar Summit San Francisco Hotel Nikko August 18 2022 Tim Spann Thank you! tim@streamnative.io @PaaSDev github.com/tspannhw