Building Modern Data
Streaming Apps with
Python
Tim Spann
Developer Advocate
Tim Spann, Developer Advocate at StreamNative
FLiP(N) Stack = Flink, Pulsar and NiFi Stack
Streaming Systems & Data Architecture Expert
Experience:
● 15+ years of experience with streaming technologies
including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet,
IoT, Python and more.
● Today, he helps grow the Pulsar community, sharing rich
technical knowledge and experience at global conferences
and in one-on-one conversations.
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and
open source friends.
https://bit.ly/32dAJft
Building Real-Time Requires a Team
Apache Pulsar has a vibrant community
560+ Contributors
10,000+ Commits
7,000+ Slack Members
1,000+ Organizations Using Pulsar
Pulsar Features
● Cloud native with decoupled storage and compute layers.
● Built-in compatibility with your existing code and messaging infrastructure.
● Geographic redundancy and high availability included.
● Centralized cluster management and oversight.
● Elastic horizontal and vertical scalability.
● Seamless and instant partition rebalancing with no downtime.
● Flexible subscription model supports a wide array of use cases.
● Compatible with the tools you use to store, analyze, and process data.
Messages - the basic unit of Pulsar

Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key: Messages are optionally tagged with keys, which are used for partitioning and are also useful for features like topic compaction.
Properties: An optional key/value map of user-defined properties.
Producer name: The name of the producer that produced the message. If you do not specify a producer name, the default name is used.
Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
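The components above can be made concrete with a toy model. This is an illustrative sketch only, not the pulsar-client API; the class and field names are hypothetical (the real client exposes these via `Message` getters such as `partition_key()` and `properties()`):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Toy model of a Pulsar message's components (illustrative only;
# not the real pulsar-client Message class).
@dataclass
class PulsarMessage:
    value: bytes                          # raw payload, optionally schema-conformant
    key: Optional[str] = None             # used for partitioning and topic compaction
    properties: Dict[str, str] = field(default_factory=dict)
    producer_name: str = "default-producer"
    sequence_id: int = 0                  # order within the topic's sequence

msg = PulsarMessage(value=b'{"temp": 22}', key="sensor-1",
                    properties={"source": "pi"}, sequence_id=42)
print(msg.key, msg.sequence_id)  # sensor-1 42
```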
Integrated Schema Registry
[Diagram] Producers serialize data per schema ID (schema-1/2/3, value=Avro/Protobuf/JSON) and send (register) the schema to the Schema Registry if it is not in their local cache. Consumers deserialize data per schema ID, getting the schema by ID from the registry if it is not in their local cache. Both producers and consumers keep a local cache of schemas.
DevOps: Pulsar Shell
https://pulsar.apache.org/docs/next/administration-pulsar-shell/
Welcome to Pulsar shell!
Service URL: pulsar://localhost:6650/
Admin URL: http://localhost:8080/
Type help to get started or try the autocompletion (TAB button).
Type exit or quit to end the shell session.
default(localhost)>
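From the prompt, admin and client commands can be run directly. A sketch of a typical session (the topic name is illustrative, and these commands assume a running local cluster):

```shell
default(localhost)> admin topics create persistent://public/default/demo
default(localhost)> client produce persistent://public/default/demo -m "hello"
default(localhost)> admin topics list public/default
```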
The FLiPN kitten crosses the stream 4 ways with Apache Pulsar
Kafka on Pulsar (KoP)
StreamNative Pulsar ecosystem: hub.streamnative.io
Client Libraries
Connectors (Sources & Sinks)
Data Offloaders (Tiered Storage)
Protocol Handlers
Pulsar Functions (Lightweight Stream Processing)
Processing Engines
… and more!
Pulsar Functions
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go).
● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
from pulsar import Function
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import json

class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        fields = json.loads(input)
        sid = SentimentIntensityAnalyzer()
        ss = sid.polarity_scores(fields["comment"])
        row = {}
        row['id'] = str(context.get_message_id())
        if ss['compound'] < 0.00:
            row['sentiment'] = 'Negative'
        else:
            row['sentiment'] = 'Positive'
        row['comment'] = str(fields["comment"])
        json_string = json.dumps(row)
        return json_string
Entire Function
ML Function
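The function's core logic can be exercised locally without a running cluster or the vaderSentiment package. A minimal sketch, assuming a stubbed analyzer (the stub's scores and the message ID are hypothetical, for illustration only):

```python
import json

# Stub standing in for vaderSentiment's SentimentIntensityAnalyzer
# (hypothetical scores, for illustration only).
class StubAnalyzer:
    def polarity_scores(self, text):
        return {'compound': -0.5 if 'bad' in text else 0.5}

def process(input, analyzer, msg_id="1:2:0"):
    # Mirrors the body of Chat.process above, with the analyzer injected.
    fields = json.loads(input)
    ss = analyzer.polarity_scores(fields["comment"])
    row = {'id': str(msg_id)}
    row['sentiment'] = 'Negative' if ss['compound'] < 0.00 else 'Positive'
    row['comment'] = str(fields["comment"])
    return json.dumps(row)

out = process(json.dumps({"comment": "bad service"}), StubAnalyzer())
print(out)  # {"id": "1:2:0", "sentiment": "Negative", "comment": "bad service"}
```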
Starting a Function - Distributed Cluster
Once packaged (a JAR for Java, or a .py file for Python functions), start a Pulsar Function in a distributed cluster:
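The slide does not show the command itself; a sketch using pulsar-admin, assuming a hypothetical JAR, class name, and topics (substitute your own):

```shell
bin/pulsar-admin functions create \
  --jar target/my-functions.jar \
  --classname com.example.MyFunction \
  --tenant public --namespace default \
  --name my-function \
  --inputs persistent://public/default/in \
  --output persistent://public/default/out
```

For a Python function, replace `--jar` with `--py` and point `--classname` at the module.Class, as shown on the deployment slide later in this deck.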
Building Tenant, Namespace, Topics
bin/pulsar-admin tenants create meetup
bin/pulsar-admin namespaces create meetup/newjersey
bin/pulsar-admin tenants list
bin/pulsar-admin namespaces list meetup
bin/pulsar-admin topics create persistent://meetup/newjersey/first
bin/pulsar-admin topics list meetup/newjersey
Install Python 3 Pulsar Client
pip3 install 'pulsar-client[all]==2.9.1'
# Depending on Platform May Need C++ Client Built
For Python on Pulsar on Pi https://github.com/tspannhw/PulsarOnRaspberryPi
https://pulsar.apache.org/docs/en/client-libraries-python/
Building a Python3 Producer
import pulsar

client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('persistent://conf/ete/first')
producer.send(('Simple Text Message').encode('utf-8'))
client.close()
python3 prod.py -su pulsar+ssl://name1.name2.snio.cloud:6651 -t persistent://public/default/pyth --auth-params '{"issuer_url":"https://auth.streamnative.cloud", "private_key":"my.json", "audience":"urn:sn:pulsar:name:myclustr"}'
import argparse
import pulsar
from pulsar import Client, AuthenticationOauth2

parse = argparse.ArgumentParser(prog='prod.py')
parse.add_argument('-su', '--service-url', dest='service_url', type=str, required=True)
# --auth-params carries the OAuth2 JSON shown above (see the linked full example)
parse.add_argument('--auth-params', dest='auth_params', type=str, required=True)
args = parse.parse_args()

client = pulsar.Client(args.service_url,
    authentication=AuthenticationOauth2(args.auth_params))
https://github.com/streamnative/examples/blob/master/cloud/python/OAuth2Producer.py
https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
Producer with OAuth to Cloud
import pulsar
from pulsar.schema import *
from pulsar.schema import AvroSchema

class thermal(Record):
    uuid = String()

client = pulsar.Client('pulsar://pulsar1:6650')
thermalschema = AvroSchema(thermal)
producer = client.create_producer(topic='persistent://public/default/pi-thermal-avro',
    schema=thermalschema, properties={"producer-name": "thrm"})

thermalRec = thermal()
thermalRec.uuid = "unique-name"
producer.send(thermalRec, partition_key=thermalRec.uuid)
https://github.com/tspannhw/FLiP-Pi-Thermal
Example Avro Schema Usage
import pulsar
from pulsar.schema import *
from pulsar.schema import JsonSchema

class weather(Record):
    uuid = String()

client = pulsar.Client('pulsar://pulsar1:6650')
wsc = JsonSchema(weather)
producer = client.create_producer(topic='persistent://public/default/wthr',
    schema=wsc, properties={"producer-name": "wthr"})

weatherRec = weather()
weatherRec.uuid = "unique-name"
producer.send(weatherRec, partition_key=weatherRec.uuid)
https://github.com/tspannhw/FLiP-Pi-Weather
https://github.com/tspannhw/FLiP-PulsarDevPython101
Example JSON Schema Usage
import pulsar

client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('persistent://conf/ete/first', subscription_name='mine')

try:
    while True:
        msg = consumer.receive()
        print("Received message: '%s'" % msg.data())
        consumer.acknowledge(msg)
finally:
    client.close()
Building a Python Consumer
pip3 install paho-mqtt

import json
import paho.mqtt.client as mqtt

client = mqtt.Client("rpi4-iot")
row = {}
row['gasKO'] = str(readings)  # readings gathered from sensors elsewhere in the program
json_string = json.dumps(row)
json_string = json_string.strip()
client.connect("pulsar-server.com", 1883, 180)
client.publish("persistent://public/default/mqtt-2",
    payload=json_string, qos=0, retain=True)
https://www.slideshare.net/bunkertor/data-minutes-2-apache-pulsar-with-mqtt-for-edge-computing-lightning-2022
Sending MQTT Messages
pip3 install websocket-client
import websocket, base64, json

topic = 'ws://server:8080/ws/v2/producer/persistent/public/default/topic1'
ws = websocket.create_connection(topic)
message = "Hello Philly ETE Conference"
message_bytes = message.encode('ascii')
base64_bytes = base64.b64encode(message_bytes)
base64_message = base64_bytes.decode('ascii')
ws.send(json.dumps({'payload': base64_message, 'properties': {'device': 'macbook'}, 'context': 5}))
response = json.loads(ws.recv())
https://pulsar.apache.org/docs/en/client-libraries-websocket/
https://github.com/tspannhw/FLiP-IoT/blob/main/wspulsar.py
https://github.com/tspannhw/FLiP-IoT/blob/main/wsreader.py
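On the reader side, messages arrive as JSON frames with a base64-encoded payload, so the decode step is the mirror of the producer above. A minimal stdlib sketch (the frame dict here is illustrative, not captured from a live ws.recv()):

```python
import base64
import json

# Example frame shaped like what the Pulsar WebSocket reader delivers
# (illustrative values, not from a live connection).
frame = json.dumps({
    "messageId": "CAcQADAA",
    "payload": base64.b64encode("Hello Philly ETE Conference".encode("ascii")).decode("ascii"),
    "properties": {"device": "macbook"},
})

msg = json.loads(frame)
text = base64.b64decode(msg["payload"]).decode("ascii")
print(text)  # Hello Philly ETE Conference
```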
Sending Websocket Messages
pip3 install kafka-python

import json
from kafka import KafkaProducer
from kafka.errors import KafkaError

row = {}
row['gasKO'] = str(readings)  # readings gathered from sensors elsewhere in the program
json_string = json.dumps(row).strip()
producer = KafkaProducer(bootstrap_servers='pulsar1:9092', retries=3)
producer.send('topic-kafka-1', json_string.encode('utf-8'))
producer.flush()
https://github.com/streamnative/kop
https://docs.streamnative.io/platform/v1.0.0/concepts/kop-concepts
Sending Kafka Messages
bin/pulsar-admin functions create --auto-ack true \
  --py py/src/sentiment.py \
  --classname "sentiment.Chat" \
  --inputs "persistent://public/default/chat" \
  --log-topic "persistent://public/default/logs" \
  --name Chat \
  --output "persistent://public/default/chatresult"
https://github.com/tspannhw/pulsar-pychat-function
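Once deployed, the function can be checked from the same CLI; a sketch, with the tenant, namespace, and name matching the create command above:

```shell
bin/pulsar-admin functions status --tenant public --namespace default --name Chat
bin/pulsar-admin functions stats --tenant public --namespace default --name Chat
```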
DevOps: Deploying Functions
Apache Pulsar in Action
@PassDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
https://streamnative.io/pulsar-python/
Tim Spann, Developer Advocate at StreamNative