Using the FLiPN Stack for Edge AI
Tim Spann | Developer Advocate
Tim Spann, Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more.
○ Today, he helps grow the Pulsar community by sharing rich technical knowledge and experience at global conferences and through individual conversations.
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Introducing the FLiPN stack, which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI and rapid ingest. FLiPN provides a ready set of tools for building streaming and IoT applications at scale.
Tools: Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI
References
AGENDA
● Learn how to build an end-to-end streaming edge app
● Pull messages from Pulsar topics
● Build a data stream for IoT with NiFi
● Use Apache Flink with Apache Pulsar
● Use Apache Spark and Delta Lake with Apache Pulsar
IoT DATA
IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices create data ingestion challenges.
Other Sources: Transit data, news, Twitter, status feeds, REST data, stock data and more.
FLIP(N)(S) STACK
● Apache Flink
● Apache Pulsar
● Pulsar Functions
● Apache NiFi
● Apache Spark
● Python, Java, Golang
https://streamnative.io/blog/engineering/2022-04-14-what-the-flip-is-the-flip-stack/
DATA STREAMING
➔ Perform in Real-Time
➔ Process Events as They Happen
➔ Joining Streams with SQL
➔ Find Anomalies Immediately
➔ Ordering and Arrival Semantics
➔ Continuous Streams of Data
STREAMING EDGE APPS
[Figure: Sensors connect through an edge gateway that speaks several protocols (Pulsar, KoP, MoP, WebSocket) to StreamNative Cloud and StreamNative Hub. The platform provides unified batch and stream computing (batch + stream), unified batch and stream storage (queuing + streaming) with tiered-storage offload, and Pulsar sinks that feed streaming apps.]
WHAT IS APACHE PULSAR? (101)
● Unified Messaging Platform
● Guaranteed Message Delivery
● Resiliency
● Infinite Scalability
PULSAR: UNIFIED MESSAGING + DATA STREAMING
Messaging
Ideal for work queues that do not require tasks to be performed in a particular order, for example, sending one email message to many recipients. RabbitMQ and Amazon SQS are examples of popular queue-based messaging systems.
... and Streaming
Works best in situations where the order of messages is important, for example, data ingestion. Apache Kafka and Amazon Kinesis are examples of messaging systems that use streaming semantics for consuming messages.
PULSAR PUB/SUB MODEL
● Producer (publisher): sends data to a topic and doesn't know about the subscribers or their status.
● Pulsar: all interactions go through Pulsar, which handles all communication.
● Consumer (subscriber): receives data from the publisher through the topic and never interacts with the publisher directly.
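To make the decoupling concrete, here is a minimal in-memory sketch in Scala. It is not the Pulsar client API, and `InMemoryTopic` with its methods is a hypothetical name used only for illustration. The producer publishes only to a topic, and the topic hands a copy of every message to each named subscription, so the producer never references a consumer.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of a topic; NOT the Pulsar client API.
final class InMemoryTopic(val name: String) {
  // Each named subscription keeps its own backlog of undelivered messages
  private val backlogs = mutable.LinkedHashMap.empty[String, mutable.Queue[String]]

  // A consumer attaches by subscription name; the producer never sees this call
  def subscribe(subscription: String): Unit =
    backlogs.getOrElseUpdate(subscription, mutable.Queue.empty[String])

  // The producer only knows the topic; every subscription receives a copy
  def publish(message: String): Unit =
    backlogs.values.foreach(_.enqueue(message))

  // Drain the messages waiting for one subscription
  def receiveAll(subscription: String): List[String] = {
    val q = backlogs(subscription)
    val out = q.toList
    q.clear()
    out
  }
}

val topic = new InMemoryTopic("persistent://public/default/sensors")
topic.subscribe("dashboard")
topic.subscribe("alerts")
topic.publish("temp=21.5")
topic.publish("temp=22.0")
println(topic.receiveAll("dashboard")) // List(temp=21.5, temp=22.0)
```

Adding a third subscription requires no change on the producer side, which is the property the slide describes.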
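UNIFIED MESSAGING MODEL
[Figure: Subscription types on a Pulsar topic/partition. Exclusive: a single consumer per subscription (a second consumer is rejected, marked X). Failover: one active consumer with standbys that take over in case of failure in the active consumer (B-0). Shared: messages are spread across multiple consumers. Key-Shared: messages with the same key go to the same consumer. Exclusive and Failover give streaming semantics; Shared and Key-Shared give messaging semantics.]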
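The routing rule behind each of the four subscription types (Exclusive, Failover, Shared, Key-Shared) can be sketched as a function that picks a consumer index for each message. This is a simplified model under stated assumptions. Real brokers also handle acknowledgements and redelivery, and Key-Shared assigns hash ranges to consumers. Here a plain `hashCode` modulo stands in for that assignment.

```scala
// Simplified model of how a subscription type spreads messages across
// attached consumers. Not Pulsar's dispatcher: acks, redelivery and
// hash-range assignment are left out.
sealed trait SubscriptionType
case object Exclusive extends SubscriptionType
case object Failover extends SubscriptionType
case object Shared extends SubscriptionType
case object KeyShared extends SubscriptionType

final case class Msg(key: String, value: String)

// Returns, for each message, the index of the consumer that receives it.
def dispatch(subType: SubscriptionType, consumers: Int, msgs: Seq[Msg]): Seq[Int] = {
  require(consumers >= 1, "at least one consumer must be attached")
  subType match {
    case Exclusive =>
      // Only one consumer may attach; a second attach is rejected (the "X")
      require(consumers == 1, "Exclusive subscriptions allow a single consumer")
      msgs.map(_ => 0)
    case Failover =>
      // One active consumer (index 0 here) receives everything; the others
      // are standbys that take over only if the active consumer fails
      msgs.map(_ => 0)
    case Shared =>
      // Round-robin across consumers, so per-key ordering is not preserved
      msgs.indices.map(_ % consumers)
    case KeyShared =>
      // Same key -> same consumer (hash modulo stands in for hash ranges)
      msgs.map(m => Math.floorMod(m.key.hashCode, consumers))
  }
}

val msgs = Seq(Msg("sensor-a", "1"), Msg("sensor-b", "2"), Msg("sensor-a", "3"), Msg("sensor-c", "4"))
println(dispatch(Shared, 2, msgs))   // Vector(0, 1, 0, 1)
println(dispatch(Failover, 2, msgs)) // List(0, 0, 0, 0)
```

The contrast shows the trade-off from the slide. Shared maximizes parallelism but loses ordering. Key-Shared keeps per-key order while still spreading load.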
PULSAR BENEFITS
● Building microservices
● Asynchronous communication
● Building real-time applications
● Highly resilient
● Tiered storage
KEY PULSAR CONCEPTS: ARCHITECTURE
● “Bookies” (store messages)
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load balancing
○ Topics are composed of multiple segments
● Metadata storage (metadata & service discovery)
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
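The idea that a topic is composed of multiple segments can be sketched as an append-only log that rolls over to a new segment when the current one is full. The sketch rests on an assumption. Pulsar rolls ledgers over based on size, entry-count, or time limits, and here a fixed entry count stands in for all of them. `SegmentedTopic` is a hypothetical name.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of a topic stored as a sequence of segments (ledgers).
// A fixed entry count stands in for the broker's real rollover limits.
final class SegmentedTopic(maxEntriesPerSegment: Int) {
  require(maxEntriesPerSegment > 0)
  private val segments = ArrayBuffer(ArrayBuffer.empty[String])

  // Append to the newest segment, opening a new one when it is full.
  // Returns (segmentId, entryId), similar in spirit to a message ID.
  def append(entry: String): (Int, Int) = {
    if (segments.last.size >= maxEntriesPerSegment) segments += ArrayBuffer.empty[String]
    segments.last += entry
    (segments.size - 1, segments.last.size - 1)
  }

  def segmentCount: Int = segments.size

  // Read one entry by its position
  def read(segmentId: Int, entryId: Int): String = segments(segmentId)(entryId)
}

val t = new SegmentedTopic(maxEntriesPerSegment = 2)
val positions = (1 to 5).map(i => t.append(s"reading-$i"))
println(positions)      // Vector((0,0), (0,1), (1,0), (1,1), (2,0))
println(t.segmentCount) // 3
```

Because closed segments never change, they can be stored on different bookies or offloaded to tiered storage independently of the segment currently being written.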
MESSAGES - THE BASIC UNIT OF PULSAR
● Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
● Key: Messages are optionally tagged with keys, which are used for partitioning and are also useful for features such as topic compaction.
● Properties: An optional key/value map of user-defined properties.
● Producer name: The name of the producer that produced the message. If you do not specify a producer name, a default name is used. Used for message de-duplication.
● Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic, and the sequence ID of the message is its position in that sequence. Used for message de-duplication.
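The producer name and sequence ID work together for de-duplication. Below is a simplified model of the broker side when de-duplication is enabled. The broker remembers the highest sequence ID it has stored for each producer name and drops any message whose sequence ID is not higher, such as a client retry after a timeout. `DedupLog` is a hypothetical class, not Pulsar code.

```scala
import scala.collection.mutable

// Simplified model of broker-side de-duplication keyed by producer name.
final case class Message(producerName: String, sequenceId: Long, payload: String)

final class DedupLog {
  private val highestSeq = mutable.Map.empty[String, Long]
  val persisted = mutable.ArrayBuffer.empty[Message]

  // Returns true if the message was stored, false if it was a duplicate
  def publish(m: Message): Boolean = {
    val last = highestSeq.getOrElse(m.producerName, -1L)
    if (m.sequenceId <= last) false
    else {
      highestSeq(m.producerName) = m.sequenceId
      persisted += m
      true
    }
  }
}

val log = new DedupLog
println(log.publish(Message("edge-gw-1", 0, "temp=21.5"))) // true
println(log.publish(Message("edge-gw-1", 1, "temp=22.0"))) // true
println(log.publish(Message("edge-gw-1", 1, "temp=22.0"))) // false (retry of seq 1)
println(log.publish(Message("edge-gw-2", 1, "temp=19.0"))) // true (different producer)
```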
MULTI-TENANCY MODEL
[Figure: A Pulsar cluster hosts tenants (Compliance, Data Services, Marketing). Each tenant contains namespaces (Microservices, ETL, Campaigns, Risk Assessment), and each namespace contains topics (for example Cust Auth, Location Resolution, Demographics, Budgeted Spend, Acct History, Risk Detection).]
Topic URI structure: persistent://[tenant]/[namespace]/[topic]
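The topic URI structure can be illustrated with a small parser. This is a sketch, not Pulsar's own `TopicName` class. It follows Pulsar's convention that a bare name like `airquality` resolves to `persistent://public/default/airquality` and that `tenant/namespace/topic` defaults to the persistent domain. Partition suffixes and other edge cases are ignored. `TopicParser` is a hypothetical name.

```scala
// Sketch of parsing persistent://[tenant]/[namespace]/[topic]
final case class TopicName(domain: String, tenant: String, namespace: String, topic: String) {
  override def toString: String = s"$domain://$tenant/$namespace/$topic"
}

object TopicParser {
  // Fully qualified form: <domain>://<tenant>/<namespace>/<topic>
  private val Full = "(persistent|non-persistent)://([^/]+)/([^/]+)/([^/]+)".r

  def parse(name: String): Either[String, TopicName] = name match {
    case Full(domain, tenant, namespace, topic) =>
      Right(TopicName(domain, tenant, namespace, topic))
    case n if !n.contains("://") =>
      n.split("/") match {
        // Short name: default tenant "public" and namespace "default"
        case Array(topic) if topic.nonEmpty => Right(TopicName("persistent", "public", "default", topic))
        case Array(tenant, namespace, topic) => Right(TopicName("persistent", tenant, namespace, topic))
        case _ => Left(s"Unrecognized topic name: $name")
      }
    case _ => Left(s"Unrecognized topic name: $name")
  }
}

println(TopicParser.parse("persistent://public/default/airquality")) // Right(persistent://public/default/airquality)
println(TopicParser.parse("airquality"))                             // Right(persistent://public/default/airquality)
```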
Connectivity
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP on Pulsar), KoP (Kafka on Pulsar), MoP (MQTT on Pulsar)
• Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
PULSAR SQL
Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
[Figure: Producers and consumers connect to Brokers 1 to 3, each serving one partition of a topic (Topic1-Part1 to Topic1-Part3). The partitions' segments (Segment 1, 2, 3, 4, ... X) are replicated across Bookies 1 to 3. A query coordinator reads topic metadata, and several SQL workers read the segments from the bookies in parallel.]
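The parallel-read idea can be sketched with Scala futures. A coordinator assigns segments to workers round-robin, each worker scans its segments independently, and the partial results are merged. This is an illustration of the pattern, not Presto/Trino or Pulsar SQL code. The segment contents are made-up AQI values, and the function names are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical segment contents (AQI values) for one topic
val segments: Map[String, Seq[Int]] = Map(
  "segment-1" -> Seq(42, 51, 38),
  "segment-2" -> Seq(77, 64),
  "segment-3" -> Seq(101, 95, 88),
  "segment-4" -> Seq(33)
)

// Partial aggregate produced by one worker: (max AQI, row count)
def scan(segs: Seq[Seq[Int]]): (Int, Long) =
  (segs.flatten.foldLeft(Int.MinValue)((acc, v) => math.max(acc, v)), segs.map(_.size.toLong).sum)

def parallelMaxAndCount(workers: Int): (Int, Long) = {
  require(workers > 0)
  // Round-robin assignment of segments to workers
  val assignments = segments.values.toSeq.zipWithIndex.groupBy(_._2 % workers).values
  // Each worker scans its own segments concurrently
  val partials = assignments.map(batch => Future(scan(batch.map(_._1)))).toSeq
  val results = Await.result(Future.sequence(partials), 10.seconds)
  // Merge partial results into the final answer
  (results.map(_._1).max, results.map(_._2).sum)
}

println(parallelMaxAndCount(workers = 2)) // (101,9)
```

The answer does not depend on how many workers there are, which is what lets the read fan out without going through the brokers.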
SCHEMA REGISTRY
[Figure: A Schema Registry stores schema-1, schema-2 and schema-3 (each with an Avro, Protobuf or JSON value). Producers and consumers each keep a local cache for schemas.]
● Producers: send (register) the schema if it is not in the local cache, then send schema-1 data serialized per schema ID.
● Consumers: get the schema by ID if it is not in the local cache, then read schema-1 data deserialized per schema ID.
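The cache-then-registry flow for producers and consumers can be sketched as follows. `SchemaRegistry` and `SchemaCache` are hypothetical classes, not Pulsar's schema API. In Pulsar, the client library and broker perform these steps for you. The `calls` counter only exists to show that cache hits skip the round trip to the registry.

```scala
import scala.collection.mutable

// Hypothetical registry: assigns IDs to schema definitions
final class SchemaRegistry {
  private val byId = mutable.Map.empty[Int, String]
  private val byDef = mutable.Map.empty[String, Int]
  var calls = 0 // counts round trips, to show the cache at work

  // Register a definition (idempotent) and return its ID
  def register(definition: String): Int = {
    calls += 1
    byDef.get(definition) match {
      case Some(id) => id
      case None =>
        val id = byDef.size + 1
        byDef(definition) = id
        byId(id) = definition
        id
    }
  }

  def get(id: Int): String = { calls += 1; byId(id) }
}

// Client-side local cache in front of the registry
final class SchemaCache(registry: SchemaRegistry) {
  private val idsByDef = mutable.Map.empty[String, Int]
  private val defsById = mutable.Map.empty[Int, String]

  // Producer side: register only if the schema is not cached yet
  def idFor(definition: String): Int =
    idsByDef.getOrElseUpdate(definition, registry.register(definition))

  // Consumer side: fetch the schema by ID only on a cache miss
  def schemaFor(id: Int): String =
    defsById.getOrElseUpdate(id, registry.get(id))
}

val registry = new SchemaRegistry
val producerCache = new SchemaCache(registry)
val consumerCache = new SchemaCache(registry)
val avro = """{"type":"record","name":"AirQuality","fields":[{"name":"aqi","type":"int"}]}"""

val id1 = producerCache.idFor(avro) // miss: registers the schema
val id2 = producerCache.idFor(avro) // hit: no registry call
consumerCache.schemaFor(id1)        // miss: fetches by ID
consumerCache.schemaFor(id1)        // hit
println(registry.calls)             // 2
```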
APACHE PULSAR TO IO SINK
https://pulsar.apache.org/docs/en/io-overview/
PULSAR FUNCTIONS
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go).
● Can use third-party libraries to execute ML models at the edge.
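The consume, process, publish cycle maps onto a single method. The sketch below is written against the language-native Java interface `java.util.function.Function`, which Pulsar can run as a function: each input message is passed to `apply`, and the return value goes to the output topic. Several details are assumptions for illustration: the "deviceId,tempC" input format, the class name, and the Fahrenheit conversion. We also assume, based on our understanding of the runtime, that a `null` result means no output message is published. To deploy it, you would package the class as a JAR that also contains the Scala library.

```scala
import java.util.function.{Function => JFunction}

// Sketch of a Pulsar Function using the language-native Java interface.
// Hypothetical input format: "deviceId,tempC", e.g. "rpi4-thermal,21.5".
class CelsiusToFahrenheitFunction extends JFunction[String, String] {
  override def apply(input: String): String = input.split(",") match {
    case Array(deviceId, tempC) =>
      val f = tempC.trim.toDouble * 9.0 / 5.0 + 32.0
      // Locale.ROOT keeps "." as the decimal separator
      s"$deviceId,${"%.1f".formatLocal(java.util.Locale.ROOT, f)}"
    case _ =>
      // Assumption: a null result means nothing is published for this message
      null
  }
}

val fn = new CelsiusToFahrenheitFunction
println(fn.apply("rpi4-thermal,21.5")) // rpi4-thermal,70.7
```

Because the logic is a plain function, it can be unit-tested locally like this before it is deployed to a broker or edge gateway.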
PULSAR IO FUNCTIONS IN PYTHON
https://github.com/tspannhw/pulsar-pychat-function
EDGE COMPUTING WITH PULSAR
● Apache Pulsar’s two-tier architecture separates the compute and storage layers, which interact with one another over a TCP/IP connection. This allows us to run the computing layer (Broker) on either edge servers or IoT gateway devices.
● Pulsar’s serverless computing framework, known as Pulsar Functions, can run inside the Broker as threads, effectively “stretching” the data processing layer to the edge.
BENEFITS OF EDGE COMPUTING WITH PULSAR
● Pulsar’s serverless computing framework can run inside the Pulsar Broker as a thread pool. This framework can be used as the execution environment for ML models.
● With the MQTT protocol handler (MoP), the Apache Pulsar Broker can receive incoming data directly from sensor hubs over MQTT and store it in a topic.
PULSAR FUNCTION – 3RD PARTY LIBRARIES
● You can use third-party libraries within Pulsar Functions:
○ DeepLearning4J
○ JPMML
○ DJL.AI
○ Keras
● Pulsar Functions can support:
○ A variety of ML model types
○ Models developed with different languages and toolkits
INTERACTION WITH MODEL SERVERS
● You can write an application in any language that has a Pulsar client library and have it call any model server (AWSLabs Multi-Model Server, Cloudera Data Science Workbench Model Server and many others) via REST or other protocols.
● Pulsar Functions can execute local models or call model servers.
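The choice between executing a local model and calling a model server can be hidden behind one interface. In this sketch, several pieces are hypothetical placeholders, not any model server's documented API: the `Scorer` trait, the threshold "model", the endpoint URL, and the JSON body. The REST variant uses the JDK 11 `java.net.http` client.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Two interchangeable ways to score a reading (hypothetical interface)
trait Scorer { def score(temperatureC: Double): String }

// Local model: runs in-process, no network hop (a toy threshold rule)
object LocalThresholdScorer extends Scorer {
  def score(temperatureC: Double): String = if (temperatureC > 30.0) "alert" else "normal"
}

// Remote model: POSTs the reading to a model server, returns the response body
final class RestScorer(endpoint: URI, client: HttpClient = HttpClient.newHttpClient()) extends Scorer {
  def request(temperatureC: Double): HttpRequest =
    HttpRequest.newBuilder(endpoint)
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(s"""{"temperatureC": $temperatureC}"""))
      .build()

  def score(temperatureC: Double): String =
    client.send(request(temperatureC), HttpResponse.BodyHandlers.ofString()).body()
}

println(LocalThresholdScorer.score(35.2)) // alert
val remote = new RestScorer(URI.create("http://model-server.local:8080/predictions/thermal"))
println(remote.request(35.2).method())    // POST
```

A local model avoids network latency at the edge. A model server lets models be updated without redeploying the function.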
ML
• Visual Question and Answer
• NLP (Natural Language Processing)
• Sentiment Analysis
• Text Classification
• Named Entity Recognition
• Content-based Recommendations
• Predictive Maintenance
• Fault Detection
• Fraud Detection
• Time-Series Predictions
• Naive Bayes
APACHE MXNET MODEL ZOO
• CaffeNet
• SqueezeNet v1.1
• Inception v3
• Single Shot MultiBox Detector (SSD)
• VGG16
• VGG19
• ResNet-152
• LSTM
https://mxnet.apache.org/api/python/docs/api/gluon/model_zoo/index.html
WHAT IS APACHE NiFi?
APACHE NIFI PULSAR CONNECTOR
https://streamnative.io/apache-nifi-connector/
WHY APACHE NIFI?
https://www.influxdata.com/integration/mqtt-monitoring/
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull
models
• Hundreds of processors
• Visual command and
control
• Over 300 sources
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
• Version Control
APACHE NIFI PULSAR CONNECTOR
https://github.com/streamnative/pulsar-nifi-bundle
WHAT IS APACHE FLINK?
● Unified computing engine
● Batch processing is a special case of stream processing
● Stateful processing
● Massive scalability
● Flink SQL for queries and inserts against Pulsar topics
● Streaming analytics
● Continuous SQL
● Continuous ETL
● Complex event processing
● Standard SQL powered by Apache Calcite
FLINK + PULSAR
https://flink.apache.org/2019/05/03/pulsar-flink.html
https://github.com/streamnative/pulsar-flink
https://streamnative.io/en/blog/release/2021-04-20-flink-sql-on-streamnative-cloud
FLINK SQL

SELECT aqi, parameterName, dateObserved, hourObserved, latitude,
       longitude, localTimeZone, stateCode, reportingArea
FROM airquality;

SELECT MAX(aqi) AS MaxAQI, parameterName, reportingArea
FROM airquality
GROUP BY parameterName, reportingArea;

SELECT MAX(aqi) AS MaxAQI, MIN(aqi) AS MinAQI, AVG(aqi) AS AvgAQI,
       COUNT(aqi) AS RowCount, parameterName, reportingArea
FROM airquality
GROUP BY parameterName, reportingArea;
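To show what the third query computes, here is the same grouping applied to a small, bounded snapshot of hypothetical `airquality` rows using Scala collections. In Flink, the query runs continuously over the Pulsar topic and updates each group's result as new rows arrive. Note that SQL engines differ on whether `AVG` over an integer column returns an integer. This sketch uses a `Double`.

```scala
// The third Flink SQL query, applied to an in-memory snapshot of made-up rows
final case class AirQuality(aqi: Int, parameterName: String, reportingArea: String)
final case class AqiStats(maxAqi: Int, minAqi: Int, avgAqi: Double, rowCount: Long)

// GROUP BY parameterName, reportingArea with MAX/MIN/AVG/COUNT of aqi
def aggregate(rows: Seq[AirQuality]): Map[(String, String), AqiStats] =
  rows.groupBy(r => (r.parameterName, r.reportingArea)).map { case (key, group) =>
    val aqis = group.map(_.aqi)
    key -> AqiStats(aqis.max, aqis.min, aqis.sum.toDouble / aqis.size, aqis.size.toLong)
  }

val rows = Seq(
  AirQuality(42, "PM2.5", "New York City"),
  AirQuality(58, "PM2.5", "New York City"),
  AirQuality(31, "O3", "New York City"),
  AirQuality(67, "PM2.5", "Philadelphia")
)
aggregate(rows).foreach(println)
```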
WHAT IS APACHE SPARK?
SPARK + PULSAR
https://pulsar.apache.org/docs/en/adaptors-spark/
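import org.apache.spark.sql.SparkSession
// In spark-shell `spark` already exists; getOrCreate() returns it
val spark = SparkSession.builder.appName("PulsarAirQuality").getOrCreate()
// Stream the airquality topic (needs the pulsar-spark connector on the classpath)
val dfPulsar = spark.readStream.format("pulsar")
  .option("service.url", "pulsar://pulsar1:6650")
  .option("admin.url", "http://pulsar1:8080")
  .option("topic", "persistent://public/default/airquality").load()
// Print every row to the console without truncating long values
val pQuery = dfPulsar.selectExpr("*")
  .writeStream.format("console")
  .option("truncate", false).start()
pQuery.awaitTermination()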
(spark-shell banner: Spark version 3.2.0, using Scala version 2.12.15, OpenJDK 64-Bit Server VM, Java 11.0.11)
DEMO TIME
EDGE FLOW
EDGE FLOW EXAMPLE 2
● BUFFER
● BATCH
● ROUTE
● FILTER
● AGGREGATE
● ENRICH
● REPLICATE
● DEDUPE
● DECOUPLE
● DISTRIBUTE
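Several of these edge-flow steps (FILTER, DEDUPE, ENRICH, ROUTE, BATCH) can be expressed as plain collection transformations. In the demo these steps run as MiNiFi/NiFi flows, so this is only an illustration. The `Reading` type, the site lookup table, and the -40 to 85 °C plausibility range are hypothetical assumptions.

```scala
// Hypothetical sensor reading and its enriched form
final case class Reading(deviceId: String, seq: Long, tempC: Double)
final case class Enriched(reading: Reading, site: String)

// Hypothetical lookup table used for enrichment
val siteByDevice = Map("rpi-1" -> "lab", "rpi-2" -> "roof")

def edgeFlow(readings: Seq[Reading], batchSize: Int): Map[String, Seq[Seq[Enriched]]] = {
  require(batchSize > 0)
  // FILTER: drop readings outside an assumed plausible sensor range
  val filtered = readings.filter(r => r.tempC > -40.0 && r.tempC < 85.0)

  // DEDUPE: keep the first reading per (deviceId, seq), preserving order
  val deduped = filtered.foldLeft((Vector.empty[Reading], Set.empty[(String, Long)])) {
    case ((acc, seen), r) =>
      val id = (r.deviceId, r.seq)
      if (seen(id)) (acc, seen) else (acc :+ r, seen + id)
  }._1

  // ENRICH: attach the site name from the lookup table
  val enriched = deduped.map(r => Enriched(r, siteByDevice.getOrElse(r.deviceId, "unknown")))

  // ROUTE by site, then BATCH each route into fixed-size groups
  enriched.groupBy(_.site).map { case (site, rs) => site -> rs.grouped(batchSize).toVector }
}

val out = edgeFlow(Seq(
  Reading("rpi-1", 1, 21.5),
  Reading("rpi-1", 1, 21.5),  // duplicate
  Reading("rpi-2", 1, 999.0), // out of range
  Reading("rpi-2", 2, 18.0),
  Reading("rpi-1", 2, 22.0)
), batchSize = 1)
out.foreach(println)
```

BUFFER, REPLICATE, DECOUPLE and DISTRIBUTE are about delivery rather than transformation. In the FLiPN stack they are handled by NiFi queues and Pulsar topics, not by code like this.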
MODEL CLASSIFICATION
https://medium.com/@tspann/building-a-simple-streaming-real-time-chat-app-7f3b2cf1561d
https://streamnative.io/blog/engineering/2022-03-17-streaming-real-time-chat-messages-into-scylla-with-apache-pulsar/
MORE RESOURCES
https://streamnative.io/blog/engineering/2021-11-17-building-edge-applications-with-apache-pulsar/
DEPLOYING AI WITH AN EVENT-DRIVEN PLATFORM
https://dzone.com/articles/deploying-ai-with-an-event-driven-platform
FLIP STACK WEEKLY
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
FREE BOOK!
https://streamnative.io/download/manning-ebook-apache-pulsar-in-action
Passionate and dedicated team.
Founded by the original developers of
Apache Pulsar.
StreamNative helps teams to capture,
manage, and leverage data using
Pulsar’s unified messaging and
streaming platform.
STREAMNATIVE ACADEMY
Apache Pulsar Training
• Instructor-led courses
• Pulsar Fundamentals
• Pulsar Developers
• Pulsar Operations
• On-demand learning with labs
• 300+ engineers, admins and architects trained!
Now Available: On-Demand Pulsar Training
Academy.StreamNative.io
Hosted by
Save Your Spot Now
Use code SUMMIT20
to get 20% off.
Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Summit Highlights:
● 5 Keynotes
● 12 Breakout Sessions
● 1 Amazing Happy Hour
Speakers from:
Pulsar Summit
San Francisco
Sponsorship
Prospectus
Community Sponsorships Available
Help engage and connect the Apache Pulsar
community by becoming an official sponsor
for Pulsar Summit San Francisco 2022! Learn
more about the requirements and benefits of
becoming a community sponsor.
Hosted by
[Webinar]
Building Microservices
[Blog post]
Event-Driven Microservices
CODE
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
● https://github.com/tspannhw/PythonPulsarExamples
● https://github.com/tspannhw/pulsar-pychat-function
● https://github.com/tspannhw/FLiP-PulsarDevPython101
● https://github.com/tspannhw/nifi-djlsentimentanalysis-processor
Let’s Keep in Touch!
Tim Spann
Developer Advocate
PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw

More Related Content

PDF
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
PDF
(Current22) Let's Monitor The Conditions at the Conference
PDF
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
PDF
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
PDF
Timothy Spann: Apache Pulsar for ML
PDF
bigdata 2022_ FLiP Into Pulsar Apps
PDF
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
PDF
[March sn meetup] apache pulsar + apache nifi for cloud data lake
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
(Current22) Let's Monitor The Conditions at the Conference
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Timothy Spann: Apache Pulsar for ML
bigdata 2022_ FLiP Into Pulsar Apps
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
[March sn meetup] apache pulsar + apache nifi for cloud data lake

Similar to Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) (20)

PDF
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
PDF
Serverless Event Streaming Applications as Functionson K8
PDF
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
PDF
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
PDF
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
PDF
Cloud lunch and learn real-time streaming in azure
PDF
[AI Dev World 2022] Build ML Enhanced Event Streaming
PDF
JConf.dev 2022 - Apache Pulsar Development 101 with Java
PDF
OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar
PDF
Using FLiP with influxdb for edgeai iot at scale 2022
PDF
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
PPTX
Building an Event Streaming Architecture with Apache Pulsar
PDF
Sink Your Teeth into Streaming at Any Scale
PDF
Sink Your Teeth into Streaming at Any Scale
PDF
Music city data Hail Hydrate! from stream to lake
PDF
OSA Con 2022: Streaming Data Made Easy
PDF
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
PDF
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
PDF
Python web conference 2022 apache pulsar development 101 with python (f li-...
PDF
Deep Dive into Building Streaming Applications with Apache Pulsar
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Serverless Event Streaming Applications as Functionson K8
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
Cloud lunch and learn real-time streaming in azure
[AI Dev World 2022] Build ML Enhanced Event Streaming
JConf.dev 2022 - Apache Pulsar Development 101 with Java
OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar
Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
Building an Event Streaming Architecture with Apache Pulsar
Sink Your Teeth into Streaming at Any Scale
Sink Your Teeth into Streaming at Any Scale
Music city data Hail Hydrate! from stream to lake
OSA Con 2022: Streaming Data Made Easy
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Python web conference 2022 apache pulsar development 101 with python (f li-...
Deep Dive into Building Streaming Applications with Apache Pulsar
Ad

More from Timothy Spann (20)

PDF
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
PDF
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
PDF
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
PDF
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
PDF
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
PDF
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
PDF
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
PDF
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
PDF
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
PDF
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
PPTX
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
PDF
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
PDF
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
PDF
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
PDF
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
PDF
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
PDF
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
PDF
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
PDF
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
PDF
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
Ad

Recently uploaded (20)

PDF
Understanding Forklifts - TECH EHS Solution
PDF
medical staffing services at VALiNTRY
PDF
top salesforce developer skills in 2025.pdf
PDF
Adobe Illustrator 28.6 Crack My Vision of Vector Design
PPTX
Reimagine Home Health with the Power of Agentic AI​
PDF
How to Choose the Right IT Partner for Your Business in Malaysia
PPTX
Essential Infomation Tech presentation.pptx
PPTX
L1 - Introduction to python Backend.pptx
PDF
Upgrade and Innovation Strategies for SAP ERP Customers
PPTX
Odoo POS Development Services by CandidRoot Solutions
PDF
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
PPTX
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
PPTX
Introduction to Artificial Intelligence
PPTX
ai tools demonstartion for schools and inter college
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PDF
PTS Company Brochure 2025 (1).pdf.......
PPTX
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
PDF
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
PDF
System and Network Administraation Chapter 3
PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
Understanding Forklifts - TECH EHS Solution
medical staffing services at VALiNTRY
top salesforce developer skills in 2025.pdf
Adobe Illustrator 28.6 Crack My Vision of Vector Design
Reimagine Home Health with the Power of Agentic AI​
How to Choose the Right IT Partner for Your Business in Malaysia
Essential Infomation Tech presentation.pptx
L1 - Introduction to python Backend.pptx
Upgrade and Innovation Strategies for SAP ERP Customers
Odoo POS Development Services by CandidRoot Solutions
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
Introduction to Artificial Intelligence
ai tools demonstartion for schools and inter college
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PTS Company Brochure 2025 (1).pdf.......
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
System and Network Administraation Chapter 3
VVF-Customer-Presentation2025-Ver1.9.pptx

Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)

  • 1. Using the FLiPN Stack for Edge AI Tim Spann | Developer Advocate
  • 2. Tim Spann Developer Advocate Tim Spann, Developer Advocate at StreamNative ● FLiP(N) Stack = Flink, Pulsar and NiFI Stack ● Streaming Systems & Data Architecture Expert ● Experience: ○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  • 4. 4 Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest. FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases. Tools Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI References
  • 5. 5 ● Learn how to build an end-to-end streaming edge app ● How to pull messages from Pulsar topics ● Building a data stream for IoT with NiFi ● Using Apache Flink + Apache Pulsar ● Using Apache Spark / Delta Lake / + Apache Pulsar AGENDA
  • 6. IoT DATA IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices creates data ingestion challenges. Other Sources: Transit data, news, twitter, status feeds, REST data, stock data and more.
  • 7. ● Apache Flink ● Apache Pulsar ● Pulsar Functions ● Apache NiFi ● Apache Spark ● Python, Java, Golang FLIP(N)(S) STACK https://streamnative.io/blog/engineering/2022-04-14-what-the-flip-is-the-flip-stack/
  • 9. ➔ Perform in Real-Time ➔ Process Events as They Happen ➔ Joining Streams with SQL ➔ Find Anomalies Immediately ➔ Ordering and Arrival Semantics ➔ Continuous Streams of Data DATA STREAMING
  • 10. Sensors <-> STREAMING EDGE APPS StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Tiered Storage Pulsar --- KoP --- MoP --- Websocket Pulsar Sink Streaming Edge Gateway Protocols <-> Sensors <-> Apps
  • 11. WHAT IS APACHE PULSAR?
  • 13. Messaging Ideal for work queues that do not require tasks to be performed in a particular order—for example, sending one email message to many recipients. RabbitMQ and Amazon SQS are examples of popular queue-based message systems. PULSAR: UNIFIED MESSAGING + DATA STREAMING
  • 14. PULSAR: UNIFIED MESSAGING + DATA STREAMING .. and Streaming Works best in situations where the order of messages is important—for example, data ingestion. Kafka and Amazon Kinesis are examples of messaging systems that use streaming semantics for consuming messages.
  • 15. PULSAR PUB/SUB MODEL Producer Consumer Publisher sends data and doesn't know about the subscribers or their status. All interactions go through Pulsar, and it handles all communication. Subscriber receives data from publisher and never directly interacts with it. Topic Topic
  • 16. Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging UNIFIED MESSAGING MODEL
  • 18. ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery KEY PULSAR CONCEPTS: ARCHITECTURE MetaData Storage
  • 19. Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Message De-Duplication. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Message De-Duplication. MESSAGES - THE BASIC UNIT OF PULSAR
  • 20. MULTI-TENANCY MODEL Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Cluster Topic URI structure: persistent://[tenant]/[namespace]/[topic]
  • 21. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) hub.streamnative.io
  • 25. Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. Bookie 1 Segment 1 Producer Consumer Broker 1 Topic1-Part1 Broker 2 Topic1-Part2 Broker 3 Topic1-Part3 Segment 2 Segment 3 Segment 4 Segment X Segment 1 Segment 1 Segment 1 Segment 3 Segment 3 Segment 3 Segment 2 Segment 2 Segment 2 Segment 4 Segment 4 Segment 4 Segment X Segment X Segment X Bookie 2 Bookie 3 Query Coordin ator . . . . . . SQL Worker SQL Worker SQL Worker SQL Worker Query Topic Metadata Pulsar SQL
  • 26. SCHEMA REGISTRY Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  • 27. APACHE PULSAR TO IO SINK https://pulsar.apache.org/docs/en/io-overview/
  • 29. ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge. PULSAR FUNCTIONS
  • 30. PULSAR IO FUNCTIONS IN PYTHON https://github.com/tspannhw/pulsar-pychat-function
  • 31. ● Apache Pulsar’s two-tier architecture separates the compute and storage layers, and interact with one another over a TCP/IP connection. This allows us to run the computing layer (Broker) on either Edge servers or IoT Gateway devices. ● Pulsar’s serverless computing framework, know as Pulsar Functions, can run inside the Broker as threads. Effectively “stretching” the data processing layer. EDGE COMPUTING WITH PULSAR
  • 32. ● Pulsar’s Serverless computing framework can run inside the Pulsar Broker as a thread pool. This framework can be used as the execution environment for ML models. ● The Apache Pulsar Broker supports the MQTT protocol and therefore can directly receive incoming data from the sensor hubs and store it in a topic. BENEFITS OF EDGE COMPUTING WITH PULSAR
  • 33. PULSAR FUNCTION – 3RD PARTY LIBRARIES ● You can leverage 3rd party libraries within Pulsar Functions ● DeepLearning4J ● JPMML ● DJL.AI ● Keras ● Pulsar Functions are able to support: ● A variety of ML model types. ● Models developed with different languages and toolkits
  • 34. INTERACTION WITH MODEL SERVERS ● You can write an application in any language that works with Apache Pulsar via libraries and calls any model server (AWSLabs Multi-Model Server, Cloudera Data Science Workbench Model Server and many others) via REST or other protocols. ● Pulsar Functions can execute local models or call model servers.
  • 35. ML • Visual Question and Answer • NLP (Natural Language Processing) • Sentiment Analysis • Text Classification • Named Entity Recognition • Content-based Recommendations • Predictive Maintenance • Fault Detection • Fraud Detection • Time-Series Predictions • Naive Bayes
  • 36. APACHE MXNET MODEL ZOO • CaffeNet • SqueezeNet v1.1 • Inception v3 • Single Shot Detection (SSD) • VGG16 • VGG19 • ResidualNet 152 • LSTM https://mxnet.apache.org/api/python/docs/api/gluon/model_zoo/index.html
  • 37. WHAT IS APACHE NiFi?
  • 38. APACHE NIFI PULSAR CONNECTOR https://streamnative.io/apache-nifi-connector/
  • 39. WHY APACHE NIFI? https://www.influxdata.com/integration/mqtt-monitoring/ • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow-specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over 300 sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version control
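Back pressure and prioritized queuing work together on each NiFi connection: once a queue reaches its threshold, the upstream processor stops being fed new work, and the prioritizer decides what leaves the queue first. The toy model below is a conceptual sketch of that idea only, not NiFi's implementation; the class name is hypothetical, the default threshold mirrors NiFi's default object threshold of 10,000, and here a lower number means higher priority.

```python
import heapq
import itertools


class PrioritizedConnection:
    """Toy model of a flow connection with back pressure and prioritized queuing."""

    def __init__(self, object_threshold=10_000):
        self.object_threshold = object_threshold
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def back_pressure_engaged(self):
        """True when the upstream producer should stop sending."""
        return len(self._heap) >= self.object_threshold

    def offer(self, item, priority=5):
        """Enqueue unless back pressure is engaged; lower number = higher priority."""
        if self.back_pressure_engaged():
            return False
        heapq.heappush(self._heap, (priority, next(self._seq), item))
        return True

    def poll(self):
        """Dequeue the highest-priority item, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A rejected `offer()` is the signal for the producer to pause; as the consumer drains the queue below the threshold, back pressure releases on its own.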
  • 40. APACHE NIFI PULSAR CONNECTOR https://github.com/streamnative/pulsar-nifi-bundle
  • 44. WHAT IS APACHE FLINK?
  • 45. ● Unified computing engine ● Batch processing is a special case of stream processing ● Stateful processing ● Massive scalability ● Flink SQL for queries and inserts against Pulsar topics ● Streaming analytics ● Continuous SQL ● Continuous ETL ● Complex event processing ● Standard SQL powered by Apache Calcite APACHE FLINK
  • 47. SQL
    select aqi, parameterName, dateObserved, hourObserved, latitude, longitude, localTimeZone, stateCode, reportingArea from airquality;
    select max(aqi) as MaxAQI, parameterName, reportingArea from airquality group by parameterName, reportingArea;
    select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI, count(aqi) as RowCount, parameterName, reportingArea from airquality group by parameterName, reportingArea;
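Continuous SQL keeps a query's result up to date as events arrive instead of recomputing it in batch. A minimal sketch, in plain Python rather than Flink, of the per-key state behind the last query (max, min, average and count of `aqi` grouped by `parameterName` and `reportingArea`): each event updates its group and emits the revised row, which is why a non-windowed GROUP BY over a stream produces an updating (changelog) result in Flink. The class name and sample values are hypothetical; this sketch returns a floating-point average, whereas Flink's AVG result type may follow the input column type.

```python
class ContinuousAqiAggregate:
    """Incrementally maintains MaxAQI, MinAQI, AvgAQI and RowCount per group."""

    def __init__(self):
        self._state = {}  # (parameterName, reportingArea) -> running aggregates

    def update(self, event):
        """Fold one airquality event into its group and return the updated row."""
        key = (event["parameterName"], event["reportingArea"])
        aqi = event["aqi"]
        s = self._state.get(key)
        if s is None:
            s = {"max": aqi, "min": aqi, "sum": 0, "count": 0}
            self._state[key] = s
        s["max"] = max(s["max"], aqi)
        s["min"] = min(s["min"], aqi)
        s["sum"] += aqi
        s["count"] += 1
        return self.row(key)

    def row(self, key):
        """Current result row for one group, named like the SQL aliases."""
        s = self._state[key]
        return {
            "parameterName": key[0],
            "reportingArea": key[1],
            "MaxAQI": s["max"],
            "MinAQI": s["min"],
            "AvgAQI": s["sum"] / s["count"],
            "RowCount": s["count"],
        }
```

Only four numbers per group are stored, so memory grows with the number of distinct keys, not with the number of events.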
  • 49. WHAT IS APACHE SPARK?
  • 50. SPARK + PULSAR https://pulsar.apache.org/docs/en/adaptors-spark/
    // Read the airquality topic as a streaming DataFrame via the Pulsar Spark connector
    val dfPulsar = spark.readStream.format("pulsar")
      .option("service.url", "pulsar://pulsar1:6650")
      .option("admin.url", "http://pulsar1:8080")
      .option("topic", "persistent://public/default/airquality")
      .load()
    // Print every incoming row to the console without truncating wide columns
    val pQuery = dfPulsar.selectExpr("*")
      .writeStream.format("console")
      .option("truncate", false)
      .start()
    (Run in spark-shell: Spark version 3.2.0, Scala version 2.12.15, OpenJDK 64-Bit Server VM, Java 11.0.11)
  • 57. ● BUFFER ● BATCH ● ROUTE ● FILTER ● AGGREGATE ● ENRICH ● REPLICATE ● DEDUPE ● DECOUPLE ● DISTRIBUTE
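Several of these edge operations compose naturally into one pipeline. A minimal sketch, assuming events are dicts with hypothetical `id` and `value` fields and an illustrative alert threshold, that applies FILTER, DEDUPE, ROUTE and BATCH in that order; the function name and routes (`alerts`, `telemetry`) are made up for the example.

```python
from collections import defaultdict


def edge_pipeline(events, seen_ids, alert_threshold=100, batch_size=2):
    """Apply FILTER, DEDUPE, ROUTE and BATCH to a list of event dicts.

    seen_ids: a set that persists across calls and holds the dedupe state.
    Returns {route: [batch, ...]} where each batch is a list of events.
    """
    routed = defaultdict(list)
    for event in events:
        event_id, value = event.get("id"), event.get("value")
        if event_id is None or value is None:  # FILTER: drop incomplete events
            continue
        if event_id in seen_ids:  # DEDUPE: skip ids already forwarded
            continue
        seen_ids.add(event_id)
        route = "alerts" if value > alert_threshold else "telemetry"  # ROUTE
        routed[route].append(event)
    # BATCH: chunk each route's events into fixed-size groups for fewer sends.
    return {
        route: [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
        for route, items in routed.items()
    }
```

In a real deployment the dedupe set would need bounding (for example a time window), since an ever-growing set does not suit a constrained edge device.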
  • 62. DEPLOYING AI WITH AN EVENT-DRIVEN PLATFORM https://dzone.com/articles/deploying-ai-with-an-event-driven-platform
  • 63. FLIP STACK WEEKLY This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 65. Passionate and dedicated team. Founded by the original developers of Apache Pulsar. StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
  • 66. Apache Pulsar Training • Instructor-led courses • Pulsar Fundamentals • Pulsar Developers • Pulsar Operations • On-demand learning with labs • 300+ engineers, admins and architects trained! STREAMNATIVE ACADEMY Now Available: On-Demand Pulsar Training at Academy.StreamNative.io
  • 67. Pulsar Summit San Francisco, Hotel Nikko, August 18, 2022. Save your spot now: use code SUMMIT20 to get 20% off. Summit highlights: ● 5 keynotes ● 12 breakout sessions ● 1 amazing happy hour
  • 68. Pulsar Summit San Francisco Sponsorship Prospectus: Community Sponsorships Available. Help engage and connect the Apache Pulsar community by becoming an official sponsor for Pulsar Summit San Francisco 2022! Learn more about the requirements and benefits of becoming a community sponsor.
  • 69. [Webinar] Building Microservices ● [Blog post] Event-Driven Microservices
  • 70. CODE ● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden ● https://github.com/tspannhw/FLiP-Pi-Thermal ● https://github.com/tspannhw/FLiP-Pi-Weather ● https://github.com/tspannhw/FLiP-RP400 ● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal ● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar ● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus ● https://github.com/tspannhw/PythonPulsarExamples ● https://github.com/tspannhw/pulsar-pychat-function ● https://github.com/tspannhw/FLiP-PulsarDevPython101 ● https://github.com/tspannhw/nifi-djlsentimentanalysis-processor
  • 71. Let’s Keep in Touch! Tim Spann, Developer Advocate, @PaaSDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw