BUILD ML ENHANCED EVENT STREAMING APPLICATIONS WITH JAVA MICROSERVICES
Tim Spann | Developer Advocate
● Introduction
● What is Apache Pulsar?
● Function
● Apache NiFi
● Continuous ETL Flink
● Demo
● Q&A
Tim Spann
Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more.
○ Today, he helps grow the Pulsar community by sharing rich technical knowledge and experience both at global conferences and through individual conversations.
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Apache Pulsar has a vibrant community
● 560+ Contributors
● 10,000+ Commits
● 7,000+ Slack Members
● 1,000+ Organizations Using Pulsar
● “Bookies”
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata Storage
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
[Diagram: Pulsar Cluster, showing brokers, bookies that store messages, and metadata storage that provides metadata and service discovery]
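How a group of bookies stores a ledger can be illustrated with a small sketch. It assumes simple round-robin striping, where each entry is written to `writeQuorum` consecutive bookies of the ensemble. The class name, method name and bookie names are hypothetical and chosen only for illustration; this is not BookKeeper client code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of striping ledger entries across an ensemble.
// It is a simplified model, not the BookKeeper API.
class EnsembleStriping {

    // Returns the bookies that would hold the given entry when every entry
    // is written to `writeQuorum` consecutive members of the ensemble.
    static List<String> bookiesForEntry(List<String> ensemble, int writeQuorum, long entryId) {
        if (writeQuorum < 1 || writeQuorum > ensemble.size()) {
            throw new IllegalArgumentException("writeQuorum must be between 1 and the ensemble size");
        }
        List<String> targets = new ArrayList<>();
        int start = (int) (entryId % ensemble.size());
        for (int i = 0; i < writeQuorum; i++) {
            targets.add(ensemble.get((start + i) % ensemble.size()));
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> ensemble = List.of("bookie-0", "bookie-1", "bookie-2");
        for (long entryId = 0; entryId < 4; entryId++) {
            System.out.println("entry " + entryId + " -> " + bookiesForEntry(ensemble, 2, entryId));
        }
    }
}
```

With three bookies and a write quorum of two, every entry has two copies and consecutive entries land on different pairs of bookies, which spreads both storage and read load.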
[Diagram: consumers attach to a Pulsar topic/partition through a subscription. Four subscription types are shown: Exclusive, Failover (a standby consumer takes over in case of failure in Consumer B-0), Shared and Key-Shared. The diagram is labeled with both Streaming and Messaging.]
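The difference between the Shared and Key-Shared subscription types can be sketched as two dispatch rules. This is a simplified model under stated assumptions: Shared is shown as plain round-robin, and Key-Shared is shown as a hash of the message key modulo the number of consumers, whereas a real broker assigns hash ranges to consumers. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical model of two dispatch rules; it does not use the Pulsar client.
class SubscriptionDispatchSketch {

    private int next = 0; // round-robin cursor used by the Shared rule

    // Shared: each message goes to the next consumer in turn, regardless of key.
    String dispatchShared(List<String> consumers) {
        String target = consumers.get(next % consumers.size());
        next = (next + 1) % consumers.size();
        return target;
    }

    // Key-Shared: the key alone selects the consumer, so all messages
    // with the same key reach the same consumer in order.
    static String dispatchKeyShared(List<String> consumers, String key) {
        return consumers.get(Math.floorMod(key.hashCode(), consumers.size()));
    }

    public static void main(String[] args) {
        List<String> consumers = List.of("consumer-0", "consumer-1", "consumer-2");
        SubscriptionDispatchSketch shared = new SubscriptionDispatchSketch();
        for (String key : List.of("sensor-a", "sensor-b", "sensor-a")) {
            System.out.println(key + ": shared -> " + shared.dispatchShared(consumers)
                    + ", key-shared -> " + dispatchKeyShared(consumers, key));
        }
    }
}
```

In the Shared rule the two `sensor-a` messages go to different consumers, while the Key-Shared rule sends both to the same one, which is what preserves per-key ordering.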
[AI Dev World 2022] Build ML Enhanced Event Streaming
Sources, Sinks and Processing
• Functions - Lightweight Stream Processing (Java, Python, Go)
• Connectors - Sources & Sinks (Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage (S3)
Pulsar Functions: A serverless event streaming framework
● Lightweight computation similar to AWS Lambda.
● Specifically designed to use Apache Pulsar as a message bus.
● Function runtime can be located within Pulsar Broker.
● Java Functions
Pulsar Functions
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go)
● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
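The consume, process and publish cycle can be sketched as a language-native Java function, which is a class that implements `java.util.function.Function` and needs no Pulsar-specific imports. The class name and the keyword lists are hypothetical, and the keyword scoring is only a toy stand-in for a real ML model that a 3rd-party library would provide.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical example: a keyword-based tagger standing in for a real ML model.
class SentimentTagFunction implements Function<String, String> {

    private static final Set<String> POSITIVE = Set.of("good", "great", "love");
    private static final Set<String> NEGATIVE = Set.of("bad", "awful", "hate");

    // Invoked once per input message; the returned string is the result
    // that the function runtime would publish to the output topic.
    @Override
    public String apply(String message) {
        int score = 0;
        for (String token : message.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        String label = score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
        return label + "|" + message;
    }

    public static void main(String[] args) {
        // Local check without a broker: call the function directly.
        Function<String, String> fn = new SentimentTagFunction();
        System.out.println(fn.apply("I love this great stack"));
    }
}
```

Because the class has no broker dependency, it can be unit tested directly. Packaged as a jar, a function like this would typically be registered with `pulsar-admin functions create`, naming the jar, the class, the input topics and the output topic.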
Apache NiFi Basics
https://www.influxdata.com/integration/mqtt-monitoring/
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over 300 components
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version Control
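The backpressure idea listed under data buffering can be sketched with a bounded queue between an upstream and a downstream step. The class below is a hypothetical stand-in for a connection with an object-count threshold. It is not NiFi's API, and the threshold value is arbitrary.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of a queue with an object-count threshold; not NiFi code.
class BackpressureSketch {

    private final Queue<String> queue = new ArrayDeque<>();
    private final int objectThreshold;

    BackpressureSketch(int objectThreshold) {
        this.objectThreshold = objectThreshold;
    }

    // Upstream may hand over data only while the queue is below the threshold.
    // A false return means backpressure is engaged and upstream should wait.
    boolean offer(String item) {
        if (queue.size() >= objectThreshold) {
            return false;
        }
        queue.add(item);
        return true;
    }

    // Downstream drains the queue, which releases the pressure again.
    String poll() {
        return queue.poll();
    }

    public static void main(String[] args) {
        BackpressureSketch connection = new BackpressureSketch(2);
        System.out.println(connection.offer("a"));
        System.out.println(connection.offer("b"));
        System.out.println(connection.offer("c")); // rejected: threshold reached
        connection.poll();                         // downstream consumes one item
        System.out.println(connection.offer("c")); // accepted again
    }
}
```

The point of the design is that a slow consumer throttles its producer instead of letting the buffer grow without bound.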
Apache NiFi - Apache Pulsar Connector
Apache Flink
● Unified computing engine
● Batch processing is a special case of stream processing
● Stateful processing
● Massive Scalability
● Flink SQL for queries and inserts against Pulsar topics
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
Apache Flink Job Dashboard
NLP Streaming Architecture
Apache Pulsar Training
● Instructor-led courses
○ Pulsar Fundamentals
○ Pulsar Developers
○ Pulsar Operations
● On-demand learning with labs
● 300+ engineers, admins and architects trained!
Now available: on-demand Pulsar training at StreamNative Academy (Academy.StreamNative.io)
Apache Pulsar Links
● https://github.com/tspannhw/pulsar-pychat-function
● https://streamnative.io/apache-nifi-connector/
● https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/pulsar/
● https://streamnative.io/en/blog/release/2021-04-20-flink-sql-on-streamnative-cloud
● https://github.com/streamnative/flink-example
● https://pulsar.apache.org/docs/en/adaptors-spark/
Deploying AI With an Event-Driven Platform
https://dzone.com/trendreports/enterprise-ai-1
Let’s Keep in Touch
Tim Spann
Developer Advocate
@PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw

