Adding Generative AI to
Real-Time Streaming
Pipelines
Tim Spann
Principal Developer Advocate
Nov-2024
AGENDA
Introduction
Overview
GenAI Architecture
Streaming Projects
Demos
Resources
Q&A
Tim Spann
Twitter: @PaasDev // Blog: datainmotion.dev
Principal Developer Advocate / Field Engineer
NY AI Meetups
ex-Pivotal, ex-Cloudera, ex-StreamNative,
ex-PwC, ex-HPE, ex-E&Y.
https://medium.com/@tspann
https://github.com/tspannhw
This week in Apache NiFi, Apache Flink,
Apache Kafka, ML, AI, Apache Spark, Apache
Iceberg, Python, Java, LLM, GenAI, Vector
DB and Open Source friends.
https://bit.ly/32dAJft
https://www.meetup.com/futureofdata-princeton/
AI + Streaming Weekly by Tim Spann
The challenge of Unstructured Data
● Problem: Unstructured data comes in many forms, and there is no
easy way to interact with all of it
● Solution: Vector embeddings
● How: Neural networks, e.g. embedding models
Vector
Databases
Unstructured Data is Everywhere
Unstructured data is any data that does not conform
to a predefined data model.
Currently, 90% of unstructured data is never
analyzed.
Text, Images, Videos and more!
Image from Nvidia
Vector Search Overview
How Similarity Search Works
Unstructured Data: Images, User Generated Content, Video, Documents, Audio
● Transform into Vectors (Vector Embeddings)
● Store in Vector Database
● Perform Query
● Perform Approximate Nearest Neighbor Similarity Search
● Get Results
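The store, query and results steps can be sketched in plain Python. This is a minimal sketch, not how any particular vector database works: it uses exact brute-force cosine similarity as a stand-in for approximate nearest neighbor search, and the item names and 3-dimensional vectors are made up for illustration (real embeddings come from an embedding model and have hundreds of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Exact search: score every stored vector and keep the k best.

    A vector database would use an approximate index instead of
    scanning everything; this scan is only for illustration.
    """
    scored = [(cosine_similarity(query, vec), key) for key, vec in store.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Hypothetical embeddings, hand-written for the example.
store = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "invoice pdf": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.2, 0.0], store, k=2)
for key, score in results:
    print(f"{key}: {score:.3f}")
```

The two photo vectors point in nearly the same direction as the query, so they rank ahead of the invoice.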
Real-Time Pipelines Can Help
External Context Ingest
Ingesting, routing, cleaning, enriching, transforming,
parsing, chunking and vectorizing structured,
unstructured, semi-structured and binary data and
documents
Prompt engineering
Crafting and structuring queries to optimize
LLM responses
Context Retrieval
Enhancing LLMs with external context, such as
Retrieval Augmented Generation (RAG)
Roundtrip Interface
Act as a Discord, REST, Kafka, SQL or Slack bot to
round-trip discussions
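Two of the steps above, chunking during ingest and adding retrieved context to a prompt, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the character-based chunk sizes and the prompt wording are all hypothetical, and the retrieval step itself (the vector search) is left out.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap.

    The overlap keeps sentences that straddle a boundary visible
    in both neighbouring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_rag_prompt(question, context_chunks):
    """Assemble a prompt that puts retrieved chunks ahead of the question."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(context_chunks)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
prompt = build_rag_prompt("What comes after d?", chunks[:2])
print(chunks)
print(prompt)
```

In a real pipeline the chunks passed to `build_rag_prompt` would be the top results of a similarity search rather than the first two chunks.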
https://medium.com/cloudera-inc/getting-ready-for-apache-nifi-2-0-5a5e6a67f450
NiFi 2.0.0 Features
● Python Integration
● Parameters
● JDK 21+
● JSON Flow Serialization
● Rules Engine for Development
Assistance
● Run Process Group as Stateless
● flow.json.gz
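To give a feel for the Python integration, here is a minimal sketch of a NiFi 2.x Python processor that adds one attribute to each FlowFile. The processor name `AddWordCount` and the `word.count` attribute are hypothetical, and the `except ImportError` branch contains stand-in classes that are not part of NiFi; they exist only so the sketch can be run outside NiFi.

```python
try:
    # Available when the file is loaded by NiFi 2.x as a Python extension.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Stand-ins (NOT part of NiFi) so the sketch can be exercised locally.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


class AddWordCount(FlowFileTransform):  # hypothetical processor name
    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1-SNAPSHOT"
        description = "Adds a word.count attribute to each FlowFile."
        tags = ["example", "text"]

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the FlowFile content and count whitespace-separated words.
        text = flowfile.getContentsAsBytes().decode("utf-8", errors="replace")
        # Attribute values are strings; content is left unchanged.
        return FlowFileTransformResult(
            relationship="success",
            attributes={"word.count": str(len(text.split()))},
        )
```

The processors later in this deck follow the same shape: read the FlowFile, run a model, and write the result back as attributes.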
https://cwiki.apache.org/confluence/display/NIFI/NiFi+2.0+Release+Goals
Extract Company Names
● Python 3.10+
● Hugging Face, NLP, spaCy, PyTorch
https://github.com/tspannhw/FLaNK-python-ExtractCompanyName-processor
CaptionImage
● Python 3.10+
● Hugging Face
● Salesforce/blip-image-captioning-large
● Generate Captions for Images
● Adds captions to FlowFile Attributes
● Does not require downloading or copying
your images
https://github.com/tspannhw/FLaNK-python-processors
RESNetImageClassification
● Python 3.10+
● Hugging Face
● Transformers
● PyTorch
● Datasets
● microsoft/resnet-50
● Adds classification label to FlowFile
Attributes
● Does not require downloading or copying
your images
https://github.com/tspannhw/FLaNK-python-processors
NSFWImageDetection
● Python 3.10+
● Hugging Face
● Transformers
● Falconsai/nsfw_image_detection
● Adds normal and nsfw to FlowFile
Attributes
● Gives score on safety of image
● Does not require downloading or copying
your images
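A minimal sketch of how classification scores might become FlowFile attributes. It assumes predictions in the list-of-dicts shape (`label`, `score`) that a Hugging Face image-classification pipeline returns; the function name, the `nsfw.` attribute prefix and the example scores are hypothetical and not taken from the processor's source.

```python
def scores_to_attributes(predictions, prefix="nsfw"):
    """Flatten [{'label': ..., 'score': ...}, ...] into string attributes."""
    attributes = {}
    for prediction in predictions:
        # One attribute per label; attribute values must be strings.
        attributes[f"{prefix}.{prediction['label']}"] = f"{prediction['score']:.4f}"
    if predictions:
        best = max(predictions, key=lambda p: p["score"])
        attributes[f"{prefix}.top_label"] = best["label"]
    return attributes


# Illustrative scores, not real model output.
example = [
    {"label": "normal", "score": 0.9731},
    {"label": "nsfw", "score": 0.0269},
]
attrs = scores_to_attributes(example)
print(attrs)
```

Downstream processors can then route on these attributes, for example sending images whose top label is not `normal` to a review queue.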
https://github.com/tspannhw/FLaNK-python-processors
FacialEmotionsImageDetection
● Python 3.10+
● Hugging Face
● Transformers
● facial_emotions_image_detection
● Image Classification
● Adds labels/scores to FlowFile Attributes
● Does not require downloading or copying
your images
https://github.com/tspannhw/FLaNK-python-processors
Let’s do a metamorphosis on your data. Don’t fear changing data.
You don’t need to be a brilliant writer to stream
data.
Franz Kafka was a German-speaking
Bohemian novelist and short-story writer,
widely regarded as one of the major figures of
20th-century literature. His work fuses
elements of realism and the fantastic.
Wikipedia
YES, FRANZ, IT’S KAFKA
Open Source Edition
● Apache NiFi in Docker
● Try new features quickly
● Develop applications locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB apache/nifi:latest
● Licensed under the Apache License 2.0
● Unsupported
● NiFi 1.25 and NiFi 2.0.0-M2
https://hub.docker.com/r/apache/nifi
https://medium.com/@tspann/unstructured-data-processing-with-a-raspberry-pi-ai-kit-c959dd7fff47
Raspberry Pi AI Kit Hailo
Edge AI
https://medium.com/@tspann/from-the-edge-to-the-cloud-and-back-again-01095e95a783
Raspberry Pi AI Kit Hailo
Edge AI Pose Estimation
https://medium.com/cloudera-inc/streaming-street-cams-to-yolo-v8-with-python-and-nifi-to-minio-s3-3277e73723ce
Street Cameras

TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines