© 2023 Cloudera, Inc. All rights reserved.
Data-in-Motion to Supercharge AI
Tim Spann
Principal Developer Advocate
23-August-2023
Tim Spann
@PaasDev www.datainmotion.dev
github.com/tspannhw medium.com/@tspann
Principal Developer Advocate
Princeton Future of Data Meetup
ex-Pivotal, ex-Hortonworks, ex-StreamNative,
ex-PwC, ex-EY, ex-HPE.
Apache NiFi x Apache Kafka x Apache Flink x AI
REAL-TIME REQUIRES A PLATFORM
SQL Stream Builder
Cloudera + LLMs
Knowledge Repository
Data Storage / Management
Data Preparation
Data Engineering
LLM Fine Tuning Process
Training Framework
LLM Serving
Serving Framework
Key:
CPU Task
GPU Task
CML
CDE
CDP
Vector DB
CDF
Streaming Classification
Real-Time Model Deployment
Run collection and streaming on any cloud, server, container, bare metal, device or VM
Data Sources
Cloudera Data Flow
Cloudera Streaming Analytics
Cloudera Streams Processing
Kafka
Lake House
INGEST
ENRICH
FUNNEL
https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis
https://github.com/tspannhw/FLaNK-LLM
DISTRIBUTE
DEPLOY
https://github.com/tspannhw/FLaNK-Edge-Models
STORE
APACHE NIFI WITH PYTHON CUSTOM PROCESSORS
Python as a First Class Citizen
https://github.com/apache/nifi/blob/614947e4ac6798ad80817e82514c39349d5faacb/nifi-docs/src/main/asciidoc/python-developer-guide.adoc
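A minimal sketch of what a Python custom processor can look like, loosely following the pattern in the developer guide linked above. The processor name `UppercaseText`, the `uppercased` attribute, and the fallback stand-in classes are hypothetical and exist only for illustration; inside NiFi the real `nifiapi` module would be imported instead of the stand-ins.

```python
"""Sketch of a NiFi Python processor that upper-cases FlowFile text."""

try:
    # Provided by NiFi's Python extension runtime when the processor is loaded.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Hypothetical stand-ins so the sketch can be exercised outside NiFi.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


class UppercaseText(FlowFileTransform):
    # Tells NiFi which Java interface this Python class implements.
    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    # Metadata NiFi shows for the processor (values here are illustrative).
    class ProcessorDetails:
        version = "0.0.1-SNAPSHOT"
        description = "Upper-cases the text content of each FlowFile."
        tags = ["example", "text"]

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the incoming FlowFile content and decode it as UTF-8 text.
        text = flowfile.getContentsAsBytes().decode("utf-8")
        # Route the transformed content to "success" and tag it with an attribute.
        return FlowFileTransformResult(
            relationship="success",
            contents=text.upper(),
            attributes={"uppercased": "true"},
        )
```

Outside NiFi, the `transform` method can be tried with any object that exposes `getContentsAsBytes()`; in a real flow, NiFi supplies the context and FlowFile objects.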
Future of Data - Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
FLaNK Stack Weekly
This week in Apache NiFi, Apache Flink, Apache
Kafka, Apache Spark, Apache Iceberg, Python, Java,
AI, ML, LLM and Open Source friends.
https://bit.ly/32dAJft
CSP Community Edition
• Kafka, KConnect, SMM, SR, Flink, and SSB in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker Compose file for CSP that runs from the command line without any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry
○ $> docker compose up
● Licensed under the Cloudera Community License
● Unsupported
● Community Group Hub for CSP
● Find it on docs.cloudera.com under Applications
Open Source Edition
• Apache NiFi in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d \
    -e SINGLE_USER_CREDENTIALS_USERNAME=admin \
    -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB \
    apache/nifi:latest
● Licensed under the ASF License
● Unsupported
https://hub.docker.com/r/apache/nifi
Resources
threads.net/@tspannhw


AIDEVDAY_ Data-in-Motion to Supercharge AI
