© 2023 Cloudera, Inc. All rights reserved.
Unlocking Financial Data with
Real-Time Pipelines
Tim Spann
Principal Developer Advocate
14-December-2023
Agenda (30 minutes)
• Introduction
• Overview
• Apache Kafka
• Apache Flink
• Apache NiFi
• Use Cases
• Demos
FLaNK Stack Weekly by Tim Spann
This week in Apache NiFi, Apache Flink, Apache Kafka, ML, AI, Apache Spark, Apache Iceberg, Python, Java and Open Source friends.
https://bit.ly/32dAJft
https://www.meetup.com/futureofdata-princeton/
Future of Data - NYC + NJ + Philly + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
Tim Spann
Twitter: @PaasDev // Blog: datainmotion.dev
Principal Developer Advocate.
Princeton Future of Data Meetup.
ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC
https://medium.com/@tspann
https://github.com/tspannhw
Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged that combine Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.

Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data.

Key Points to be Covered:

Introduction to Real-Time Data Pipelines:
a. The limitations of traditional batch processing in the financial domain.
b. Understanding the need for real-time data processing.

Apache Flink: Powering Real-Time Stream Processing:
a. Overview of Apache Flink and its role in real-time stream processing.
b. Use cases for Apache Flink in the financial industry.
c. How Flink enables fast, scalable, and fault-tolerant processing of streaming financial data.

Apache Kafka: Building Resilient Event Streaming Platforms:
a. Introduction to Apache Kafka and its role as a distributed streaming platform.
b. Kafka's capabilities in handling high-throughput, fault-tolerant, and real-time data streaming.
c. Integration of Kafka with financial data sources and consumers.

Apache NiFi: Data Ingestion and Flow Management:
a. Overview of Apache NiFi and its role in data ingestion and flow management.
b. Data integration and transformation capabilities of NiFi for financial data.
c. Utilizing NiFi to collect and process financial data from diverse sources.

Iceberg: Efficient Data Lake Management:
a. Understanding Iceberg and its role in managing large-scale data lakes.
b. Iceberg's schema evolution and table-level metadata capabilities.
c. How Iceberg simplifies data lake management in financial institutions.

Real-World Use Cases:
a. Real-time fraud detection using Flink, Kafka, and NiFi.
b. Portfolio risk analysis with Iceberg and Flink.
c. Streamlined regulatory reporting leveraging all four technologies.

Best Practices and Considerations:
a. Architectural considerations when building real-time financial data pipelines.
b. Ensuring data integrity, security, and compliance in real-time pipelines.
c. Scalability and performance optimization techniques.

Conclusion: In this talk, we will demonstrate the power of combining Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to unlock financial data's true potential. Attendees will gain insights into how these technologies can empower financial institutions to make informed decisions, respond to market changes swiftly, and comply with regulations effectively. Join us to explore the world of real-time data pipelines and revolutionize financial data management.
OVERVIEW
DATA VELOCITY in FINANCIAL SERVICES
Streaming capabilities vary, all enhance insight
[Figure: financial data sources placed along three velocity tiers; recovered labels follow]
Velocity tiers: Normal Streaming; Near-Real Time Streaming; Real-Time Streaming
Data sources: Transaction Data; Core Banking; Risk Data; Behavioral Data; Cyber; Market Data; News Feeds; Customer Data; Connected Devices/Wearables; Chat Bots; Regulatory, Compliance; Social Media
NEXT GEN PLATFORM FOR TACKLING FINANCIAL CRIME
Leveraging data and analytics across the enterprise from the Edge to AI.
[Figure: platform diagram with four numbered stages; recovered labels follow]
Data sources: BANKING DATA; EDGE DATA; ALTERNATIVE DATA; ENTERPRISE DATA; TRADING DATA
Ingest: Ingest Streaming Data; Ingest Data at Rest; Ingest Stream or Batch Data
Platform: Data Flow (CDF); Enterprise Data Store; Data Processing; Data Engineering; Data Warehouse; Operational DB; ML; Data Science Workbench; Deploy Models
Shared services: Catalog | Schema | Security | Governance
FINANCIAL CRIME APPLICATIONS: Cyber Security; AML; Fraud; Surveillance
Users and tools: Data Scientists; Business Analysts; Analytical Tools; BI and Visualization
Teams speaking the same language
Architecture in the context of Financial Use Cases
Kafka & Flink (Flink SQL with SQL Stream Builder) for real-time analytics
[Figure: architecture diagram; recovered labels follow]
Finance Data; DataFlow / NiFi; Kafka; Kafka topics; Flink SQL w/ SSB; Database; Machine learning; Data Warehouse; Data Viz; Monitoring; Alerting
NIFI MEETS AI
Data in Motion With Cloudera: capture, process & distribute any data, anywhere
[Figure: data flow diagram; recovered labels follow]
Unstructured file types; Structured Sources; Applications/APIs; Streams; Other enterprise data; Vector DB; AI Model; Open Data Lakehouse; Materialized Views
Stop Fraud When It Happens: Real Life Example
Simplified example of deployed use case
[Figure: fraud detection flow; recovered labels follow]
Transactions Data; Account Data; Device Logs; Business Event Logic; Fraud Analyst defined suspicious transaction; Real-Time Fraud Scoring; Flagged Records; Continuous Results; Freeze transaction; Fraud Monitoring Dashboard; Data Lakehouse; DATA RELEVANCE
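The flow on this slide (transactions scored in real time against an analyst-defined rule, with flagged records handed off for freezing) can be sketched in a few lines of Python. This is a minimal illustration, not the deployed use case: the `Transaction` fields, the `FraudScorer` class, the rule itself (too many transactions or too much spend per account inside a sliding window), and all thresholds are assumptions invented for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical event shape; the slide does not define a schema.
    account_id: str
    amount: float
    timestamp: float  # event time in seconds


class FraudScorer:
    """Flags an account that exceeds a count or spend limit inside a sliding window."""

    def __init__(self, window_seconds=60.0, max_count=3, max_total=1000.0):
        self.window_seconds = window_seconds
        self.max_count = max_count
        self.max_total = max_total
        self._recent = defaultdict(deque)  # account_id -> recent transactions

    def score(self, txn):
        window = self._recent[txn.account_id]
        window.append(txn)
        # Evict events that fell out of the window (assumes in-order arrival).
        while window and txn.timestamp - window[0].timestamp > self.window_seconds:
            window.popleft()
        total = sum(t.amount for t in window)
        return len(window) > self.max_count or total > self.max_total


def flag_suspicious(transactions, scorer):
    """Yield each transaction the scorer flags, as soon as it is seen."""
    for txn in transactions:
        if scorer.score(txn):
            yield txn


events = [
    Transaction("acct-1", 200.0, 0.0),
    Transaction("acct-1", 300.0, 10.0),
    Transaction("acct-2", 50.0, 12.0),
    Transaction("acct-1", 600.0, 20.0),   # pushes acct-1 over 1000 within 60 s
    Transaction("acct-1", 10.0, 200.0),   # earlier events have expired by now
]
flagged = list(flag_suspicious(events, FraudScorer()))
```

Only the 600.0 transaction is flagged here. In a pipeline like the one pictured, the per-account state would live in the stream processor rather than in a Python dictionary, and the flagged records would feed the freeze action and the monitoring dashboard.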
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
APACHE KAFKA
STREAMS MESSAGING WITH KAFKA
• Highly reliable distributed messaging system.
• Decouples applications and enables many-to-many patterns.
• Publish-subscribe semantics.
• Horizontal scalability.
• Efficient implementation to operate at speed with big data volumes.
• Organized by topic to support several use cases.
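To make the publish-subscribe and decoupling bullets concrete, here is a toy in-memory model of a topic log read by independent consumer groups. It is only a sketch of the idea and deliberately not the Kafka client API: `InMemoryBroker`, `publish`, and `poll` are hypothetical names, and partitions, replication, and persistence are all left out.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy append-only log per topic with an independent offset per consumer group."""

    def __init__(self):
        self._logs = defaultdict(list)     # topic -> ordered list of messages
        self._offsets = defaultdict(int)   # (topic, group) -> next offset to read

    def publish(self, topic, message):
        self._logs[topic].append(message)
        return len(self._logs[topic]) - 1  # offset assigned to the new message

    def poll(self, topic, group, max_messages=10):
        start = self._offsets[(topic, group)]
        batch = self._logs[topic][start:start + max_messages]
        self._offsets[(topic, group)] = start + len(batch)
        return batch


broker = InMemoryBroker()
for trade in ("AAPL buy 100", "MSFT sell 50", "IBM buy 10"):
    broker.publish("trades", trade)

# Two groups read the same topic at their own pace without affecting each other.
risk_batch = broker.poll("trades", "risk-engine")
audit_first = broker.poll("trades", "audit-log", max_messages=1)
audit_rest = broker.poll("trades", "audit-log")
```

The producer never learns who consumes the topic, and each group keeps its own position in the log, which is what allows many-to-many patterns without coupling the applications.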
APACHE FLINK
CONTINUOUS SQL
● SSB is a Continuous SQL engine
● It’s SQL, but with a slightly different mental model that has big implications
Traditional Parse/Execute/Fetch model vs. Continuous SQL model
Hint: The query is boundless and never finishes, and time matters
AKA: SELECT * FROM foo WHERE 1=0 -- will run forever
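The difference between the two models can be imitated with a Python generator: the "query" is a lazy filter over a source that never ends, and a client simply reads results as they appear. This is an analogy under assumptions, not how SSB or Flink executes SQL; `price_ticks` and `continuous_select` are hypothetical names and the data is synthetic.

```python
import itertools


def price_ticks():
    """Unbounded source: yields a new tick forever (synthetic, deterministic data)."""
    for i in itertools.count():
        yield {"symbol": "ABC" if i % 2 == 0 else "XYZ", "price": 100 + i}


def continuous_select(stream, predicate):
    """Rough analogue of SELECT * FROM stream WHERE predicate: lazily emits matches."""
    return (row for row in stream if predicate(row))


query = continuous_select(price_ticks(), lambda row: row["symbol"] == "ABC")

# The query itself never finishes; a client reads as many results as it wants.
first_three = list(itertools.islice(query, 3))
```

A predicate that never matches, like the slide's `WHERE 1=0`, would keep consuming the source forever without emitting a row, which is the point of the hint above: the query is unbounded regardless of what it returns.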
SQL STREAM BUILDER (SSB)
• SQL Stream Builder allows developers, analysts, and data scientists to write streaming applications with industry-standard SQL.
• No Java or Scala code development required.
• Simplifies access to data in Kafka & Flink.
• Connectors to batch data in HDFS, Kudu, Hive, S3, JDBC, CDC and more.
• Enrich streaming data with batch data in a single tool.
Democratize access to real-time data with just SQL
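Enriching a stream with batch data amounts to joining each incoming event against reference data that changes slowly. In SSB this would be written as a SQL join; the Python below only illustrates the idea, and the `enrich` helper, the account table, and the payment events are hypothetical.

```python
def enrich(stream, reference, key):
    """Left-join each streaming event with a static lookup table on `key`."""
    for event in stream:
        extra = reference.get(event[key], {})
        yield {**event, **extra}


# Hypothetical batch-side reference data, e.g. loaded once from a table.
accounts = {
    "acct-1": {"owner": "Alice", "segment": "retail"},
    "acct-2": {"owner": "Bob", "segment": "corporate"},
}

# Hypothetical streaming events, e.g. read from a Kafka topic.
payments = [
    {"account_id": "acct-1", "amount": 25.0},
    {"account_id": "acct-3", "amount": 990.0},  # no match in the reference data
]

enriched = list(enrich(payments, accounts, "account_id"))
```

Events without a match pass through unchanged, which mirrors a left join; an inner join would drop them instead.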
SSB MATERIALIZED VIEWS
Key takeaway: MVs allow data scientists, analysts, and developers to consume data from the firehose
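One way to picture a materialized view over a stream is as a small keyed table that is updated on every event and can be read at any time, so consumers query the latest state instead of replaying the firehose. The sketch below assumes a hypothetical `MaterializedView` class and event shape; it does not reflect how SSB implements or exposes its views.

```python
from collections import defaultdict


class MaterializedView:
    """Keeps a running aggregate per key, updated as each event arrives."""

    def __init__(self):
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def apply(self, event):
        key = event["symbol"]
        self._totals[key] += event["notional"]
        self._counts[key] += 1

    def get(self, key):
        """Point lookup against the latest state; returns None for unseen keys."""
        if key not in self._counts:
            return None
        return {"symbol": key, "trades": self._counts[key], "notional": self._totals[key]}


view = MaterializedView()
for event in [
    {"symbol": "ABC", "notional": 1000.0},
    {"symbol": "XYZ", "notional": 250.0},
    {"symbol": "ABC", "notional": 500.0},
]:
    view.apply(event)

snapshot = view.get("ABC")
```

The reader pays for one lookup rather than a scan of every event, which is why a view like this suits dashboards and notebooks that cannot keep up with the raw stream.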
APACHE NIFI
NiFi 2.0.0-M1 is here… https://medium.com/cloudera-inc/getting-ready-for-apache-nifi-2-0-5a5e6a67f450
- First-class citizen Python API
- Rules Engine
- NiFi Stateless at Process Group level
- Java 21 (virtual threads, perf improvements, etc)
https://medium.com/@george.vetticaden/accelerating-ai-data-pipelines-building-an-evernote-chatbot-with-apache-nifi-2-0-and-generative-ai-9d977466ff4c
Closing the gap between data engineers and data scientists…
- Export documentation (Sharepoint, OCR) to build the knowledge base powering your chatbot
- Scrape the internet (Sitemap) to build the knowledge base powering your chatbot
- Real-time streaming ingest of Slack to build the knowledge base powering your chatbot
PROVENANCE
DEMO
I Can Haz Data?
Cloudera Stream Processing Community Edition
● Zero to Flink in less than an hour
○ Experiment with features
○ Develop apps locally
● One docker compose file of CSP which includes:
○ All dependencies required to run
○ Kafka, Kafka Connect and Flink
○ Streams Messaging Manager
○ Schema Registry
○ SQL Stream Builder
● Licensed under the Cloudera Community License
● Community Group Hub (Discussion Forum) for CSP
● Find it on docs.cloudera.com under Applications
Open Source Edition
• Apache NiFi in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB apache/nifi:latest
● Licensed under the ASF License
● Unsupported
https://hub.docker.com/r/apache/nifi
https://medium.com/@tspann/cdc-not-cat-data-capture-e43713879c03
https://events.dzone.com/dzone/Data-Pipelines-Investigating-the-Modern-Day-Stack
THANK YOU

