Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Integrating Apache NiFi and Apache Flink
Feb 4th 2016
Bryan Bende – Member of Technical Staff
Outline
• Introduction to NiFi
• NiFi Site-To-Site
• Flink + NiFi Integration
• Use Case Discussion
About Me
• Member of Technical Staff at Hortonworks
• Apache NiFi Committer & PMC Member since June 2015
• Contributed NiFi + Flink Streaming Integration
• Twitter: @bbende / Blog: bryanbende.com
Introduction to Apache NiFi
Apache NiFi
• Powerful and reliable system to process and distribute data
• Directed graphs of data routing and transformation
• Web-based User Interface for creating, monitoring, & controlling data flows
• Highly configurable - modify data flow at runtime, dynamically prioritize data
• Data Provenance tracks data through entire system
• Easily extensible through development of custom components
[1] https://nifi.apache.org/
NiFi - Terminology
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
Process Group
• Set of processors and their connections
• Receive data via input ports, send data via output ports
NiFi - User Interface
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors & connections
NiFi - Provenance
• Tracks data at each point as it flows through the system
• Records, indexes, and makes events available for display
• Handles fan-in/fan-out, i.e. merging and splitting data
• View attributes and content at given points in time
NiFi - Queue Prioritization
• Configure a prioritizer per connection
• Determine what is important for your data – time based, arrival order, importance of a data set
• Funnel many connections down to a single connection to prioritize across data sets
• Develop your own prioritizer if needed
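
A prioritizer can be thought of as a comparator over the items waiting in a connection's queue. The sketch below illustrates that idea with plain Java collections only; `QueuedItem`, `PrioritizerSketch`, and the "lower number is more important" convention are assumptions made for illustration and are not NiFi classes.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for a queued FlowFile; not NiFi's FlowFile interface.
final class QueuedItem {
    final String name;
    final int priority;       // lower number = more important (assumed convention)
    final long arrivalOrder;  // sequence number assigned when the item was queued

    QueuedItem(String name, int priority, long arrivalOrder) {
        this.name = name;
        this.priority = priority;
        this.arrivalOrder = arrivalOrder;
    }
}

final class PrioritizerSketch {
    // Order by importance first, then by arrival order to break ties.
    static final Comparator<QueuedItem> BY_PRIORITY_THEN_ARRIVAL =
            Comparator.<QueuedItem>comparingInt(i -> i.priority)
                      .thenComparingLong(i -> i.arrivalOrder);

    // Builds a connection-like queue that hands out items in prioritized order.
    static PriorityQueue<QueuedItem> newQueue() {
        return new PriorityQueue<>(BY_PRIORITY_THEN_ARRIVAL);
    }
}
```

Polling the queue returns the most important item first, and items of equal importance come out in arrival order, which mirrors the "importance of a data set" and "arrival order" options named above.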
NiFi - Extensibility
Built from the ground up with extensions in mind
Service-loader pattern for…
• Processors
• Controller Services
• Reporting Tasks
• Prioritizers
Extensions packaged as NiFi Archives (NARs)
• Deploy to NiFi lib directory and restart
• Provides ClassLoader isolation
• Same model as standard components
NiFi - Architecture
[Diagram: NiFi node and cluster architecture]
• NiFi node: OS/Host running a JVM with a Web Server and a Flow Controller (Processor 1 … Extension N)
• Each node keeps a FlowFile Repository, Content Repository, and Provenance Repository on local storage
• Master: NiFi Cluster Manager (NCM), a JVM with a Web Server that acts as request replicator
• Slaves: NiFi Nodes, each with the node layout described above
NiFi Site-To-Site
• Direct communication between two NiFi instances
• Push to Input Port on receiver, or Pull from Output Port on source
• Communicate between clusters, standalone instances, or both
• Handles load balancing and reliable delivery
• Secure connections using certificates (optional)
Site-To-Site Push
• Source connects Remote Process Group to Input Port on destination
• Site-To-Site takes care of load balancing across the nodes in the cluster
[Diagram: RPG on a standalone NiFi pushing to the Input Ports on Node 1 and Node 2 of a cluster with an NCM]
Site-To-Site Pull
• Destination connects Remote Process Group to Output Port on the source
• If the source were a cluster, each destination node would pull from each node in the source cluster
[Diagram: RPGs on Node 1 and Node 2 of a cluster with an NCM pulling from the Output Port on a standalone NiFi]
Site-To-Site Client
• Code for Site-To-Site broken out into reusable module
• https://github.com/apache/nifi/tree/master/nifi-commons/nifi-site-to-site-client
• Can be used from any Java program to push/pull from NiFi
[Diagram: Java program using the Site-To-Site Client against the Output Ports on Node 1 and Node 2 of a cluster with an NCM]
Flink + NiFi Integration
• Use Site-To-Site Client in Flink Streaming
• NiFiSource to pull data from NiFi Output Port
• NiFiSink to push data to NiFi Input Port
• NiFiDataPacket to represent data to/from NiFi (think FlowFile)
import java.util.Map;
// Flink-side view of a NiFi FlowFile: raw content plus string attributes
public interface NiFiDataPacket {
    byte[] getContent();
    Map<String, String> getAttributes();
}
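
To make the "think FlowFile" comparison concrete, here is a minimal sketch of an immutable implementation of the interface shown on the slide. The interface is redeclared so the example compiles on its own; `SimpleDataPacket` and its `ofText` factory are hypothetical names introduced for illustration, not classes from the Flink connector.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Interface as shown on the slide, redeclared so the sketch compiles on its own.
interface NiFiDataPacket {
    byte[] getContent();
    Map<String, String> getAttributes();
}

// Hypothetical immutable implementation; the class name is an assumption.
final class SimpleDataPacket implements NiFiDataPacket {
    private final byte[] content;
    private final Map<String, String> attributes;

    SimpleDataPacket(byte[] content, Map<String, String> attributes) {
        // Defensive copies so later changes by the caller do not leak in.
        this.content = content.clone();
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    @Override
    public byte[] getContent() { return content.clone(); }

    @Override
    public Map<String, String> getAttributes() { return attributes; }

    // Convenience factory for text payloads, encoded as UTF-8.
    static SimpleDataPacket ofText(String text, Map<String, String> attributes) {
        return new SimpleDataPacket(text.getBytes(StandardCharsets.UTF_8), attributes);
    }
}
```

Copying the byte array and the map keeps a packet stable after it has been handed off, which is a design choice of this sketch rather than something the slides specify.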
NiFi Source Example
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
// Site-To-Site settings: NiFi URL and the Output Port to pull from
SiteToSiteClientConfig clientConfig = new SiteToSiteClient.Builder()
    .url("http://localhost:8080/nifi")
    .portName("Data for Flink")
    .requestBatchCount(…)
    .buildConfig();
// Each element of the stream is one NiFiDataPacket pulled from NiFi
SourceFunction<NiFiDataPacket> nifiSource = new NiFiSource(clientConfig);
DataStream<NiFiDataPacket> streamSource = env.addSource(nifiSource);
NiFi Sink Example
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
// Site-To-Site settings: NiFi URL and the Input Port to push to
SiteToSiteClientConfig clientConfig = new SiteToSiteClient.Builder()
    .url("http://localhost:8080/nifi")
    .portName("Data from Flink")
    .buildConfig();
// Creates a NiFiDataPacket from incoming data of a given type
// Here we are creating NiFiDataPackets for each String
NiFiDataPacketBuilder<String> dpb = ...
DataStreamSink<String> dataStream = ...
    .addSink(new NiFiSink<>(clientConfig, dpb));
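
The slide leaves the builder elided. As a rough idea of what such a builder does, the sketch below converts a String into a packet of UTF-8 bytes plus one attribute. `PacketSketch` and `PacketBuilderSketch` are hypothetical stand-ins defined here so the example compiles without Flink or NiFi on the classpath; the connector's own `NiFiDataPacketBuilder` interface may have a different method signature, and the `source` attribute is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins so the sketch compiles without Flink or NiFi.
interface PacketSketch {
    byte[] getContent();
    Map<String, String> getAttributes();
}

interface PacketBuilderSketch<T> {
    PacketSketch create(T value);
}

// Turns each String into a packet: UTF-8 bytes as content, one descriptive attribute.
final class StringPacketBuilder implements PacketBuilderSketch<String> {
    @Override
    public PacketSketch create(String value) {
        final byte[] content = value.getBytes(StandardCharsets.UTF_8);
        final Map<String, String> attributes = new HashMap<>();
        attributes.put("source", "flink"); // attribute name and value are assumptions
        return new PacketSketch() {
            @Override public byte[] getContent() { return content; }
            @Override public Map<String, String> getAttributes() { return attributes; }
        };
    }
}
```

On the NiFi side, attributes set this way would arrive as FlowFile attributes and could be used for routing.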
Use Case Discussion
Drive Data to Flink for Analysis
• Drive data from sources to central data center for analysis
• Tiered collection approach at various locations, think regional data centers
[Diagram: two Edge NiFi instances, a Core NiFi, and Flink]
Dynamically Adjusting Data Flow
• Push analytic results from Flink back to NiFi
• Push results back to edge locations/devices to change behavior
[Diagram: two Edge NiFi instances, a Core NiFi, and Flink]
Example: Dynamic Log Collection
1. Logs filtered by level and sent from Edge -> Core
2. Flink produces new filter levels based on rate & sends back to core
3. Edge polls core for new filter levels & updates filtering
[Diagram: Edge NiFi, Core NiFi, and the Flink analytic. Logs flow Edge -> Core -> Flink; new filters flow Flink -> Core, where the result is stored and fetched when the edge polls]
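
The edge-side behavior in steps 1 and 3 can be sketched in plain Java. In the talk this logic lives in a NiFi flow rather than in hand-written code, so `EdgeLogFilter`, its `Level` enum, and the default of forwarding only ERROR are all assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical edge-side filter: forwards only log lines whose level is currently allowed.
final class EdgeLogFilter {
    enum Level { DEBUG, INFO, WARN, ERROR }

    // Assumed default: forward ERROR only until the core says otherwise.
    private final AtomicReference<Set<Level>> allowed =
            new AtomicReference<>(EnumSet.of(Level.ERROR));

    // Step 1: decide whether a log line should be sent from Edge to Core.
    boolean shouldForward(Level level) {
        return allowed.get().contains(level);
    }

    // Step 3: swap in the filter levels returned by the latest poll of the core.
    void update(Set<Level> newLevels) {
        Set<Level> copy = newLevels.isEmpty()
                ? EnumSet.noneOf(Level.class)
                : EnumSet.copyOf(newLevels);
        allowed.set(copy);
    }
}
```

Holding the allowed levels in an `AtomicReference` lets a polling thread replace the filter while other threads keep filtering, without locks.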
Dynamic Log Collection – Edge NiFi
Dynamic Log Collection – Core NiFi
Dynamic Log Collection – Flink Streaming
StreamExecutionEnvironment env = ...
// Pull log FlowFiles from the NiFi Output Port
SiteToSiteClientConfig clientConfig = getSourceConfig(props);
DataStream<NiFiDataPacket> streamSource =
    env.addSource(new NiFiSource(clientConfig));
// Count log levels over windows of windowMs milliseconds
int windowMs = ...
LogLevelFlatMap logLevelFlatMap = new LogLevelFlatMap(...);
DataStream<LogLevels> counts =
    streamSource.flatMap(logLevelFlatMap)
        .timeWindowAll(Time.of(windowMs, TimeUnit.MILLISECONDS))
        .apply(new LogLevelWindowCounter());
// Turn the counts into new filter levels and push them to the NiFi Input Port
double rate = ...
SiteToSiteClientConfig sinkConfig = getSinkConfig(props);
NiFiDataPacketBuilder<LogLevels> builder = new DictionaryBuilder(windowMs, rate);
counts.addSink(new NiFiSink<>(sinkConfig, builder));
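
The slides do not show what happens inside `LogLevelWindowCounter` or `DictionaryBuilder`. The following self-contained sketch shows one way the "count per window, then derive new filter levels from a rate" idea could work. The class `LogLevelWindowSketch`, its method names, and the specific rule (widen collection to WARN when ERRORs per second exceed a threshold) are assumptions, not the logic of the original example.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LogLevelWindowSketch {
    enum Level { DEBUG, INFO, WARN, ERROR }

    // Counts how many messages of each level arrived in one window.
    static Map<Level, Integer> count(List<Level> window) {
        Map<Level, Integer> counts = new EnumMap<>(Level.class);
        for (Level level : window) {
            counts.merge(level, 1, Integer::sum);
        }
        return counts;
    }

    // Assumed rule: if ERROR messages per second exceed maxErrorsPerSecond,
    // collect WARN as well; otherwise collect ERROR only.
    static Set<Level> newFilter(Map<Level, Integer> counts, long windowMs,
                                double maxErrorsPerSecond) {
        int errors = counts.getOrDefault(Level.ERROR, 0);
        double errorsPerSecond = errors / (windowMs / 1000.0);
        return errorsPerSecond > maxErrorsPerSecond
                ? EnumSet.of(Level.WARN, Level.ERROR)
                : EnumSet.of(Level.ERROR);
    }
}
```

In the job above, the result of a rule like this is what a builder would serialize into a NiFiDataPacket for the sink to push back to NiFi.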
Dynamic Log Collection – Full Flow
[Diagram: logs flow from two Edge NiFi instances through the Core NiFi to Flink; new filters flow back from Flink through the Core NiFi to the edges]
Summary
• Use NiFi to drive data from sources to Flink
• Leverage Flink results to adjust your dataflows
Sources
• [1] https://nifi.apache.org/
Resources
• https://github.com/bbende/nifi-streaming-examples
• https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming
• https://flink.apache.org/news/2015/02/09/streaming-example.html
Contact Info:
• Email: bbende@hortonworks.com
• Twitter: @bbende
Thank you
