Data Ingestion & Distribution with Apache NiFi
Agenda
Introduction to NiFi
Our use case for NiFi
Demo
Q&A
Introduction to NiFi
History & Facts
Created by: NSA
Incubating: 2014
Available: 2015
Main contributors: Hortonworks
Stable version at time of presentation: 1.1.1
Delivery guarantees: at-least-once
Out-of-order processing: no
Windowing: no
Back pressure: yes
Latency: configurable
Resource management: native
API: REST (also drives the web GUI)
Ecosystem
(Diagram: Data Moving vs. Stream Processing)
Architecture
FlowFiles
Basic abstraction
● Pointer to content
● Attributes (key/value metadata about the content)
● Link to provenance events
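A minimal sketch of the FlowFile abstraction, assuming a simplified in-memory model: the class and field names (`FlowFile`, `content_claim`, `flowfile_id`, `with_attributes`) are hypothetical and are not NiFi's Java API. It shows the two ideas on the slide: a FlowFile carries key/value attributes plus only a pointer to its content, and updating attributes produces a new object instead of mutating the old one.

```python
from dataclasses import dataclass, field, replace
from types import MappingProxyType
from typing import Mapping
from uuid import uuid4


@dataclass(frozen=True)
class FlowFile:
    """Hypothetical model of a FlowFile: metadata plus a pointer to content."""

    content_claim: str              # key into a content repository, not the bytes themselves
    attributes: Mapping[str, str]   # key/value metadata, e.g. filename
    flowfile_id: str = field(default_factory=lambda: str(uuid4()))

    def __post_init__(self) -> None:
        # Freeze a private copy of the attributes so callers cannot mutate them later.
        object.__setattr__(self, "attributes", MappingProxyType(dict(self.attributes)))

    def with_attributes(self, **updates: str) -> "FlowFile":
        # Return a new FlowFile with merged attributes; content pointer and id are reused.
        return replace(self, attributes={**self.attributes, **updates})
```

Because only the pointer is copied, changing metadata on a large file costs almost nothing in this model.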
Repositories
● FlowFile
● Content
● Provenance
● Immutable
● Copy-on-write
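The "immutable" and "copy-on-write" properties of the content repository can be sketched as follows. This is a hypothetical in-memory store (`ContentRepository`, `write`, `read`, `modify` are illustrative names); NiFi itself persists content claims on disk, which this sketch does not attempt to model.

```python
import itertools
from typing import Callable, Dict


class ContentRepository:
    """Hypothetical in-memory content store whose claims are never modified in place."""

    def __init__(self) -> None:
        self._claims: Dict[str, bytes] = {}
        self._ids = itertools.count(1)

    def write(self, data: bytes) -> str:
        # Every write creates a brand-new claim; existing claims are immutable.
        claim = f"claim-{next(self._ids)}"
        self._claims[claim] = bytes(data)
        return claim

    def read(self, claim: str) -> bytes:
        return self._claims[claim]

    def modify(self, claim: str, transform: Callable[[bytes], bytes]) -> str:
        # Copy-on-write: read the old bytes, transform a copy, store it under a new claim.
        return self.write(transform(self.read(claim)))
```

Keeping the old claim intact is what lets provenance point back at the content as it looked before a transformation.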
Processor
Processors actually perform the work of data routing, transformation, or mediation between systems. Processors have access to the attributes of a given FlowFile and its content stream. Processors can operate on zero or more FlowFiles in a given unit of work and either commit that work or roll it back.
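The commit/rollback unit of work described above can be sketched like this. It is an assumption-laden model, not NiFi's `ProcessSession` API: `Session`, `get`, `transfer` and `on_trigger` here are hypothetical, and FlowFiles are reduced to plain strings.

```python
from typing import Callable, List


class Session:
    """Hypothetical unit of work between an inbound and an outbound queue."""

    def __init__(self, inbound: List[str], outbound: List[str]) -> None:
        self.inbound = inbound
        self.outbound = outbound
        self._taken: List[str] = []   # items pulled in this unit of work
        self._staged: List[str] = []  # results not yet visible downstream

    def get(self, max_items: int) -> List[str]:
        # May return zero items if the queue is empty.
        batch = self.inbound[:max_items]
        del self.inbound[:max_items]
        self._taken.extend(batch)
        return batch

    def transfer(self, item: str) -> None:
        self._staged.append(item)

    def commit(self) -> None:
        # Publish staged outputs downstream; the consumed inputs are gone for good.
        self.outbound.extend(self._staged)
        self._taken.clear()
        self._staged.clear()

    def rollback(self) -> None:
        # Return consumed inputs to the head of the queue and discard staged outputs.
        self.inbound[:0] = self._taken
        self._taken.clear()
        self._staged.clear()


def on_trigger(session: Session, transform: Callable[[str], str]) -> bool:
    """Process one batch; returns True on commit, False on rollback."""
    try:
        for item in session.get(10):
            session.transfer(transform(item))
    except Exception:
        session.rollback()
        return False
    session.commit()
    return True
```

If any item in the batch fails, nothing from the batch reaches the outbound queue and every input is put back, so the work can be retried as a whole.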
Processor
● Basic Work Unit
● State
● Statistics
● Settings
● Input/Output
● Provenance
● Scheduling
● Logging (bulletins)
Connection
Connections provide the actual linkage between processors. They act as queues and allow various processes to interact at differing rates. These queues can be prioritized dynamically and can have upper bounds on load, which enable back pressure.
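In NiFi a connection's upper bounds are configured as a "Back Pressure Object Threshold" and a "Back Pressure Data Size Threshold". The sketch below models that idea with a hypothetical `Connection` class and `run_upstream` helper: once either threshold is reached, the upstream producer stops being fed new work.

```python
from collections import deque
from typing import Deque, Iterable, Tuple


class Connection:
    """Hypothetical queue between two processors with back-pressure thresholds."""

    def __init__(self, max_objects: int, max_bytes: int) -> None:
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self._queue: Deque[Tuple[str, int]] = deque()
        self._bytes = 0

    def is_full(self) -> bool:
        # Back pressure applies as soon as either threshold is reached.
        return len(self._queue) >= self.max_objects or self._bytes >= self.max_bytes

    def offer(self, name: str, size: int) -> None:
        self._queue.append((name, size))
        self._bytes += size

    def poll(self) -> Tuple[str, int]:
        name, size = self._queue.popleft()
        self._bytes -= size
        return name, size


def run_upstream(conn: Connection, items: Iterable[Tuple[str, int]]) -> int:
    """Keep feeding the connection until back pressure kicks in; return how many were sent."""
    sent = 0
    for name, size in items:
        if conn.is_full():
            break  # the upstream processor would simply not be scheduled
        conn.offer(name, size)
        sent += 1
    return sent
```

The check happens before an item is offered, so one large file can push the queue past the size threshold; this is similar in spirit to NiFi treating the thresholds as soft limits.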
Connection
● Queue
● Statistics
● Settings
● Prioritization
● Details
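NiFi ships prioritizers such as `FirstInFirstOutPrioritizer` and `PriorityAttributePrioritizer`. The sketch below mimics the attribute-based idea with its own ordering rule, which is an assumption rather than NiFi's exact comparison: a lower numeric `priority` attribute is served first, ties are FIFO, and FlowFiles without the attribute go last. `PrioritizedQueue` is a hypothetical name.

```python
import heapq
import itertools
import sys
from typing import Dict, List, Tuple

Attributes = Dict[str, str]


class PrioritizedQueue:
    """Hypothetical connection queue ordered by a numeric 'priority' attribute."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Attributes]] = []
        self._arrival = itertools.count()

    def offer(self, attributes: Attributes) -> None:
        # Lower number = served first; FlowFiles without the attribute go last.
        priority = int(attributes.get("priority", sys.maxsize))
        # The arrival counter keeps FIFO order among equal priorities and
        # ensures heapq never has to compare the attribute dicts.
        heapq.heappush(self._heap, (priority, next(self._arrival), attributes))

    def poll(self) -> Attributes:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Because ordering is decided when an item is polled, a different prioritizer can be swapped in without touching the processors on either side of the connection.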
Process Group
A specific set of processors and their connections, which can receive data via input ports and send data out via output ports. In this manner, process groups allow the creation of entirely new components simply by composing other components.
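Composition can be illustrated by treating each processor as a function from a batch of items to a batch of items. In this hypothetical sketch (`process_group`, `split_lines`, `drop_blank`, `to_upper` are illustrative names, not NiFi components), a group wires its members in sequence and then behaves like a single processor, so it can itself be nested inside a larger group.

```python
from typing import Callable, List

Processor = Callable[[List[str]], List[str]]


def process_group(*processors: Processor) -> Processor:
    """Hypothetical: chain processors so the whole group behaves like one processor."""
    def run(input_port: List[str]) -> List[str]:
        items = input_port
        for processor in processors:
            items = processor(items)
        return items  # whatever leaves through the output port
    return run


def split_lines(items: List[str]) -> List[str]:
    return [line for item in items for line in item.splitlines()]


def drop_blank(items: List[str]) -> List[str]:
    return [item for item in items if item.strip()]


def to_upper(items: List[str]) -> List[str]:
    return [item.upper() for item in items]


# A group can itself be a member of a larger group.
clean = process_group(split_lines, drop_blank)
pipeline = process_group(clean, to_upper)
```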
Templates
Dataflows tend to be highly pattern oriented, and while there are often many different ways to solve a problem, it helps greatly to be able to share those best practices. Templates allow subject matter experts to build and publish their flow designs and let others benefit from and collaborate on them.
● XML-based
● Reusable unit
● Versioning (e.g., with Git)
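Since templates are XML, ordinary tooling can inspect them. The sketch below uses a simplified, hypothetical template layout (real NiFi templates carry many more fields, such as ids, positions and properties) and a hypothetical `template_summary` helper that lists each processor's name and type.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Simplified, hypothetical template layout; real NiFi templates contain far more detail.
TEMPLATE_XML = """
<template>
  <name>sftp-to-hdfs</name>
  <snippet>
    <processors>
      <name>FetchSFTP</name>
      <type>org.apache.nifi.processors.standard.FetchSFTP</type>
    </processors>
    <processors>
      <name>PutHDFS</name>
      <type>org.apache.nifi.processors.hadoop.PutHDFS</type>
    </processors>
  </snippet>
</template>
"""


def template_summary(xml_text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the template name and (name, type) for each processor it contains."""
    root = ET.fromstring(xml_text.strip())
    processors = [
        (p.findtext("name", ""), p.findtext("type", ""))
        for p in root.iter("processors")
    ]
    return root.findtext("name", ""), processors
```

A compact summary like this could make Git diffs of exported templates easier to review than the raw XML.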
Data Provenance
NiFi automatically records, indexes, and makes available provenance data as objects flow through the system, even across fan-in, fan-out, transformations, and more. This information becomes extremely critical in supporting compliance, troubleshooting, optimization, and other scenarios.
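Lineage across fan-out and fan-in can be sketched as a graph walk over recorded events. The event type strings below (`RECEIVE`, `FORK`, `JOIN`, `SEND`) are real NiFi provenance event types, but the classes and the way parents are recorded are a simplified, hypothetical model, not NiFi's provenance repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str                  # e.g. "RECEIVE", "FORK", "JOIN", "SEND"
    flowfile_id: str
    parent_ids: Tuple[str, ...] = ()


class ProvenanceRepository:
    """Hypothetical append-only event log with a parent index for lineage queries."""

    def __init__(self) -> None:
        self.events: List[ProvenanceEvent] = []
        self._parents: Dict[str, Set[str]] = {}

    def record(self, event: ProvenanceEvent) -> None:
        self.events.append(event)
        self._parents.setdefault(event.flowfile_id, set()).update(event.parent_ids)

    def lineage(self, flowfile_id: str) -> Set[str]:
        # Walk parent links back to the origins, across fan-out (FORK) and fan-in (JOIN).
        seen: Set[str] = set()
        stack = [flowfile_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self._parents.get(current, set()))
        return seen
```

For example, a merged FlowFile built from one fork of file `a` plus file `b` traces back to both `a` and `b`, which is the kind of question compliance and troubleshooting rely on.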
Data Provenance
● Details
● Attributes
● Content
Controller Service
Controller Services allow developers to share functionality and state across the JVM in a clean and consistent manner.
● No scheduling
● No connections
● Used by Processors, Reporting Tasks, and other Controller Services
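A sketch of the shared-state idea, assuming a hypothetical thread-safe cache service (`MapCacheService`) injected into a hypothetical processor (`DuplicateFilter`). It echoes the pattern in which NiFi's DetectDuplicate processor relies on a distributed map cache client service, but none of these classes are NiFi APIs. Note that the service has no scheduling and no connections of its own; processors simply hold a reference to it.

```python
import threading
from typing import Dict, Optional


class MapCacheService:
    """Hypothetical controller service: one shared, thread-safe key/value cache."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, str] = {}

    def put_if_absent(self, key: str, value: str) -> bool:
        # Returns True only for the first caller that stores the key.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._data.get(key)


class DuplicateFilter:
    """Hypothetical processor that references the shared service instead of owning state."""

    def __init__(self, cache: MapCacheService) -> None:
        self.cache = cache

    def route(self, filename: str) -> str:
        return "non-duplicate" if self.cache.put_if_absent(filename, "seen") else "duplicate"
```

Two processor instances that share one service see each other's state, which is the point of moving the cache out of the processors.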
Reporting Tasks
Reporting Tasks provide a capability for reporting status, statistics, metrics, and monitoring information to external services.
● Examples: ElasticSearchProvenanceReporter and DataDogReportingTask
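One scheduled run of a reporting task can be sketched as "snapshot statistics, serialise, send". The functions below (`build_report`, `run_reporting_task`) and the counter names are hypothetical; the `send` callback stands in for whatever external system (e.g. a metrics backend) a real task would push to.

```python
import json
from typing import Callable, Dict

Stats = Dict[str, Dict[str, int]]


def build_report(stats: Stats) -> str:
    """Serialise per-component counters plus totals as JSON for an external system."""
    totals = {"flowfiles_in": 0, "bytes_in": 0}
    for counters in stats.values():
        for key in totals:
            totals[key] += counters.get(key, 0)
    return json.dumps({"components": stats, "totals": totals}, sort_keys=True)


def run_reporting_task(get_stats: Callable[[], Stats], send: Callable[[str], None]) -> None:
    """One scheduled run: snapshot statistics, then push them out."""
    send(build_report(get_stats()))
```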
Extensibility
● Ready-to-use Maven project template (archetype)
● Well-defined interface for each component
● Class loader isolation (.nar files)
● Great documentation for developers
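NiFi extensions are Java classes implementing well-defined interfaces, packaged in .nar files that each get their own class loader. Python has no direct equivalent of that isolation, so this hypothetical sketch only illustrates the other half: every extension implements the same abstract contract and is looked up by its type name (`Processor`, `register`, `ReverseText`, `instantiate` are illustrative names).

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Processor(ABC):
    """Hypothetical component contract that every extension must implement."""

    @abstractmethod
    def on_trigger(self, items: List[str]) -> List[str]:
        ...


REGISTRY: Dict[str, Type[Processor]] = {}


def register(cls: Type[Processor]) -> Type[Processor]:
    # Stand-in for bundle discovery: extensions are looked up by type name.
    REGISTRY[cls.__name__] = cls
    return cls


@register
class ReverseText(Processor):
    def on_trigger(self, items: List[str]) -> List[str]:
        return [item[::-1] for item in items]


def instantiate(type_name: str) -> Processor:
    return REGISTRY[type_name]()
```

Because the framework only depends on the abstract contract, new components can be added without changing the code that schedules them.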
Statistics
● 200+ built-in Processors
● 10+ built-in Controller Services
● 10+ built-in Reporting Tasks
Introduction Summary
● Processor
● Connection
● Process Group
● Template
● Controller Service
● Reporting Task
Our use case for NiFi
What we had before
● In-house file collector
● Footprint of 10 servers
● Hard to manage, scale, and extend
DWH Real Time
DWH Batch
Reports Distribution
Statistics
● 20 TB of data ingested daily
● 250K files ingested daily
● Near-real-time data availability (minimum interval: 1 min)
● 1 TB of data distributed as reports
● 30K files exported daily
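As a rough sanity check on the ingestion figures, the daily volume can be turned into an average rate and an average file size. This assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly over 24 hours; real traffic is bursty, so peak rates will be higher. `daily_averages` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_averages(bytes_per_day: float, files_per_day: int) -> tuple:
    """Return (average bytes per second, average bytes per file) for a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY, bytes_per_day / files_per_day


# 20 TB/day and 250K files/day, using decimal units (1 TB = 10**12 bytes).
rate, avg_file = daily_averages(20 * 10**12, 250_000)
print(f"{rate / 10**6:.0f} MB/s average, {avg_file / 10**6:.0f} MB per file")
# → 231 MB/s average, 80 MB per file
```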
AWS - Hadoop Ingestion
Kafka Reprocessing
SFTP - HDFS Ingestion
Let’s break something ;)
Use Cases Summary
● Web User Interface
● Configurable
● Scalable
● Easy to Manage
● Designed for Extension
Q & A
THANK YOU
