Data Ingestion using NiFi
Quick Overview
training@itversity.com
Agenda
• Overview of NiFi
• Understanding NiFi Layout as a service
• Key Concepts such as FlowFiles, Attributes, etc.
• Understanding how to access the documentation
• Capabilities of NiFi as a Data Ingestion Tool
• NiFi vs. Traditional ETL Tools
• Role of NiFi in Data Engineering at Scale
• Demo - Simple pipeline to copy files from the Local File System to HDFS
Resources
• Code and documentation will be available in the GitHub repository.
• Videos will be available on YouTube as part of this playlist. They will be free to stream for a few weeks, after which they will become members-only (except this one).
Agenda
• Overview of NiFi
• Understanding NiFi as a service
• NiFi Core Concepts
• Accessing NiFi Documentation
• Capabilities of NiFi as a Data Ingestion Tool
• NiFi vs. Traditional ETL Tools
• Role of NiFi in Data Engineering at Scale
• NiFi Demo – Simple Data Pipeline
[Diagram labels: Clients, Switches, Firewalls, Web/App Servers, Database]
[Diagram labels: Web/App Servers, Database; Files, Databases, BI/DW, External Apps]
Data Integration: Batch or Real Time
• For batch, get data by querying the databases.
• Batch tools: Informatica, Ab Initio, etc.
• For real time, get data from web server logs or database logs.
• Real-time tools: GoldenGate to get data from database logs, Kafka to get data from web server logs.
Modern Large Scale Data Engineering Architecture
[Diagram labels: Database, Application logs, Mainframes, IoT Device Data; Data Lake; Files, Databases, BI/DW, External Apps]
Modern Large Scale Data Engineering Architecture
[Diagram labels: Database, Application logs, Mainframes, IoT Device Data; Ingestion; Data Lake (S3, ADLS); Data Processing (EMR, Databricks, Docker); Files, Databases, BI/DW, External Apps]
NiFi helps with ingestion and basic scheduling.
Understanding NiFi as a service
• NiFi is a data ingestion tool and is typically configured on edge nodes or client nodes.
• It can be configured on multiple nodes as a cluster for high availability (HA), fault tolerance, and load balancing.
• It can be integrated with Kerberos for security.
• NiFi is an external service and requires configuration to integrate with data engineering tools such as Spark, Kafka, and Hadoop.
• NiFi is provided as one of the key services in the Cloudera/Hortonworks distributions.
NiFi Core Concepts
Here are the core concepts of NiFi that one should be familiar with. They are covered in depth as part of the NiFi Workshop Series.
• Processors
• Process Groups
• FlowFiles
• Attributes
• Controller Services
• NiFi Expression Language
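To make a few of these terms concrete, here is a minimal conceptual sketch in Python. It is not NiFi's API: the `FlowFile` class, the `evaluate` helper, and the `update_attribute` function are hypothetical stand-ins, and the attribute names and values are made up. It only illustrates that a FlowFile pairs opaque content with string attributes, that a processor can change attributes without touching the content, and that the Expression Language refers to attributes with the `${name}` form (only plain references are handled here, not Expression Language functions).

```python
import re
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class FlowFile:
    """Toy stand-in for a FlowFile: opaque content plus string attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def evaluate(expression: str, flowfile: FlowFile) -> str:
    """Resolve plain ${attribute} references; in this sketch unknown names become ''."""
    return re.sub(
        r"\$\{(\w[\w.]*)\}",
        lambda m: flowfile.attributes.get(m.group(1), ""),
        expression,
    )


def update_attribute(flowfile: FlowFile, **new_attributes: str) -> FlowFile:
    """Toy processor: return a copy with attributes added, content untouched."""
    merged = {**flowfile.attributes, **new_attributes}
    return replace(flowfile, attributes=merged)


# Hypothetical example values, not taken from the slides.
ff = FlowFile(b"1,2013-07-25,CLOSED\n", {"filename": "part-00000", "table": "orders"})
ff = update_attribute(ff, target_dir=evaluate("/landing/${table}", ff))
print(ff.attributes["target_dir"])
```

The content bytes are never parsed in this sketch; only the attributes drive the decision about where the data should go.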
Accessing NiFi Documentation
• NiFi documentation for any processor is accessible through the usage option in its right-click menu.
Capabilities of NiFi as a Data Ingestion Tool
• Can consume data from most sources into the Data Lake.
• Can move data from the Data Lake to downstream systems.
• Can convert file formats while loading data into the Data Lake.
• Can apply almost all standard row-level transformations incrementally, using either JOLT or SQL.
• Can also be leveraged for orchestrating and scheduling data pipelines.
• However, NiFi might not be the most appropriate tool for heavy baseline loads, and it is not good at complex transformations.
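As a rough illustration of what "file format conversion" and "row-level transformation" mean here, the following Python sketch converts CSV records to JSON Lines one record at a time. It is plain Python, not NiFi configuration, and the function name and the `order_id` and `order_status` columns are hypothetical. The point is that each record is handled on its own, which is the kind of work the slide says suits NiFi, as opposed to joins or aggregations across records.

```python
import csv
import io
import json
from typing import Iterable, Iterator


def csv_to_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Convert CSV text to JSON Lines, one record at a time."""
    for row in csv.DictReader(lines):
        # Row-level transformation: normalise a single field of this record.
        row["order_status"] = row["order_status"].strip().upper()
        yield json.dumps(row)


# Hypothetical sample data.
sample = io.StringIO("order_id,order_status\n1, closed\n2,pending_payment\n")
for line in csv_to_json_lines(sample):
    print(line)
```

Because the function is a generator, records are emitted as they are read rather than after the whole input has been loaded.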
NiFi vs. Traditional ETL Tools
• NiFi is primarily an ingestion tool.
• It works well for extracting and loading data into the Data Lake without complex transformations.
• NiFi is very good at moving data between hops because it deals with files rather than manipulating the data in them.
• NiFi can build simple, generic pipelines that move data between hops without restricting the flow to a schema.
• You can build a very simple flow in minutes to get data from thousands of files belonging to hundreds of tables into the Data Lake. You will see that as part of the demo later.
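The idea of one generic, schema-free flow serving many tables can be sketched in a few lines of Python. This is an illustration only, not the demo flow: the `target_path` function and all directory names are assumptions. The target location is derived purely from where each file sits, so the same rule works for any number of tables and the file contents are never inspected.

```python
from pathlib import PurePosixPath


def target_path(source_file: str, source_root: str, lake_root: str) -> str:
    """Map a source file to a data lake path, keeping its table sub-directory."""
    relative = PurePosixPath(source_file).relative_to(source_root)
    return str(PurePosixPath(lake_root) / relative)


# Hypothetical source layout: one sub-directory per table.
files = [
    "/data/retail_db/orders/part-00000",
    "/data/retail_db/customers/part-00000",
]
for f in files:
    print(target_path(f, "/data/retail_db", "/user/training/retail_db"))
```

Adding a new table in this sketch means adding a new sub-directory; the mapping rule itself does not change.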
Role of NiFi in Data Engineering at Scale
• Get data from databases into the data lake.
• Consume data from Kafka topics into the data lake.
• Get data from app server log files into the data lake (using MiNiFi).
• Get data from the data lake into file servers.
• Get data from an on-prem data lake into cloud storage such as S3 or ADLS.
• Get processed data from the data lake into databases or data warehouses.
NiFi Demo – Simple Data Pipeline
• Build a simple pipeline to get files from the local file system into HDFS.