1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Mission to NARs with Apache NiFi
Aldrin Piri - @aldrinpiri
ApacheCon Big Data 2016
12 May 2016
Tutorial Resources
https://github.com/apiri/nifi-mission-to-nars-workshop
Agenda
• Start with a dataflow… but we can do better!
• Do better with the NiFi Framework and custom processor
• Extension Points: Processors, Controller Services, Reporting Tasks
• Process Session & Process Context
• How the API ties to the NiFi repositories
• Testing isn’t that bad!
• Share with templates!
Adding New Functionality: The Development Approach
• Extending the platform is about leveraging the expansive Java ecosystem and existing code
– Make use of open source projects and the libraries provided for targeted systems and services
– Reuse existing proprietary or closed-source libraries by wrapping their functionality in the framework
• The test framework provides a powerful means of testing extensions in isolation, exercising them as they would run in a live instance
• Deployment is as simple as copying the built NAR into the lib directory of your NiFi instance(s)
Minimal Dependencies Needed
• Java Development Kit, version 1.7 or later
• Maven, version 3.1.0 or later
Boilerplate Code Is Provided via Maven Archetypes
• Archetypes generate bundles for the major extension points, Processors and Controller Services:
– Processor Bundle
– Controller Service Bundle
What is a NAR?
– Bundles the developed code that provides extensions, together with their dependencies
– Provides classloader isolation for extensions, mitigating the dependency versioning conflicts that can be pervasive when interacting with a wide variety of systems, services, and formats
NAR == NiFi ARchive
Consider it an OSGi-lite package
[Figure: NAR Bundle Structure]
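The isolation idea can be illustrated with a toy model, assuming standard parent-first delegation: each bundle gets its own loader whose parent is the shared framework/API loader (or, for a bundle that depends on another bundle, that bundle's loader). `BundleLoader`, `ships`, `resolve`, and the library names and versions are hypothetical; this is not NiFi's `NarClassLoader`, and it tracks library versions rather than loading real classes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of NAR-style isolation (hypothetical; not NiFi's NarClassLoader).
// Each loader knows which library versions its own bundle ships and delegates parent-first.
final class BundleLoader {
    private final BundleLoader parent;                              // null for the root (framework/API) loader
    private final Map<String, String> libraries = new HashMap<>();  // library name -> version shipped in this bundle

    BundleLoader(BundleLoader parent) {
        this.parent = parent;
    }

    // Declares that this bundle ships the given library version; returns this for chaining.
    BundleLoader ships(String library, String version) {
        libraries.put(library, version);
        return this;
    }

    // Parent-first lookup: ancestors win, then this bundle; sibling bundles are never consulted.
    String resolve(String library) {
        String fromParent = (parent == null) ? null : parent.resolve(library);
        return (fromParent != null) ? fromParent : libraries.get(library);
    }
}
```

Two sibling bundles can each ship a different version of the same (hypothetical) `json-lib` because neither loader ever sees the other, while both still share whatever the framework loader provides.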
How long does it take to create an extension?
• Incorporating functionality from an existing library
– Create a bundle
– Add a dependency on the library
– Design the user experience
• Properties – How can this extension be configured? What are valid values for user input?
• Relationships – How will data move to the next stage of its processing?
– Wrap the core classes of the library in the framework and implement onTrigger
• ProcessSession abstracts interactions with the backing repositories and handles unit-of-work sessions
• ProcessContext provides access to the defined properties, whose values the framework has already validated
– Test
– Deploy
For the majority of cases, development time is measured in hours*
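The anatomy above (validated properties, named relationships, an onTrigger that routes data through a unit-of-work session) can be sketched without the NiFi API. Every type here (`PropertySpec`, `ValidatedContext`, `UnitOfWorkSession`, `PrefixProcessor`) is a hypothetical stand-in for NiFi's `PropertyDescriptor`, `ProcessContext`, `ProcessSession`, and a processor class, with content modelled as plain strings; it assumes Java 8+.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for a property definition: name, whether it is required, and a validator.
final class PropertySpec {
    final String name;
    final boolean required;
    final Predicate<String> validator;

    PropertySpec(String name, boolean required, Predicate<String> validator) {
        this.name = name;
        this.required = required;
        this.validator = validator;
    }
}

// Rejects invalid configuration up front, so onTrigger only ever sees validated values.
final class ValidatedContext {
    private final Map<String, String> values;

    ValidatedContext(List<PropertySpec> supported, Map<String, String> configured) {
        for (PropertySpec p : supported) {
            String v = configured.get(p.name);
            boolean invalid = (v == null) ? p.required : !p.validator.test(v);
            if (invalid) {
                throw new IllegalArgumentException("Invalid value for property '" + p.name + "'");
            }
        }
        this.values = new LinkedHashMap<>(configured);
    }

    String getProperty(String name) {
        return values.get(name);
    }
}

// Stages transfers and publishes them together on commit (one unit of work); rollback discards them.
final class UnitOfWorkSession {
    private final Map<String, List<String>> staged = new LinkedHashMap<>();
    private final Map<String, List<String>> committed = new LinkedHashMap<>();

    void transfer(String content, String relationship) {
        staged.computeIfAbsent(relationship, k -> new ArrayList<>()).add(content);
    }

    void commit() {
        staged.forEach((rel, items) -> committed.computeIfAbsent(rel, k -> new ArrayList<>()).addAll(items));
        staged.clear();
    }

    void rollback() {
        staged.clear();
    }

    List<String> committedTo(String relationship) {
        return committed.getOrDefault(relationship, Collections.emptyList());
    }
}

// Hypothetical processor: prefixes non-empty content and routes it to "success"; empty content goes to "failure".
final class PrefixProcessor {
    static final PropertySpec PREFIX = new PropertySpec("Prefix", true, v -> !v.isEmpty());
    static final List<PropertySpec> PROPERTIES = Collections.singletonList(PREFIX);
    static final String REL_SUCCESS = "success";
    static final String REL_FAILURE = "failure";

    void onTrigger(ValidatedContext context, UnitOfWorkSession session, List<String> incoming) {
        String prefix = context.getProperty(PREFIX.name);
        for (String content : incoming) {
            if (content.isEmpty()) {
                session.transfer(content, REL_FAILURE);
            } else {
                session.transfer(prefix + content, REL_SUCCESS);
            }
        }
        session.commit();  // stands in for the commit NiFi's AbstractProcessor performs after onTrigger returns
    }
}
```

Keeping validation in the context and commit/rollback in the session mirrors the split described on the slide: the processor body only decides where each piece of data goes.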
How long does it really take to create an extension?
• Increased development effort may be needed for handling specific protocols
– Typically driven by manual management of sessions when resources have their own lifecycles extending beyond a single onTrigger invocation
– Common for protocol “Listeners”
For the majority of cases, development time is still measured in hours
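One common way to structure a listener is to let a long-lived receiver (for example, a socket-reading thread started when the processor is scheduled) push events into a bounded queue, and have each onTrigger drain a batch. The sketch below is a hypothetical model using only `java.util.concurrent`; its method names only echo NiFi's real `@OnScheduled` and `@OnStopped` lifecycle annotations, and no actual socket handling is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the listener pattern: receiving is decoupled from onTrigger scheduling.
final class ListenerSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>(1000);  // bounded: offer() fails when full
    private volatile boolean running;

    // Analogous to an @OnScheduled method: start resources that outlive any single onTrigger call.
    void onScheduled() {
        running = true;
    }

    // Called from the receiver thread for each message read off the wire; thread-safe via the concurrent queue.
    boolean receive(String message) {
        return running && events.offer(message);
    }

    // Drains up to batchSize queued events; in a real processor each would become a FlowFile.
    List<String> onTrigger(int batchSize) {
        List<String> batch = new ArrayList<>();
        events.drainTo(batch, batchSize);
        return batch;
    }

    // Analogous to an @OnStopped method: stop accepting input and hand back whatever is still queued.
    List<String> onStopped() {
        running = false;
        List<String> remaining = new ArrayList<>();
        events.drainTo(remaining);
        return remaining;
    }
}
```

The extra effort the slide mentions lives mostly in the lifecycle methods: starting and stopping the receiver, and deciding what happens to events still queued at shutdown.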
Behind the Scenes
Architecture
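NiFi Architecture – Repositories – Pass by reference
BEFORE
• FlowFile repository: F1 → C1
• Content repository: C1
• Provenance repository: P1 F1 – Create
AFTER
• FlowFile repository: F1 → C1; F2 → C1
• Content repository: C1
• Provenance repository: P1 F1 – Create; P2 F1 – Route; P3 F2 – Clone (F1)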
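NiFi Architecture – Repositories – Copy on Write
BEFORE
• FlowFile repository: F1 → C1
• Content repository: C1
• Provenance repository: P1 F1 – CREATE
AFTER
• FlowFile repository: F1 → C1; F1.1 → C2
• Content repository: C1 (plaintext); C2 (encrypted)
• Provenance repository: P1 F1 – CREATE; P2 F1.1 – MODIFY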
Quick (and dirty?) Prototyping
Prototype Dataflows Using Existing Binaries/Applications
• ExecuteProcess – Acts as a source processor, creating FlowFiles containing data written to STDOUT by the target application
• ExecuteStreamCommand – Provides the content of FlowFiles to an external application via STDIN and creates FlowFiles containing the data the application writes to STDOUT
These processors allow calling applications and programs outside of the JVM
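Conceptually, the STDIN/STDOUT exchange ExecuteStreamCommand performs looks like the stdlib-only sketch below. It is a hypothetical illustration, not the processor's actual code; `StreamCommand.run` is an invented name, and throwing on a non-zero exit code is a choice made for this sketch only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Hypothetical sketch: FlowFile content goes to the command's STDIN; its STDOUT becomes the new content.
final class StreamCommand {
    static byte[] run(List<String> command, byte[] flowFileContent) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)  // let STDERR surface in the console
                .start();

        // Feed STDIN on a separate thread so a command producing lots of output cannot deadlock the pipes.
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(flowFileContent);
            } catch (IOException e) {
                // The command may exit without reading all of its input; a broken pipe is ignored here.
            }
        });
        writer.start();

        ByteArrayOutputStream stdout = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                stdout.write(buffer, 0, n);
            }
        }
        writer.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode);
        }
        return stdout.toByteArray();
    }
}
```

On a POSIX system with `tr` available, piping "hello nifi" through `tr a-z A-Z` this way should yield "HELLO NIFI". Writing STDIN on its own thread is the key design point: reading and writing on one thread can block forever once the OS pipe buffers fill.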
Increased Flexibility of Prototyping via Scripting Languages
• ExecuteScript – Runs a referenced script that is given the process session and context; it can act as a source processor, creating FlowFiles from within the script, or work on incoming FlowFiles
• InvokeScriptedProcessor – Provides access to the core framework API, letting a script interact with NiFi like a native Java processor
These processors allow using JVM-friendly interpreted languages
Resources
Developer Guide
– http://nifi.apache.org/developer-guide.html
Apache NiFi Maven Archetypes
– https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
Mission to NARs with Apache NiFi sample bundle
– https://github.com/apiri/nifi-mission-to-nars-workshop
Thanks for hanging out!
