Beyond Messaging
Enterprise Dataflow powered by Apache NiFi
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Aldrin Piri
3 November 2015
About me
Senior Member of Technical Staff
Project Management Committee and Committer
@aldrinpiri
Simplistic View of Enterprise Data Flow
The Data Flow Thing
Process and Analyze Data
Acquire Data
Store Data
Realistic View of Data Flow
Global interactions with customers, business partners, and things spanning different volume, velocity, bandwidth, and latency needs
Meeting Edge Requirements
GATHER, DELIVER, PRIORITIZE
Track from the edge through to the datacenter
• Small footprints operate with very little power
• Limited bandwidth can create high latency
• Data availability exceeds transmission bandwidth
• Data must be secured throughout its journey
Where do we find data flow?
• Remote sensor delivery (Internet of Things - IoT)
• Intra-site / Inter-site / global distribution (Enterprise)
• Ingest for driving analytics (Big Data)
• Data Processing (Simple Event Processing)
Basics of Connecting Systems
For every connection, these must agree:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
[Diagram: producer P1 connected to consumer C1]
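The eight points of agreement above can be sketched as a simple contract check. This is an illustration only: the `Endpoint` class, its field names, and the sample values are assumptions made for this example, not part of the talk or of any NiFi API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of what one side of a connection expects."""
    protocol: str
    format: str
    schema: str
    priority: str
    event_size: str
    event_frequency: str
    authorization: str
    relevance: str


def mismatches(producer, consumer):
    """Return the names of the properties on which the two sides disagree."""
    return [
        f.name
        for f in fields(Endpoint)
        if getattr(producer, f.name) != getattr(consumer, f.name)
    ]


# Illustrative values only.
p1 = Endpoint("http", "json", "orders-v1", "fifo", "small", "streaming", "team-a", "all-orders")
c1 = Endpoint("http", "avro", "orders-v2", "fifo", "small", "streaming", "team-a", "all-orders")

print(mismatches(p1, c1))  # ['format', 'schema']
```

Every name returned by `mismatches` is something that has to be reconciled somewhere between the producer and the consumer before the connection works.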
Challenges of dataflow in the enterprise
• Messaging addresses only a small subset of the problem space
• Needed to understand the big picture
• Needed the ability to make immediate changes
• Must maintain chain of custody for data
• Rigorous security and compliance requirements
Messaging Systems as Dataflow
Great options exist, including:
• Kafka
• ActiveMQ
• Tibco
Let us consider the perfect messaging system for this talk:
• It has zero latency
• It has perfect data durability
• It supports unlimited consumers and producers
Dataflow as a messaging problem
“But my system needs…”
• A different format and/or schema
• To use a different protocol
• The highest priority information first
• Large objects (event batches) / Small Objects (streams)
• Authorization to the data level
• Only interested in a subset of data on a topic
• Data needs to be enriched/sanitized before it arrives
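A few of the consumer needs listed above (a subset of the data, enrichment before arrival, highest priority first, a different format) can be sketched as one small function. The `deliver` function, the event fields, and the sample data are all hypothetical and only illustrate the kind of work that sits between a topic and a consumer.

```python
import json


def deliver(events, wanted, priority_key, enrich):
    """Filter, enrich, reorder, and convert events for one consumer."""
    subset = [e for e in events if wanted(e)]      # only the relevant subset
    enriched = [enrich(e) for e in subset]         # enrich/sanitize before arrival
    ordered = sorted(enriched, key=priority_key)   # lowest key (highest priority) first
    return [json.dumps(e, sort_keys=True) for e in ordered]  # convert to the consumer's format


# Illustrative sensor events; field names are assumptions.
events = [
    {"type": "temp", "site": "a", "value": 21, "severity": 3},
    {"type": "door", "site": "a", "value": 1, "severity": 1},
    {"type": "temp", "site": "b", "value": 48, "severity": 1},
]

out = deliver(
    events,
    wanted=lambda e: e["type"] == "temp",
    priority_key=lambda e: e["severity"],
    enrich=lambda e: {**e, "unit": "C"},
)
```

Each consumer with different needs would require its own variant of this logic, which is the gap the talk argues messaging alone does not cover.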
Using Messaging
Only a subset agree using messaging:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
[Diagram: producer P1 connected through Messaging to consumers C1 through CN]
More issues to consider:
• How do you know what the data flow looks like?
• How is it managed?
• How is it working – today, yesterday?
How these issues are typically solved
• Add new systems to handle the protocol differences
• Add new systems to convert the data
• Add new systems to reorder the data
• Add new systems to filter the unauthorized data
• Add new topics to represent ‘stages of the flow’
Which leads to latency, complexity, and limited retention
Ultimately, the operations teams who handle data at flow boundaries become responsible for managing all of this.
Real-time Data Flow
It’s not just how quickly you move data – it’s about how quickly you can change behavior and seize new opportunities
Introducing Apache NiFi
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording: a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
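Two of the features above, backpressure and prioritized queuing, can be sketched with a bounded priority queue. This is a simplified illustration, not NiFi's implementation: the `PrioritizedConnection` class and its method names are assumptions, and the sketch rejects new items when full, which stands in for however the real system slows the upstream side.

```python
import heapq
import itertools


class PrioritizedConnection:
    """Hypothetical bounded queue that hands out the most urgent item first."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority):
        """Queue an item; return False when backpressure is applied."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = PrioritizedConnection(backpressure_threshold=2)
accepted = [conn.offer("routine", 5), conn.offer("alert", 1), conn.offer("extra", 3)]
first = conn.poll()

print(accepted)  # [True, True, False]
print(first)     # alert
```

Once the consumer drains an item with `poll`, the queue falls below the threshold and the producer can offer again.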
A Brief History
2006: NiagaraFiles (NiFi) is first conceived by Joe Witt at the National Security Agency (NSA).
November 2014: NiFi is donated to the Apache Software Foundation (ASF) through NSA’s Technology Transfer Program and enters ASF’s incubator.
July 2015: NiFi reaches ASF top-level project status.
Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of other components.
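The FBP terms above can be sketched in a few lines. This is a toy model under stated assumptions, not NiFi code: the class names borrow the NiFi terms from the table, but `run_once`, `run_flow`, and the two work functions are hypothetical, and the scheduler is single-threaded for simplicity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: content plus descriptive attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Buffer linking two processors (unbounded here for brevity)."""

    def __init__(self):
        self.queue = deque()


class Processor:
    """Black box: takes one FlowFile from its source and may emit one to its sink."""

    def __init__(self, name, work, source, sink):
        self.name, self.work, self.source, self.sink = name, work, source, sink

    def run_once(self):
        if not self.source.queue:
            return False
        result = self.work(self.source.queue.popleft())
        if result is not None:  # None means the FlowFile was dropped
            self.sink.queue.append(result)
        return True


def run_flow(processors):
    """Hypothetical flow controller: schedule processors until no work is left."""
    progressed = True
    while progressed:
        progressed = any([p.run_once() for p in processors])


def drop_empty(ff):
    return ff if ff.content else None


def to_upper(ff):
    return FlowFile(ff.content.upper(), {**ff.attributes, "transformed": "true"})


inbound, middle, outbound = Connection(), Connection(), Connection()
flow = [
    Processor("DropEmpty", drop_empty, inbound, middle),
    Processor("ToUpper", to_upper, middle, outbound),
]
inbound.queue.extend([FlowFile(b"hello"), FlowFile(b""), FlowFile(b"nifi")])
run_flow(flow)
results = [ff.content for ff in outbound.queue]

print(results)  # [b'HELLO', b'NIFI']
```

Grouping the two processors and their shared connection behind the `inbound` and `outbound` queues is the same idea as a process group: a new component built purely by composition.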
Architecture
[Diagram: a NiFi node runs in a JVM on an OS/Host. Inside the JVM are the Web Server and the Flow Controller, which hosts Processor 1 through Extension N. The FlowFile Repository, Content Repository, and Provenance Repository sit on Local Storage.]
[Diagram: a NiFi cluster. The master is the NiFi Cluster Manager (NCM), which runs in its own JVM with a Web Server and acts as the Request Replicator. The slaves are NiFi Nodes, each with the single-node layout described above.]
Live Demonstration
Learn more and join us!
Apache NiFi site
http://nifi.apache.org
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter
@apachenifi
Thank you!


BigData Techcon - Beyond Messaging with Apache NiFi


Editor's Notes

  • #15: ----- Meeting Notes (18Sep15 13:08) ----- Take a pause part way through.
  • #17: Introduce Flow Based Programming fundamentals, why they matter, and how NiFi adopts them