Beyond Messaging
Enterprise Dataflow powered by Apache NiFi
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Aldrin Piri
DEVIEW 2015
2015.09.15
Page 2
About me
Member of Technical Staff
Project Management Committee and Committer
@aldrinpiri
Page 3
Simplistic View of Enterprise Data Flow
Acquire Data → The Data Flow Thing → Process and Analyze Data / Store Data
Page 4
Where do we find data flow?
• Remote sensor delivery (Internet of Things, IoT)
• Intra-site, inter-site, and global distribution (Enterprise)
• Ingest for driving analytics (Big Data)
• Data processing (simple event processing)
Page 5
Basics of Connecting Systems
For every connection between a producer (P1) and a consumer (C1), these must agree:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
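To make the agreement requirement concrete, here is a minimal Python sketch that models the eight items above as fields of a hypothetical `ConnectionContract` record and lists every dimension on which a producer and a consumer disagree. The class, its field names, and the sample values are illustrative assumptions, not part of NiFi or of any messaging product.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConnectionContract:
    """Hypothetical record of the eight dimensions both ends of a connection must agree on."""
    protocol: str
    format: str
    schema: str
    priority: str
    event_size: str
    event_frequency: str
    authorization: str
    relevance: str


def disagreements(producer: ConnectionContract, consumer: ConnectionContract) -> list:
    """Return the name of every dimension on which producer and consumer differ."""
    return [
        f.name
        for f in fields(ConnectionContract)
        if getattr(producer, f.name) != getattr(consumer, f.name)
    ]


if __name__ == "__main__":
    p1 = ConnectionContract("HTTP", "JSON", "sensor-v1", "normal", "small",
                            "streaming", "ops", "all readings")
    c1 = ConnectionContract("HTTP", "Avro", "sensor-v1", "high-first", "small",
                            "batch", "ops", "alerts only")
    print(disagreements(p1, c1))  # ['format', 'priority', 'event_frequency', 'relevance']
```

Every name the function returns is a point where something between the two systems has to translate, reorder, batch, or filter the data.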
Page 6
Challenges of dataflow in the enterprise
• Messaging addresses only a small subset of the problem space
• Need to understand the big picture
• Need the ability to make immediate changes
• Must maintain chain of custody for data
• Rigorous security and compliance requirements
Page 7
Messaging Systems as Dataflow
Great options include:
• Kafka
• ActiveMQ
• TIBCO
For this talk, consider a perfect messaging system:
• It has zero latency
• It has perfect data durability
• It supports unlimited producers and consumers
Page 8
Dataflow as a messaging problem
“But my system needs…”
• A different format and/or schema
• A different protocol
• The highest-priority information first
• Large objects (event batches) or small objects (streams)
• Authorization at the data level
• Only a subset of the data on a topic
• Data enriched or sanitized before it arrives
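The sketch below, a minimal illustration under stated assumptions, shows what three of those needs cost a single consumer when a topic delivers every message unchanged: the consumer must filter to its subset, reorder by priority, and convert the format itself. The JSON message layout, the `region` and `priority` keys, and the `adapt_for_consumer` helper are hypothetical.

```python
import csv
import io
import json

# Hypothetical topic contents: every consumer of the topic receives all of these.
TOPIC = [
    '{"id": 1, "region": "east", "priority": 2, "temp": 21.5}',
    '{"id": 2, "region": "west", "priority": 9, "temp": 35.0}',
    '{"id": 3, "region": "east", "priority": 7, "temp": 30.1}',
]


def adapt_for_consumer(messages, region, columns):
    """Filter to one region, order highest priority first, and re-encode JSON as CSV.

    Each step is logic the consumer (or an extra intermediate system) must own,
    because the topic itself delivers every message as-is.
    """
    records = [json.loads(m) for m in messages]
    relevant = [r for r in records if r["region"] == region]    # only a subset of the topic
    relevant.sort(key=lambda r: r["priority"], reverse=True)    # highest priority first
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(relevant)                                  # a different format
    return out.getvalue()


if __name__ == "__main__":
    print(adapt_for_consumer(TOPIC, "east", ["id", "temp"]), end="")
```

Multiply this by every consumer with different needs and the adaptation logic ends up scattered across many systems, which is the situation the following slides describe.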
Page 9
Using Messaging
With messaging, only a subset of these agree between the producer (P1) and the consumers (C1 through CN):
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
More issues to consider:
• How do you know what the data flow looks like?
• How is it managed?
• How is it working today? How did it work yesterday?
Page 10
How these issues are typically solved
• Add new systems to handle the protocol differences
• Add new systems to convert the data
• Add new systems to reorder the data
• Add new systems to filter out unauthorized data
• Add new topics to represent ‘stages of the flow’
This leads to added latency, more complexity, and limited retention.
Ultimately, the operations teams who handle data at flow boundaries become responsible for managing it all.
Page 11
Real-time Data Flow
It’s not just how quickly you move data; it’s about how quickly you can change behavior and seize new opportunities.
Page 12
Introducing Apache NiFi
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow-specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
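A rough picture of what data provenance recorded as “a rolling log of fine-grained history” can mean: the hypothetical `ProvenanceLog` below keeps only the newest events in a bounded deque and follows parent links to rebuild a FlowFile's chain of custody. The class names, event-type strings, and component labels are illustrative assumptions; this is a toy sketch, not NiFi's Provenance Repository or its API.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str          # illustrative labels such as "RECEIVE" or "SEND"
    flowfile_id: str
    component: str           # label of the processor that acted on the data
    parents: tuple = ()      # ids of the FlowFiles this one was derived from
    timestamp: float = field(default_factory=time.time)


class ProvenanceLog:
    """Hypothetical rolling log: only the newest `capacity` events are retained."""

    def __init__(self, capacity: int):
        self._events = deque(maxlen=capacity)   # oldest events fall off automatically

    def record(self, event: ProvenanceEvent) -> None:
        self._events.append(event)

    def history(self, flowfile_id: str) -> list:
        """All retained events for one FlowFile, oldest first."""
        return [e for e in self._events if e.flowfile_id == flowfile_id]

    def lineage(self, flowfile_id: str) -> list:
        """Follow parent links to list a FlowFile and its ancestors (chain of custody)."""
        chain, frontier, seen = [], [flowfile_id], set()
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            chain.append(current)
            for event in self.history(current):
                frontier.extend(event.parents)
        return chain


if __name__ == "__main__":
    log = ProvenanceLog(capacity=1000)
    log.record(ProvenanceEvent("RECEIVE", "ff-1", "ingest"))
    log.record(ProvenanceEvent("CONTENT_MODIFIED", "ff-2", "compress", parents=("ff-1",)))
    log.record(ProvenanceEvent("SEND", "ff-2", "deliver"))
    print(log.lineage("ff-2"))  # ['ff-2', 'ff-1']
```

The bounded deque trades completeness for a fixed storage footprint: once the log is full, the oldest history is no longer available for lineage queries.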
Page 13
A Brief History
2006: NiagaraFiles (NiFi) was first conceived by Joe Witt at the National Security Agency (NSA).
November 2014: NiFi is donated to the Apache Software Foundation (ASF) through the NSA’s Technology Transfer Program and enters the ASF incubator.
July 2015: NiFi becomes an ASF top-level project.
Page 14
Flow-Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as a queue and allowing processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads, and their allocation, that all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows the creation of an entirely new component simply by composing its components.
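The Connection row (bounded buffer) is where the backpressure, prioritized queuing, and pressure release listed under “Introducing Apache NiFi” take effect. Below is a minimal Python sketch, assuming a count-based backpressure threshold, a numeric `priority` attribute, and an age-off limit; `FlowFile`, `Connection`, and `uppercase_processor` are hypothetical stand-ins, not NiFi's Java API or its actual queueing implementation.

```python
import heapq
import itertools
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlowFile:
    """Information packet: key/value attributes plus an opaque content payload."""
    attributes: dict
    content: bytes
    created: float = field(default_factory=time.monotonic)


class Connection:
    """Bounded buffer between two processors (a toy sketch).

    - Backpressure: offer() refuses new FlowFiles once max_count are queued.
    - Prioritized queuing: poll() returns the highest numeric "priority"
      attribute first; ties are served first-in, first-out.
    - Pressure release: FlowFiles older than max_age_s are dropped by poll().
    """

    def __init__(self, max_count: int, max_age_s: float = float("inf")):
        self.max_count = max_count
        self.max_age_s = max_age_s
        self._heap = []                      # entries: (-priority, arrival order, FlowFile)
        self._arrivals = itertools.count()

    def is_full(self) -> bool:
        return len(self._heap) >= self.max_count

    def offer(self, ff: FlowFile) -> bool:
        if self.is_full():
            return False                     # backpressure: caller keeps the FlowFile and retries
        priority = int(ff.attributes.get("priority", 0))
        heapq.heappush(self._heap, (-priority, next(self._arrivals), ff))
        return True

    def poll(self, now: Optional[float] = None) -> Optional[FlowFile]:
        now = time.monotonic() if now is None else now
        while self._heap:
            _, _, ff = heapq.heappop(self._heap)
            if now - ff.created <= self.max_age_s:
                return ff
            # Expired FlowFiles are discarded here to relieve pressure.
        return None


def uppercase_processor(inbound: Connection, outbound: Connection) -> bool:
    """Toy processor: upper-cases one FlowFile's content. Returns False if no work was done."""
    if outbound.is_full():
        return False                         # downstream backpressure stalls this processor
    ff = inbound.poll()
    if ff is None:
        return False
    outbound.offer(FlowFile(dict(ff.attributes), ff.content.upper(), ff.created))
    return True


if __name__ == "__main__":
    src, dst = Connection(max_count=2), Connection(max_count=10)
    src.offer(FlowFile({"priority": "1"}, b"low"))
    src.offer(FlowFile({"priority": "5"}, b"high"))
    print(src.offer(FlowFile({}, b"third")))  # False: the queue is full
    uppercase_processor(src, dst)
    print(dst.poll().content)                 # b'HIGH'
```

Because the processor checks its outbound queue before pulling work, a full queue downstream stops it, and its own inbound queue then fills and pushes back further upstream.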
Page 15
Architecture
Standalone NiFi instance (a JVM on an OS/host):
• Web Server
• Flow Controller, hosting the processors and extensions (Processor 1 through Extension N)
• FlowFile Repository, Content Repository, and Provenance Repository, kept on local storage
NiFi cluster:
• Master: the NiFi Cluster Manager (NCM), a JVM on its own OS/host with a Web Server and a Request Replicator
• Slaves: NiFi Nodes, each running the same stack as a standalone instance
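In the clustered layout, the NCM's request replicator presumably forwards each incoming request to the nodes and gathers their answers. A minimal sketch of that fan-out, assuming nodes can be modeled as Python callables and that one failing node should be reported rather than abort the whole request; `replicate` and `merge_queue_counts` are hypothetical names, and the merge policy is an illustrative assumption, not how the NCM actually reconciles node responses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical: each node is modeled as a callable that handles a request dict.
Node = Callable[[dict], dict]


def replicate(request: dict, nodes: Dict[str, Node]) -> dict:
    """Send the same request to every node in parallel and collect per-node results.

    A node that raises is recorded as an error instead of failing the whole request.
    """
    def call(item):
        name, node = item
        try:
            return name, {"ok": True, "response": node(request)}
        except Exception as exc:
            return name, {"ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        return dict(pool.map(call, nodes.items()))


def merge_queue_counts(results: dict) -> int:
    """Example merge step: total queued FlowFiles across the nodes that answered."""
    return sum(r["response"]["queued"] for r in results.values() if r["ok"])
```

Calling every node in parallel keeps the cluster-wide response time close to that of the slowest node rather than the sum of all nodes.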
Page 16
Live Demonstration
Page 17
Feature Proposals – Status
FUTURE Better integration with Apache Kafka
FUTURE Clustering redesign
IN PROGRESS Configuration management of flows
STARTED Extension and template registry
RELEASE COMING SOON First-class Avro support
STARTED Interactive queue management
STARTED Multi-tenant data flow
FUTURE Pluggable authentication
FUTURE Reference-able process groups
FUTURE Variable registry
FUTURE ‘Wormhole’ connections
https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals
Page 18
Learn more and join us!
Apache NiFi site
http://nifi.apache.org
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter
@apachenifi
Page 19
Thank you!
