Guide to New Features of Hortonworks DataFlow 2.0
Haimo Liu
Product Manager
Bryan Bende
Sr. Software Engineer
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Connected Data Platforms
HDF 2.0 – Data in Motion Platform
[Diagram labels: Flow Management; Stream Processing; Enterprise Services; At the edge; On premises; In the cloud; Security; Visualization; Registries/Catalogs; Governance (Security/Compliance); Operations]
HDF 2.0 – Data in Motion Platform
[Diagram labels: Data in Motion (Flow Management; Flow Management + Stream Processing); Data at Rest; IoT data sources; MiNiFi; NiFi; Kafka; Storm; AWS; Azure; Google Cloud; Hadoop; others; Enterprise Services: Ambari, Ranger, other services]
Dataflow Management
Problems Today: Timely Access to Data and Decisions
http://diginomica.com/2016/04/22/royal-mail-starts-to-deliver-on-hortonworks-data-in-motion-promise
“HDF helps us to streamline the flow of data and build models and visualisations quickly, so that my team can work iteratively with business colleagues on building solutions that work for the business.”
Royal Mail
HDF Makes Big Data Ingest Easy
Complicated, messy, and takes weeks to months to move the right data into Hadoop
Streamlined, efficient, easy
HDP (Hortonworks Data Platform), powered by Apache Hadoop
Create a live dataflow in minutes
How would that change your business?
Add a processor for data intake. Time: 1 minute
1. Drag and drop a processor from the top menu
Choose the specific processor
2. Choose one of the processors (currently 170+ available)
Example: Pick Twitter Processor
Configure the processor. Time: 2 minutes
3. Select the processor and choose the option to Configure
4. Adjust parameters as required
Add another processor for data output. Time: 1 minute
5. Drag and drop a processor from the top menu
6. Filter for and select a “Put” processor
Configure the second processor. Time: 1 minute
7. Configure the second processor
Connect the processors and configure the connection. Time: 2 minutes
8. Configure the connection
Note: The sample flow shown here uses PutFile rather than the PutHDFS processor from the previous example. The same concepts apply.
Click Start to begin processing. Total time: 7 minutes
9. Click Start (“play”) to begin processing (the flow runs continuously until you select Stop)
HDF 2.0: what’s new?
Challenges
Different devices
Globally distributed organization
Intelligence on the edge
Time to delivery
Getting the right data to the right place at the right time is not trivial!
Challenges & NiFi
Different devices: different standards/protocols/formats
• Out of the box processors
• Intuitive GUI to combine processors and build ingestion pipeline
• Extensible framework, extremely easy to add a new source/protocol
Globally distributed organizations
Intelligence on the edge
Time to delivery
Support disparate, distributed systems with easy drag & drop
Challenges & NiFi & HDF 2.0
Different devices: different standards/protocols/formats
• Out of the box processors
• Intuitive GUI to combine processors and build ingestion pipeline
• Extensible framework, extremely easy to add a new source/protocol
• Deeper ecosystem integration, 170+ processors in total
Globally distributed organizations
Intelligence on the edge
Time to delivery
Expanded ecosystem
HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2
[Figure labels, processor functions: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch]
[Figure labels, protocols and formats: HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, Email, HTML, Image, AMQP, MQTT]
All Apache project logos are trademarks of the ASF and the respective projects.
Deeper Ecosystem Integration – New Processors
Processor | Description
Publish/ConsumeKafka | Two NARs, built against the Kafka 0.9 and 0.10 client libraries, respectively
JoltTransformJson | Manipulate JSON data on the fly, with preview functionality
GenerateTableFetch | Incremental fetch plus parallel fetch against source table partitions
PutHiveQL | Ingest into Hive tables
SelectHiveQL | Select from Hive tables
PutHiveStreaming | Ingest streaming data into Hive using the Hive Streaming API
ConvertAvroToORC | Format conversion, Avro to ORC
Publish/ConsumeMQTT | MQTT is a popular protocol in the IoT world
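As a rough illustration of the "incremental fetch plus parallel fetch" idea behind GenerateTableFetch, the sketch below builds one paged query per slice of newly added rows, so that each page could be fetched by a different worker. It is a minimal sketch under stated assumptions, not the processor's implementation: the function name, its parameters, the `orders` table, and the `LIMIT`/`OFFSET` SQL dialect are all hypothetical.

```python
def generate_fetch_queries(table, max_value_column, last_max, new_max,
                           row_count, page_size):
    """Return paged SELECT statements covering rows added since the last run.

    Hypothetical helper for illustration only. Values are interpolated
    directly into the SQL text, which is acceptable for a sketch but not
    for untrusted input.
    """
    if row_count <= 0:
        return []
    # Incremental part: only rows above the previously seen maximum.
    where = f"{max_value_column} <= {new_max}"
    if last_max is not None:
        where = f"{max_value_column} > {last_max} AND " + where
    # Parallel part: one query per page, each independently fetchable.
    queries = []
    for offset in range(0, row_count, page_size):
        queries.append(
            f"SELECT * FROM {table} WHERE {where} "
            f"ORDER BY {max_value_column} LIMIT {page_size} OFFSET {offset}"
        )
    return queries


# 250 new rows with ids in (100, 350], fetched in pages of 100 rows.
queries = generate_fetch_queries("orders", "id", last_max=100, new_max=350,
                                 row_count=250, page_size=100)
```

Remembering `new_max` as the next run's `last_max` is what would make the fetch incremental across runs.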
Challenges & NiFi & HDF 2.0
Different devices: different standards/protocols/formats
• Out of the box processors
• Intuitive GUI to combine processors and build ingestion pipeline
• Extensible framework, extremely easy to add a new source/protocol
• Deeper ecosystem integration, 170+ processors in total
• Redesigned UI, refreshed user experience
Globally distributed organizations
Intelligence on the edge
Time to delivery
More intuitive user interface
Modernized UI – Complete Interface Redesign
Challenges & NiFi
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site-to-Site communication, secured by 2-way SSL
• Environment-neutral
Intelligence on the edge
Time to delivery
Secure communications across disparate, distributed systems
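For readers unfamiliar with "2-way SSL", the sketch below shows the general idea using Python's standard `ssl` module: the receiving side presents its own certificate and also requires one from the connecting peer. This is only an illustration of mutual TLS, not how NiFi configures Site-to-Site; the function name and the file-path parameters are hypothetical.

```python
import ssl


def make_two_way_ssl_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that requires a client certificate.

    Hypothetical helper: the file arguments are optional so that the
    sketch runs without any certificate files on disk.
    """
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # The "second way": reject peers that do not present a valid certificate.
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file is not None:
        # The "first way": this side's own identity.
        context.load_cert_chain(cert_file, key_file)
    if ca_file is not None:
        # Certificate authorities trusted to sign peer certificates.
        context.load_verify_locations(ca_file)
    return context


context = make_two_way_ssl_server_context()
```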
Challenges & NiFi & HDF 2.0
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site-to-Site communication, secured by 2-way SSL
• Environment-neutral
• Variable registry
Intelligence on the edge
Time to delivery
Simplifies flow provisioning
Variable Registry
• Variable registry
  – Automatically resolves environment-specific values
    • Example: a connection string
    • The same key referenced in a template can be mapped to different values in DEV vs. PROD
  – In-memory variable registry
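The DEV vs. PROD mapping described above can be pictured with a small in-memory lookup. The following is a minimal sketch, assuming a `${key}` reference syntax; the registry contents, the `db.url` key, and the `resolve` function are hypothetical and do not reflect NiFi's actual API.

```python
import re

# Hypothetical in-memory registries, one per environment.
REGISTRIES = {
    "DEV": {"db.url": "jdbc:postgresql://dev-db:5432/flows"},
    "PROD": {"db.url": "jdbc:postgresql://prod-db:5432/flows"},
}

# Matches references of the form ${some.key}.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(template_value, environment):
    """Replace each ${key} in a template value with that environment's value."""
    registry = REGISTRIES[environment]

    def lookup(match):
        key = match.group(1)
        if key not in registry:
            raise KeyError(f"no value for '{key}' in {environment}")
        return registry[key]

    return _REFERENCE.sub(lookup, template_value)


# The same template property resolves differently per environment.
dev_url = resolve("${db.url}", "DEV")
prod_url = resolve("${db.url}", "PROD")
```

Because the template only stores the key, the same flow definition can be promoted from DEV to PROD without editing it.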
Challenges & NiFi & HDF 2.0
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site-to-Site communication, secured by 2-way SSL
• Environment-neutral
• Variable registry
• Better deployment management, Apache Ambari integration
Intelligence on the edge
Time to delivery
Simplified operations in distributed environments
Ambari Integration
Ambari-NiFi Integration
• NiFi cluster management
  – Start/stop NiFi service
  – Centralized place for managing config files
• Ambari displays NiFi metrics
• Ambari manages Kerberos authentication
HDF 2.0 Deployment Model
• Automated deployment by Ambari
• Manual RPM deployment
• tar.gz/zip deployment (NiFi and MiNiFi Java)
• tar.gz for most Linux/Mac; compile your own for other operating systems (MiNiFi C++)
Challenges & NiFi & HDF 2.0
Different devices
Globally distributed organizations: dataflow across multiple data centers
• Internal Site to Site communication, secured by 2-way SSL
• Environment-neutral
• Variable registry
• Better deployment management, Apache Ambari integration
• Enhanced Site-to-Site communication
Intelligence on the edge
Time to delivery
Modularized Site-to-Site (S2S) to support pluggable protocols
Challenges & NiFi
Different devices, Globally distributed organizations
Intelligence on the edge: analytics on resource constrained devices
• Run single node on the edge, communicating back via S2S
• Bi-directional communication
Time to delivery
Analytics at the Edge
Challenges & NiFi & HDF 2.0
Different devices, Globally distributed organizations
Intelligence on the edge: analytics on resource constrained devices
• Run single node on the edge, communicating back via Site to Site protocol
• Bi-directional communication
• Apache MiNiFi, bi-directional command and control on the edge
Time to delivery
Edge Intelligence for the first mile
Edge Intelligence with Apache MiNiFi
Key Features
• Guaranteed delivery
• Data buffering
  – Backpressure
  – Pressure release
• Prioritized queuing
• Flow-specific QoS
  – Latency vs. throughput
  – Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension
Different from Apache NiFi
• Design and deploy
• Warm re-deploys
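The buffering features listed on this slide (backpressure, pressure release, prioritized queuing) can be illustrated with a small bounded queue. This is a minimal sketch under stated assumptions, not MiNiFi's implementation: the class, its limits, and the interpretation of pressure release as "drop items older than a maximum age" are hypothetical choices made for the example.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Hypothetical connection queue with a size limit and an age limit."""

    def __init__(self, max_items, max_age_seconds):
        self.max_items = max_items
        self.max_age_seconds = max_age_seconds
        self._heap = []
        self._sequence = itertools.count()  # tie-breaker for equal priorities

    def expire(self, now):
        """Pressure release: drop items that have waited too long."""
        kept = [e for e in self._heap if now - e[2] <= self.max_age_seconds]
        dropped = len(self._heap) - len(kept)
        heapq.heapify(kept)
        self._heap = kept
        return dropped

    def offer(self, item, priority, now):
        """Backpressure: return False when the queue is full."""
        self.expire(now)
        if len(self._heap) >= self.max_items:
            return False
        heapq.heappush(self._heap, (priority, next(self._sequence), now, item))
        return True

    def poll(self):
        """Prioritized queuing: lowest priority number leaves first."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[3]


queue = BoundedPriorityQueue(max_items=2, max_age_seconds=60)
accepted = [
    queue.offer("routine reading", priority=5, now=0),
    queue.offer("alarm", priority=1, now=1),
    queue.offer("extra reading", priority=3, now=2),  # queue full: refused
]
first_out = queue.poll()
```

A producer that receives `False` from `offer` would be expected to slow down or retry later, which is the essence of backpressure.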
NiFi vs. MiNiFi Java Agent
[Diagram: MiNiFi = NiFi Framework + Components; NiFi = NiFi Framework + User Interface + Components]
Challenges & NiFi
Different devices, Globally distributed organizations, Intelligence on the edge
Time to delivery: need an application, out of the box solution
• Data provenance, traceability and compliance issues
• Flow visibility, big picture of the enterprise dataflow
• Automatic failure handling
FAST AND EASY to get results, tune and change dataflows
Challenges & NiFi & HDF 2.0
Different devices, Globally distributed organizations, Intelligence on the edge
Time to delivery: need an application, out of the box solution
• Data provenance, traceability and compliance issues
• Flow visibility, big picture of the enterprise dataflow
• Automatic failure handling
• Control plane high availability, zero-master clustering
High availability
Zero-master Clustering
• New clustering paradigm
• Zero-master clustering
  – Multiple entry points, no master node, no single point of failure
  – Auto-elected cluster coordinator for cluster maintenance
  – Automatic failover handling
HDF 2.0 (NiFi 1.0.0)
Zero-master Clustering
Heartbeat messages (every 5s by default)
Node status: connecting/connected/disconnecting/disconnected
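How a cluster coordinator might use heartbeats to track node status can be sketched as follows. This is a simplified model, not NiFi's code: the class and method names are hypothetical, the number of missed heartbeats tolerated before a node is marked disconnected is an assumption, and the intermediate "disconnecting" state is omitted.

```python
HEARTBEAT_INTERVAL_SECONDS = 5.0   # default interval stated on the slide
MISSED_BEFORE_DISCONNECT = 8       # assumption made for this sketch


class ClusterCoordinator:
    """Hypothetical coordinator that tracks node status from heartbeats."""

    def __init__(self):
        self.last_seen = {}
        self.status = {}

    def request_join(self, node, now):
        self.status[node] = "connecting"
        self.last_seen[node] = now

    def heartbeat(self, node, now):
        if node not in self.status:
            return  # unknown nodes must ask to join first
        self.last_seen[node] = now
        if self.status[node] == "connecting":
            self.status[node] = "connected"

    def check(self, now):
        """Mark connected nodes that have been silent too long as disconnected."""
        timeout = HEARTBEAT_INTERVAL_SECONDS * MISSED_BEFORE_DISCONNECT
        for node, seen in self.last_seen.items():
            if self.status[node] == "connected" and now - seen > timeout:
                self.status[node] = "disconnected"


coordinator = ClusterCoordinator()
coordinator.request_join("node-1", now=0)
status_after_join = coordinator.status["node-1"]
coordinator.heartbeat("node-1", now=1)
coordinator.check(now=30)
status_while_healthy = coordinator.status["node-1"]
coordinator.check(now=42)   # 41 seconds of silence exceeds the 40 second timeout
status_after_silence = coordinator.status["node-1"]
```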
Challenges & NiFi & HDF 2.0
Different devices, Globally distributed organizations, Intelligence on the edge
Time to delivery: need an application, out of the box solution
• Data provenance, traceability and compliance issues
• Flow visibility, big picture of the enterprise dataflow
• Automatic failure handling
• Control plane high availability, zero-master clustering
• Multi-tenancy flow editing, and authorization
Secured enterprise-wide collaboration
Multi-tenant Flow Editing
• Multi-tenant flow editing
  – Self-service collaborative model with a Google Docs-style user experience
  – Multiple teams can edit different processors at the same time
  – Only the component being modified is locked, not the entire flow
  – Scalable model to speed up flow editing
HDF 2.0 (NiFi 1.0.0)
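The point that only the component being modified is locked, rather than the entire flow, can be illustrated with one lock per component. This is a minimal sketch, assuming a simple non-blocking lock per component; the `FlowCanvas` class and its methods are hypothetical and are not taken from NiFi.

```python
import threading


class FlowCanvas:
    """Hypothetical canvas that locks components individually."""

    def __init__(self, component_ids):
        self._locks = {cid: threading.Lock() for cid in component_ids}
        self.editors = {}

    def begin_edit(self, component_id, user):
        """Try to lock one component; other components stay editable."""
        if self._locks[component_id].acquire(blocking=False):
            self.editors[component_id] = user
            return True
        return False

    def end_edit(self, component_id):
        self.editors.pop(component_id, None)
        self._locks[component_id].release()


canvas = FlowCanvas(["GetTwitter", "PutHDFS"])
team_a_gets_source = canvas.begin_edit("GetTwitter", "team-a")
team_b_gets_sink = canvas.begin_edit("PutHDFS", "team-b")      # different component
team_b_gets_source = canvas.begin_edit("GetTwitter", "team-b")  # already locked
canvas.end_edit("GetTwitter")
team_b_retry = canvas.begin_edit("GetTwitter", "team-b")
```

With a single flow-wide lock, the second team's edit to a different processor would also have been refused.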
Multi-tenant Authorization
• Component-level authorization
  – New authorizer API
  – “Read” and “Write” permissions
  – Protection against unauthorized usage without losing context
• Authorization management
  – Internal management (NiFi)
  – External management (Ranger, etc.)
HDF 2.0 (NiFi 1.0.0)
Multi-tenant Authorization
Read Permission
• Processor name visible
• Processor configuration visible
Multi-tenant Authorization
No Read Permission
• Processor name and configuration invisible (content)
• Statistics visible (context)
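The two cases above (with and without Read permission) amount to filtering what a user sees of a component. The sketch below shows that filtering in its simplest form; the `visible_view` function and the dictionary layout of a processor are hypothetical and do not mirror NiFi's authorizer API.

```python
def visible_view(processor, can_read):
    """Return the part of a processor that a user is allowed to see.

    Hypothetical helper: statistics (context) are always shown, while the
    name and configuration (content) require Read permission.
    """
    view = {"id": processor["id"], "statistics": processor["statistics"]}
    if can_read:
        view["name"] = processor["name"]
        view["configuration"] = processor["configuration"]
    return view


processor = {
    "id": "proc-1",
    "name": "Ingest tweets",
    "configuration": {"endpoint": "sample"},
    "statistics": {"flowfiles_in": 120, "flowfiles_out": 118},
}

with_read = visible_view(processor, can_read=True)
without_read = visible_view(processor, can_read=False)
```

Keeping statistics visible is what lets an unauthorized user still understand where the component sits in the flow, which matches the "without losing context" point on the earlier slide.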
Questions?
Hortonworks Community Connection:
Data Ingestion and Streaming
https://community.hortonworks.com/

Webinar Series Part 5 New Features of HDF 5

  • 1. GuideTo New Features of Hortonworks DataFlow 2.0 Haimo Liu Product Manager Bryan Bende Sr. Software Engineer
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Connected Data Platforms
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Stream Processing Flow Management Enterprise Services At the edge Security Visualization On premises In the cloud Registries/Catalogs Governance (Security/Compliance) Operations HDF 2.0 – Data in Motion Platform
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Flow Management Flow management + Stream Processing D A T A I N M O T I O N D A T A A T R E S T IoT Data Sources AWS Azure Google Cloud Hadoop NiFi Kafka Storm Others… NiFi NiFi NiFi MiNiFi MiNiFi MiNiFi MiNiFi MiNiFi MiNiFi MiNiFi NiFi HDF 2.0 – Data in Motion Platform Enterprise Services Ambari Ranger Other services
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dataflow Management
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Problems Today: Timely Access to Data and Decisions http://diginomica.com/2016/04/22/royal-mail-starts-to-deliver-on-hortonworks-data-in-motion-promise “HDF helps us to streamline the flow of data and build models and visualisations quickly, so that my team can work iteratively with business colleagues on building solutions that work for the business.“ Royal Mail
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDP HORTONWORKS DATA PLATFORM Powered by Apache Hadoop HDF Makes Big Data Ingest Easy Complicated, messy, and takes weeks to months to move the right data into Hadoop Streamlined, Efficient, Easy HDP HORTONWORKS DATA PLATFORM Powered by Apache Hadoop
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Create a live dataflow in minutes How would that change your business?
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Add processor for data intake. Time: 1 minute 1 Drag and drop processor from top menu
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Choose the specific processor 2 Choose one of the processors – currently 170+ available
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Example: Pick Twitter Processor
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Configure the processor. Time: 2 minutes 3 4 Select processor and choose option to Configure Adjust parameters as required
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Another processor for data output. Time: 1 minute 5 6 Filter for and select a “Put” processor Drag and drop processor from top menu
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Configure second processor. Time: 1 minute 7 Configure 2nd processor
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Connect processors, configure connection. 2 minutes Configure Connection8 Note: Sample Flow is different from previous example of PutHDFS. This dataflow is PutFile. Same concepts apply.
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Click Start to Begin Processing. Time total: 7 minutes 9 Click start “play” to being processing (will run continuously until you select stop)
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDF 2.0: what’s new?
  • 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Challenges Different devices Globally distributed organization Intelligence on the edge Time to delivery Getting the right data to the right place at the right time is not trivial!
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Challenges & NiFi Different devices: different standards/protocols/formats • Out of the box processors • Intuitive GUI to combine processors and build ingestion pipeline • Extensible framework, extremely easy to add a new source/protocol Globally distributed organizations Intelligence on the edge Time to delivery Support disparate, distributed systems with easy drag & drop
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Challenges & NiFi & HDF 2.0 Different devices: different standards/protocols/formats • Out of the box processors • Intuitive GUI to combine processors and build ingestion pipeline • Extensible framework, extremely easy to add a new source/protocol • Deeper ecosystem integration, 170+ processors in total Globally distributed organizations Intelligence on the edge Time to delivery Expanded ecosystem
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2 Hash Extract Merge Duplicate Scan GeoEnrich Replace ConvertSplit Translate Route Content Route Context Route Text Control Rate Distribute Load Generate Table Fetch Jolt Transform JSON Prioritized Delivery Encrypt Tail Evaluate Execute HL7 FTP UDP XML SFTP HTTP Syslog Email HTML Image AMQP MQTT All Apache project logos are trademarks of the ASF and the respective projects. Fetch
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Deeper Ecosystem Integration – New Processors Processor Description Publish/ConsumeKafka Two NARs, with kafka 0.9/0.10 client libraries, respectively JoltTransformJson Manipulate JSON data on the fly, with a preview functionality GenerateTableFetch Incremental fetch + parallel fetch against source table partitions PutHiveQL Ingest to Hive tables SelectHiveQL Select from Hive tables PutHiveStreaming ingest streaming data to Hive, leverage Hive streaming API CovertAvroToORC Format conversation, Avro to ORC Publish/ConsumeMQTT MQTT is a popular protocol in IoT world
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Challenges & NiFi & HDF 2.0 Different devices: different standards/protocols/formats • Out of the box processors • Intuitive GUI to combine processors and build ingestion pipeline • Extensible framework, extremely easy to add a new source/protocol • Deeper ecosystem integration, 170+ processors in total • Redesigned UI, refreshed user experience Globally distributed organizations Intelligence on the edge Time to delivery More intuitive user interface
  • 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Modernized UI – Complete Interface Redesign
  • 25. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Challenges & NiFi Different devices Globally distributed organizations: dataflow across multiple data centers • Internal Site to Site communication, secured by 2-way SSL • Environmental neutral Intelligence on the edge Time to delivery Secure communications across disparate, distributed systems
  • 26. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Challenges & NiFi & HDF 2.0 Different devices Globally distributed organizations: dataflow across multiple data centers • Internal Site to Site communication, secured by 2-way SSL • Environmental neutral • Variable registry Intelligence on the edge Time to delivery Simplifies flow provisioning
  • 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Variable Registry  Variable registry – To automatically resolve environmental specific values • Example: connection string • The same key referenced in a template, can be mapped to different values in DEV vs PROD – In-memory variable registry
  • 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Challenges & NiFi & HDF 2.0 Different devices Globally distributed organizations: dataflow across multiple data centers • Internal Site-to-Site communication, secured by 2-way SSL • Environmental neutral • Variable registry • Better deployment management, Apache Ambari integration Intelligence on the edge Time to delivery Simplified operations in distributed environments
  • 29. Ambari Integration
      Ambari-NiFi Integration
        • NiFi cluster management
          – Start/stop NiFi service
          – Centralized place for managing config files
        • Ambari displays NiFi metrics
        • Ambari manages Kerberos authentication
      HDF 2.0 Deployment Model
        • Automated deployment by Ambari
        • Manual RPM deployment
        • tar.gz/zip deployment (NiFi / MiNiFi Java)
        • tar.gz for most Linux/Mac; compile your own for other OSes (MiNiFi C++)
  • 30. Challenges & NiFi & HDF 2.0
      Different devices
      Globally distributed organizations: dataflow across multiple data centers
        • Internal Site-to-Site communication, secured by 2-way SSL
        • Environment-neutral
        • Variable registry
        • Better deployment management, Apache Ambari integration
        • Enhanced Site-to-Site communication
      Intelligence on the edge
      Time to delivery
      Modularized Site-to-Site (S2S) to support pluggable protocols
  • 31. Challenges & NiFi
      Different devices, Globally distributed organizations
      Intelligence on the edge: analytics on resource-constrained devices
        • Run a single node on the edge, communicating back via Site-to-Site (S2S)
        • Bi-directional communication
      Time to delivery
      Analytics at the Edge
  • 32. Challenges & NiFi & HDF 2.0
      Different devices, Globally distributed organizations
      Intelligence on the edge: analytics on resource-constrained devices
        • Run a single node on the edge, communicating back via the Site-to-Site protocol
        • Bi-directional communication
        • Apache MiNiFi, bi-directional command and control on the edge
      Time to delivery
      Edge Intelligence for the first mile
  • 33. Edge Intelligence with Apache MiNiFi
      Key Features
        • Guaranteed delivery
        • Data buffering
          – Backpressure
          – Pressure release
        • Prioritized queuing
        • Flow specific QoS
          – Latency vs. throughput
          – Loss tolerance
        • Data provenance
        • Recovery / recording a rolling log of fine-grained history
        • Designed for extension
      Different from Apache NiFi
        • Design and Deploy
        • Warm re-deploys
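Two of the features listed above, backpressure and prioritized queuing, can be sketched together as a bounded priority queue. This is an illustration under stated assumptions, not MiNiFi code: the class `BoundedPriorityQueue`, its `threshold` parameter, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Queue that signals backpressure once it holds `threshold` items."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._heap = []
        # Monotonic counter: keeps FIFO order among items of equal priority.
        self._counter = itertools.count()

    def is_full(self):
        return len(self._heap) >= self.threshold

    def offer(self, item, priority):
        """Enqueue `item`; return False (backpressure) if the threshold is reached."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BoundedPriorityQueue(threshold=2)
accepted = [
    queue.offer("routine reading", priority=5),
    queue.offer("alarm", priority=1),
    queue.offer("late reading", priority=5),  # refused: queue is at its threshold
]
first_out = queue.poll()  # the alarm leaves first despite arriving second
```

In this sketch the producer sees the refusal and decides what to do next (wait, retry, or drop), which is one simple way to model the source slowing down when the downstream side cannot keep up.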
  • 34. NiFi vs. MiNiFi Java Agent
      [Diagram] MiNiFi: NiFi Framework, Components
      [Diagram] NiFi: NiFi Framework, User Interface, Components
  • 35. Challenges & NiFi
      Different devices, Globally distributed organizations, Intelligence on the edge
      Time to delivery: need an application, an out-of-the-box solution
        • Data provenance, traceability and compliance issues
        • Flow visibility, big picture of the enterprise dataflow
        • Automatic failure handling
      Fast and easy: get results, tune and change dataflows
  • 36. Challenges & NiFi & HDF 2.0
      Different devices, Globally distributed organizations, Intelligence on the edge
      Time to delivery: need an application, an out-of-the-box solution
        • Data provenance, traceability and compliance issues
        • Flow visibility, big picture of the enterprise dataflow
        • Automatic failure handling
        • Control plane high availability, zero-master clustering
      High availability
  • 37. Zero-master Clustering
      HDF 2.0 (NiFi 1.0.0)
        • New clustering paradigm
        • Zero-master clustering
          – Multiple entry points, no master node, no single point of failure
          – Auto-elected cluster coordinator for cluster maintenance
          – Automatic failover handling
  • 38. Zero-master Clustering
  • 39. Zero-master Clustering
      Heartbeat messages (every 5s by default)
      Node status: connecting/connected/disconnecting/disconnected
  • 40. Zero-master Clustering
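The heartbeat mechanism on the clustering slides can be sketched as a coordinator that records when each node was last heard from. The 5-second interval comes from the slide; everything else is an assumption for illustration, including the `Coordinator` class, the `MISSED_BEATS_ALLOWED` multiplier, and the simplification to only two of the four node statuses (connected and disconnected).

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 5.0  # from the slide: heartbeats every 5s by default
MISSED_BEATS_ALLOWED = 8    # assumption for this sketch, not taken from the slides


@dataclass
class Coordinator:
    """Tracks the last heartbeat time and the status of each node."""

    last_seen: dict = field(default_factory=dict)
    status: dict = field(default_factory=dict)

    def heartbeat(self, node, now):
        """Record a heartbeat from `node` at time `now` (seconds)."""
        self.last_seen[node] = now
        self.status[node] = "connected"

    def sweep(self, now):
        """Mark nodes as disconnected when their heartbeats stop arriving."""
        timeout = HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED
        for node, seen in self.last_seen.items():
            if now - seen > timeout:
                self.status[node] = "disconnected"


coordinator = Coordinator()
coordinator.heartbeat("node-1", now=0.0)
coordinator.heartbeat("node-2", now=0.0)
coordinator.heartbeat("node-1", now=40.0)  # node-2 has gone silent
coordinator.sweep(now=45.0)                # node-2 exceeds the 40s timeout
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic; a real coordinator would also need the election and failover behavior the slides mention, which is out of scope here.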
  • 41. Challenges & NiFi & HDF 2.0
      Different devices, Globally distributed organizations, Intelligence on the edge
      Time to delivery: need an application, an out-of-the-box solution
        • Data provenance, traceability and compliance issues
        • Flow visibility, big picture of the enterprise dataflow
        • Automatic failure handling
        • Control plane high availability, zero-master clustering
        • Multi-tenant flow editing and authorization
      Secured enterprise-wide collaboration
  • 42. Multi-tenant Flow Editing
      HDF 2.0 (NiFi 1.0.0)
        • Multi-tenant flow editing
          – Self-service collaborative model, Google Docs-style user experience
          – Multiple teams making edits to different processors at the same time
          – Only the component being modified is locked, not the entire flow
          – Scalable model to speed up flow editing
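The point that only the component being modified is locked can be sketched with one lock per component. This is a lock-based illustration of the idea only; it makes no claim about how NiFi implements concurrent edits internally. The `Flow` class, its methods, and the processor names and properties used below are hypothetical.

```python
import threading


class Flow:
    """Holds processor configs with one lock per component, not one for the whole flow."""

    def __init__(self):
        self._configs = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the two dictionaries themselves

    def _entry(self, component_id):
        # The guard is held only briefly to find or create the per-component state.
        with self._guard:
            lock = self._locks.setdefault(component_id, threading.Lock())
            config = self._configs.setdefault(component_id, {})
        return lock, config

    def update(self, component_id, properties):
        """Apply `properties` while holding only this component's lock."""
        lock, config = self._entry(component_id)
        with lock:
            config.update(properties)

    def config(self, component_id):
        """Return a copy of the component's current configuration."""
        lock, config = self._entry(component_id)
        with lock:
            return dict(config)


flow = Flow()
editors = [
    threading.Thread(target=flow.update, args=("GetFile", {"Input Directory": "/data/in"})),
    threading.Thread(target=flow.update, args=("PutHDFS", {"Directory": "/landing"})),
]
for editor in editors:
    editor.start()
for editor in editors:
    editor.join()
```

Because the two editors touch different components, neither waits on the other's lock; they would contend only if both edited the same processor.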
  • 43. Multi-tenant Authorization
      HDF 2.0 (NiFi 1.0.0)
        • Component-level authorization
          – New authorizer API
          – “Read” and “Write” permissions
          – Protection against unauthorized usage without losing context
        • Authorization management
          – Internal management (NiFi)
          – External management (Ranger, etc.)
  • 44. Multi-tenant Authorization
      Read permission
      Processor name visible
      Processor configuration visible
  • 45. Multi-tenant Authorization
      No read permission
      Processor name and configuration hidden (content)
      Statistics visible (context)
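The read/write behavior described on the authorization slides can be sketched as follows. This does not use NiFi's authorizer API: the `POLICIES` table, the `Processor` class, the `view` and `update_config` functions, and the user names are all hypothetical. The sketch shows a user without read permission still seeing statistics (context) but not the name or configuration (content), and a write check guarding modification.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    id: str
    name: str
    config: dict
    flowfiles_in: int  # a statistic that stays visible to everyone


# Hypothetical policy store: (user, component id) -> set of granted actions.
POLICIES = {
    ("alice", "proc-1"): {"read", "write"},
    ("bob", "proc-1"): set(),
}


def view(user, processor):
    """Return what `user` may see of `processor`."""
    granted = POLICIES.get((user, processor.id), set())
    summary = {"id": processor.id, "stats": {"flowfiles_in": processor.flowfiles_in}}
    if "read" in granted:
        summary["name"] = processor.name
        summary["config"] = dict(processor.config)
    return summary


def update_config(user, processor, changes):
    """Apply `changes` only if `user` holds write permission on the component."""
    if "write" not in POLICIES.get((user, processor.id), set()):
        raise PermissionError(f"{user} may not modify {processor.id}")
    processor.config.update(changes)


processor = Processor("proc-1", "GetFile", {"Input Directory": "/data/in"}, flowfiles_in=120)
alice_view = view("alice", processor)  # includes name and configuration
bob_view = view("bob", processor)      # statistics only
```

Keeping the statistics in every view is what preserves context: an unauthorized user can still see that data is flowing through the component without learning what it is or how it is configured.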
  • 46. Questions?
      Hortonworks Community Connection: Data Ingestion and Streaming
      https://community.hortonworks.com/
