Gene Peters
Cask HQ
5.10.2017
Turning a Data Pond into a Data Lake with Apache NiFi
Introduction
Gene F. Peters
Founder & CTO
Senior Data Engineer, Kixeye
Consulting Engineer, Pebble, Tilt
College of William & Mary
The Evolution of a Data Pipeline: Starting Out
The Evolution of a Data Pipeline: Growing More Complex
The Evolution of a Data Pipeline: Enterprise Scale
Enter Apache NiFi
• Function: real-time data ingest and routing
• Developed and open sourced by the NSA
• v1.0 released August 30, 2016; v1.2.0 released May 8, 2017
• Hortonworks is the major sponsor behind the project
Technology
• Write-ahead transaction log for changes to content
• At-least-once message delivery guarantee
• Horizontally scalable
• Single-node or cluster deployment
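To make the first two bullets concrete, here is a toy sketch of how a write-ahead log can give at-least-once delivery. It is not NiFi's implementation: the `WriteAheadLog` class, the `recv`/`ack` entry format, and the `flaky_send` function are all hypothetical names chosen for illustration. The idea is that a message is journaled before anything else happens, and an acknowledgement is journaled only after a send succeeds, so a failed or interrupted send is retried on the next pass.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Hypothetical append-only journal: record a change before acting on it."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the entry to disk before continuing

    def pending(self):
        """Replay the journal; return received messages that were never acked."""
        received, acked = {}, set()
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    if entry["op"] == "recv":
                        received[entry["id"]] = entry["body"]
                    else:  # "ack"
                        acked.add(entry["id"])
        return {k: v for k, v in received.items() if k not in acked}


def deliver_pending(wal, send):
    """Send every unacknowledged message; log the ack only after success."""
    for msg_id, body in wal.pending().items():
        try:
            send(msg_id, body)
        except ConnectionError:
            continue  # no ack was logged, so the next pass retries it
        wal.append({"op": "ack", "id": msg_id})


def demo():
    delivered = []
    attempts = {"count": 0}

    def flaky_send(msg_id, body):
        attempts["count"] += 1
        delivered.append(msg_id)  # the receiver got the message...
        if attempts["count"] == 1:
            raise ConnectionError("ack lost")  # ...but the sender never heard back

    with tempfile.TemporaryDirectory() as tmp:
        wal = WriteAheadLog(os.path.join(tmp, "flow.wal"))
        wal.append({"op": "recv", "id": "m1", "body": "hello"})
        deliver_pending(wal, flaky_send)  # first pass: delivered, ack lost
        deliver_pending(wal, flaky_send)  # simulated restart: replay and retry
        return delivered, wal.pending()


delivered, pending = demo()
print(delivered, pending)
```

In this sketch the receiver sees `m1` twice, which is the trade-off behind "at least once": nothing is lost, but downstream consumers should tolerate duplicates.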
Terminology and API
• Processors
• Relationships
• Controller Services
• Process Groups
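As a rough mental model of how these terms fit together, the sketch below uses plain Python rather than NiFi's actual API. All class and function names here (`FlowFile`, `Processor`, `ProcessGroup`, `validate`, `tag`, `log_error`) are illustrative assumptions. A processor does one job and reports a relationship such as `success` or `failure`; connections route on that relationship; a process group bundles processors and their connections. Controller Services (shared resources such as connection pools) are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """A unit of data moving through the flow: content plus attributes."""
    content: str
    attributes: dict = field(default_factory=dict)


class Processor:
    """Does one job on a FlowFile and names the relationship it goes to."""

    def __init__(self, name, func):
        self.name = name
        self.func = func  # func(flowfile) -> relationship name

    def process(self, flowfile):
        return self.func(flowfile)


class ProcessGroup:
    """A named set of processors plus the connections between them."""

    def __init__(self, name):
        self.name = name
        self.processors = {}
        self.connections = {}  # (processor name, relationship) -> next processor

    def add(self, processor):
        self.processors[processor.name] = processor

    def connect(self, source, relationship, destination):
        self.connections[(source, relationship)] = destination

    def run(self, start, flowfile):
        """Follow connections until a relationship has no destination."""
        path, current = [], start
        while current is not None:
            relationship = self.processors[current].process(flowfile)
            path.append((current, relationship))
            current = self.connections.get((current, relationship))
        return path


def validate(ff):
    return "success" if ff.content.strip() else "failure"


def tag(ff):
    ff.attributes["validated"] = "true"
    return "success"


def log_error(ff):
    ff.attributes["error"] = "empty content"
    return "success"


group = ProcessGroup("ingest")
for name, func in [("Validate", validate), ("Tag", tag), ("LogError", log_error)]:
    group.add(Processor(name, func))
group.connect("Validate", "success", "Tag")
group.connect("Validate", "failure", "LogError")

good = FlowFile("id,name\n1,Ada")
bad = FlowFile("   ")
good_path = group.run("Validate", good)
bad_path = group.run("Validate", bad)
print(good_path, bad_path)
```

The non-empty FlowFile travels Validate then Tag, while the blank one is routed to LogError through the `failure` relationship.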
Security
• Control Plane
  • Pluggable Kerberos / LDAP integration
  • Per-component access control
  • Full usage auditing
• Data Plane
  • SSL encryption of IPC
  • Encryption of data at rest
Use Case: On-Premises to Cloud Replication
Use Case: Cloud to Cloud Replication
Use Case: Data Ingest from REST APIs
Use Case: Data Conversion and Transformation
Use Case: RDB to NoSQL
Putting it all together: A Data Lake in the Cloud
Demo
• Demo: Starting up a Google Cloud data lake with Salesforce and Postgres sources
Alternatives
• StreamSets
• Airflow
• Pentaho
• Storm / Flink / Spark Streaming
• Cloud-based migration tools
• Homebuilt Code
Questions and Discussion
Contact Info
facebook.com/telligentdata
twitter.com/telligentdata
Thank You
info@telligent-data.com
angel.co/telligent-data
(415) 758-0155
