Bringing Real-Time to the
Enterprise with
Hortonworks DataFlow
Cavan Loughran – TELUS
Oliver Meyn – T4G
June 2017
TELUS Public
Agenda / Today’s Talk
1. Introduction to TELUS and Optik TV®
2. Starting state
3. Target state
4. Journey
5. Lessons learned
TELUS and Optik TV®
• TELUS is Canada’s fastest-growing national telecommunications company, with $12.9B of annual revenue and 12.7M customer connections.
• TELUS provides Optik TV® to 1.1M customers.
• At TELUS, our goal is to delight our customers by continuously improving the service/viewing experience.
Watching TV @Home
[Diagram labels: Content Provider (e.g. NBC); Service Creation; Service Delivery; Encoding & DRM; IP/Internet/CDN Network; Access Technology (e.g. Fibre/Copper); Your Neighborhood; Your Home; Your TV; Home Gateway; IPTV STB (Smart Device)]
Video flows as data over IP
Streaming video monitoring
IPTV Set Top Box (STB)
Starting State
• STBs report diagnostic logs every 12 hours.
• Daily batch load into the Datalake using scp and Pig.
• Through daily analysis we create and act on key insights for the network and individual services.
12-hour interval?
TV is a streaming, real-time service; a checkpoint every 12 hours hides too many things from analysis and timely action.
Target State
• Identify and take action to correct a problem within minutes.
• Predict when a device and/or service is in danger of degrading or failing.
• How to achieve Target State:
– Increase STB reporting frequency ~50X, from twice a day to 96 times per day (one report every 15 minutes).
– Streaming analytics (ML) to monitor each STB and take immediate action.
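The reporting-frequency figures can be checked with a little arithmetic. This is only a sketch: the function name `reporting_interval` is made up for illustration, and it assumes reports are spread evenly across the day, which the slides do not state.

```python
from datetime import timedelta

DAY = timedelta(days=1)


def reporting_interval(reports_per_day: int) -> timedelta:
    """Time between consecutive reports, assuming they are evenly spaced."""
    return DAY / reports_per_day


starting = reporting_interval(2)   # starting state: twice a day
target = reporting_interval(96)    # target state: 96 reports per day

# Ratio of the two reporting rates; the slide rounds this to "~50X".
speedup = 96 / 2

print(starting)  # 12:00:00
print(target)    # 0:15:00
print(speedup)   # 48.0
```

Under that assumption, the longest a problem can go unreported drops from 12 hours to 15 minutes, which is what makes "within minutes" plausible once streaming analytics sits behind the reports.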
Watching TV @Home
[Diagram: the "Watching TV @Home" delivery chain shown earlier, from Content Provider to IPTV STB, now annotated "Target state".]
Journey
Constraints
• TELUS supports many enterprise use cases with a single “Datalake”, aka multi-tenancy.
• HDP 2.4 (ships with Spark 1.6 and Kafka 0.9)
• HDF 2.1 (always latest NiFi)
• Encryption & Kerberos in and out of Datalake.
• Not one tutorial on the web cares about security.
First Attempt - Kafka
[Diagram labels: Set Top Boxes; Actions / Alerts; Insights / Analytics; Hadoop network]
Kafka, thwarted!
• Kafka 0.9 introduced SSL/TLS, but Spark 1.6 uses the Kafka 0.8 client library, so it can’t do SSL.
• Solved by using Spark 2.x, but:
– NiFi can only use one Kerberos principal.
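For context, a client that speaks both TLS and Kerberos to Kafka is typically configured with a handful of properties that the 0.8 client library has no equivalent for. The sketch below renders such a properties file. It assumes stock Apache Kafka client setting names (0.9 and later); a vendor distribution may use different values, the helper name is hypothetical, and the truststore path and password are placeholders, not values from the talk.

```python
def kafka_ssl_kerberos_properties(truststore_path: str, truststore_password: str) -> str:
    """Render client properties for a Kerberos-authenticated, TLS-encrypted connection.

    Assumption: Apache Kafka client setting names (0.9+). Check the documentation
    for the distribution actually in use before relying on these.
    """
    props = {
        # SASL (Kerberos) authentication carried over an SSL/TLS channel.
        "security.protocol": "SASL_SSL",
        # Kerberos service name the brokers run under (commonly "kafka").
        "sasl.kerberos.service.name": "kafka",
        # Truststore holding the CA certificate(s) used to verify the brokers.
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Placeholder values for illustration only.
rendered = kafka_ssl_kerberos_properties("/path/to/truststore.jks", "changeit")
print(rendered)
```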
Multiple NiFi?
• Site to Site (S2S) provides two-way SSL between NiFi instances.
• Spark Streaming can stream using the S2S protocol.
• Careful… No easy way to limit NiFi resources by user/group.
Second Attempt – NiFi Site to Site
• NiFi outside the Datalake does the heavy work of processing incoming STB logs.
• NiFi within the Datalake does the lighter work:
– Stores in HDFS
– Feeds Spark Streaming using S2S via 2-way SSL
– Lighter load improves multi-tenancy (scale) for NiFi
Final Architecture – NiFi S2S
[Diagram labels: Set Top Boxes; Actions & Alerts; Insights & Analytics; Hadoop network; ML results Store]
Problem Detection
• No prior labels to demonstrate a failing STB
• Unsupervised classifier as first pass
• SparkML using KMeans to build model
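To make the unsupervised first pass concrete, here is a minimal k-means sketch in plain Python. It is a stand-in for illustration, not the SparkML pipeline used in the talk: the two features (packet loss percentage and decoder errors per interval) and all the sample values are invented, and the features are left unscaled for brevity.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # New centroid is the per-dimension mean of the cluster members.
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the model has converged
        centroids = new_centroids
    return centroids


def assign(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


# Hypothetical per-STB features: (packet loss %, decoder errors per interval).
healthy = [(0.1, 1), (0.2, 0), (0.15, 2), (0.05, 1)]
degraded = [(4.0, 30), (5.0, 28), (4.5, 35)]
points = healthy + degraded

centroids = kmeans(points, k=2)
labels = [assign(p, centroids) for p in points]
print(labels)
```

The clusters carry no meaning by themselves. With no prior labels, someone still has to inspect each cluster and decide which one looks like trouble before any alert is wired to it.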
Model Building
Lessons Learned
Lessons Learned
• Java 8 everywhere
– SSL ‘fun’ with J7 vs J8
– NiFi >= 1.0 libraries compiled with J8
• At least Spark 2.0, ideally 2.1 or newer
– SSL/TLS with Kafka
– Speed
– Improved SparkML
– Structured Streaming
More Lessons Learned
• NiFi → Kafka supports one Kerberos principal per NiFi server.
• NiFi → HDFS supports multiple Kerberos principals; however, the server process can read all keytabs.
More Lessons Learned
• NiFi Site to Site squashes identity to that of the sending server.
• Get good at SSL and how certificates work.
• Highly secured multi-tenancy is hard!
Cavan Loughran, TELUS, www.telus.com
Oliver Meyn, T4G, www.t4g.com
Thank you!
the future is friendly.

