Self-Service Analytics on Hadoop: Lessons Learned

June 29, 2016
Drew Leamon
Director – Advanced Technology Solutions

Comcast: Shaping the Future of Media and Technology
• High Speed Internet
• Video
• IP Telephony
• Home Security / Automation
• Universal Parks
• Media Properties

Engineering Analysis: Global Central Analysis Team
• Forecast
• Engineering
• Design
• Budget

Animals are Best Suited in Their Native Habitat

Spreadsheets: The Natural Habitat of Analysts

Evolution of Self Service Analytics
SSRS

Self Service: Native Habitat
Limitations of the Spreadsheet Native Habitat
• 1 Million Row Max
• Not Even Medium Data
• Not Collaborative
• No Automation
• Not Repeatable
IT Analyst

Self Service: How We Started
Analyst goes to IT, makes a request, and waits weeks to get results
SSRS
• 10 TB Storage
• 1 Compute Node
Not Self Service
• 10 TB (Medium Data)
• Limited Compute
• IT Hand-off
• Consultative service
• Not self service.
IT Analysts

A bigger database still meant building dashboards for the team
IT Analysts
Still Not Self Service
• 100s of TBs (Large Data)
• Data silos
• IT Hand-off
• Consultative service
• Analysts not SQL experts
Graduated to Specialized Databases
• Clustered Storage
• Columnar Compression
• Clustered Compute

Datameer, native on Hadoop, enables self-service for big data
Analysts
True Self Service
• PB == Big Data
• Data Lake
• Excel-like UI
• No more waiting for IT
Self Service: The New Way
• Clustered Storage
• Columnar Compression
• Clustered Compute
• Liberated Data

Multiple Configurations for Big Data

Expanding Use Cases with Datameer
Teams:
• Engineering Analysis
• IP Telephony
• Video Research
• IP Video Engineering
• X1 Operations
• Advanced Advertising
• Web Analytics
• Enterprise Business Intelligence
• Network Engineering
Adoption stages (legend): Mature, Evolving, On-Boarded, On-Deck

Use Case #1: Comcast Digital Voice

One Of The Largest IP Telephony Networks

Anonymized Call Detail Records (CDR) Data Set
Data complexity from network
Data size: TBs/month

Discovered Unusual Patterns
Noticed large spikes for high cost areas

30% of this traffic was coming from three accounts.
Analysis Shows Traffic Concentrated in a Few Accounts
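
The concentration finding above can be illustrated with a minimal Python sketch. The deck does not show how the analysis was built in Datameer, so everything here is an assumption: the `(account_id, minutes)` record layout, the `top_account_share` helper name, and the sample records, which are made up purely for illustration and are not Comcast data.

```python
from collections import Counter


def top_account_share(records, top_n=3):
    """Return the fraction of total call minutes that comes from the top_n accounts.

    `records` is an iterable of (account_id, minutes) pairs. This layout is a
    hypothetical simplification of an anonymized CDR data set.
    """
    minutes_by_account = Counter()
    for account_id, minutes in records:
        minutes_by_account[account_id] += minutes

    total = sum(minutes_by_account.values())
    if total == 0:
        return 0.0

    # most_common() sorts accounts by accumulated minutes, largest first.
    top = minutes_by_account.most_common(top_n)
    return sum(minutes for _, minutes in top) / total


# Made-up sample: three heavy accounts plus fourteen lighter ones.
sample_records = [("acct-a", 100), ("acct-b", 100), ("acct-c", 100)]
sample_records += [(f"acct-{i}", 50) for i in range(14)]

share = top_account_share(sample_records, top_n=3)
print(f"Top 3 accounts carry {share:.0%} of traffic")
```

A share this far above what an even spread across accounts would give is the kind of signal that would prompt a closer look at the individual accounts.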

Ongoing Monitoring of Future Abuse
Analyst scheduled a Tableau data extract and built a Tableau dashboard
- Now the business can keep an eye out for further abuse.

Result: Future Abuse Prevented and More
• Abuse detected
• Analysts empowered
• Resources saved
• No IT hand-off
• Automated and repeatable
• Value to organization


Use Case #2: Customer Perspective
How to measure customer experience from the customer perspective

Millions of Viewing Experiences

Improved Customer Experience through Data Analytics
• Findings / Analysis
• Best Practices
• Improved Customer Experience
• Data-driven scheduling
• Dataflow automation

Solution:
- Build views quickly & aggregate large datasets.
- Early visibility of data in Hadoop
- Create repeatable processes through automated workflow
• Aggregations of large datasets from disparate data sources.
- RDBMS, HDFS, APIs
• Data Joins / Data Quality Checks / Pipeline between clusters
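
As a rough illustration of the join, data quality check, and aggregation steps listed above, here is a minimal Python sketch. It is not how the workflow was built in Datameer. The field names (`device_id`, `minutes`, `region`), both helper functions, and the sample rows are hypothetical and stand in for data that would come from sources such as an RDBMS, HDFS, or an API.

```python
from collections import defaultdict


def join_with_quality_checks(sessions, devices):
    """Inner-join sessions to devices on device_id, setting aside rows that fail checks.

    Returns (joined_rows, rejected_rows), where each rejected entry is a
    (row, reason) pair so failures can be reviewed rather than silently dropped.
    """
    devices_by_id = {d["device_id"]: d for d in devices}
    joined, rejected = [], []
    for session in sessions:
        device = devices_by_id.get(session.get("device_id"))
        minutes = session.get("minutes")
        if device is None:
            rejected.append((session, "no matching device"))
        elif minutes is None or minutes < 0:
            rejected.append((session, "invalid minutes"))
        else:
            joined.append({**session, **device})
    return joined, rejected


def minutes_by_region(joined_rows):
    """Aggregate viewing minutes per region over the joined rows."""
    totals = defaultdict(int)
    for row in joined_rows:
        totals[row["region"]] += row["minutes"]
    return dict(totals)


# Made-up sample rows standing in for two disparate sources.
sessions = [
    {"device_id": "d1", "minutes": 30},
    {"device_id": "d2", "minutes": 45},
    {"device_id": "d1", "minutes": 15},
    {"device_id": "d9", "minutes": 20},  # no device record: fails the join
    {"device_id": "d2", "minutes": -5},  # negative duration: fails the check
]
devices = [
    {"device_id": "d1", "region": "east"},
    {"device_id": "d2", "region": "west"},
]

joined, rejected = join_with_quality_checks(sessions, devices)
print(minutes_by_region(joined))
print(f"{len(rejected)} rows rejected")
```

Keeping the rejected rows with a reason, instead of discarding them, is one way to make a repeatable workflow auditable when the same pipeline runs on a schedule.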

Result: Data-driven Customer Viewing Experience Enhancements
• Customer Experience Improved
• Analysts empowered
• Capital Spend Directed Intelligently
• No IT hand-off
• Automated and repeatable
• Value to organization
