NETFLIX + NTT + RUBICON PROJECT USE CASES
WHO AM I
▸ Dir. of Solutions Engineering, Imply
▸ Author: Virtualizing Hadoop
▸ 11 years of experience in distributed systems, big data
platforms, cloud computing
AGENDA
▸ The analytics challenges it solves
▸ Use Cases
▸ Architecture
ANALYTICS CHALLENGES
APACHE DRUID
▸ New class of Operational Data Store
▸ Solves the following analytics challenges
▸ scale
▸ speed
▸ grain complexity
▸ high dimensionality
▸ concurrency
▸ freshness
USE CASES
What does Netflix, NTT and Rubicon Project have in common? Apache Druid.
SOME NUMBERS
▸ 160 Billion events per day
▸ 190 countries
▸ 300 million devices
▸ Thousands of Druid users
▸ 100s of Druid nodes
CHALLENGES
▸ Amazon Redshift was the backend for their ad-hoc
aggregated analytics dashboard
▸ It was slow
▸ Could not support longer data retention
▸ Could not support a large number of dimensions
▸ Apache Druid replaced Redshift
USE CASES
▸ AWS capacity planning
▸ Payment analysis
▸ Algorithm comparison
▸ Security
▸ Client performance / Quality of Experience (QoE)
SOLUTION ARCHITECTURE
NTT
HIGHLIGHTS
▸ 4th largest telecommunications company in the world
▸ Provides high speed, high capacity IP communication
services for
▸ Europe
▸ North and South America
▸ Asia
▸ Oceania
NETWORK ARCHITECTURE
CHALLENGES
▸ Legacy netflow analytics system
▸ was a black box, difficult to troubleshoot and extend on
both frontend and backend
▸ did not scale cost-effectively
▸ offered limited ad-hoc analysis
USE CASES
▸ Netflow analysis
▸ Capacity planning
▸ Traffic matrix analysis
▸ Inter-domain traffic analysis
SOLUTION ARCHITECTURE
RUBICON PROJECT
SOME NUMBERS
▸ Thousands of external customers, publishers, DSPs across
the globe
▸ Trillions of ad and bid requests quarterly
▸ 1K header bidding connections
▸ 40% growth in mobile ad spend Q2 2018 vs Q2 2017
▸ 70% growth in video ad spend 1st half 2018 vs 1st half 2017
▸ 300% growth of audio ad spend in Q2 2018
CHALLENGES
▸ Advertising traffic grew exponentially
▸ MySQL could store only 10% of the daily data volume
▸ Scaling interactive analytics to a wide base of users was
tough
▸ Cost, performance, timeliness
DRUID NUMBERS
▸ >2TB data per hour to Druid
▸ <500ms average response time
▸ >1 Trillion events per day
▸ Thousands of users across the globe
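To put the figures above in per-second terms, here is a small back-of-the-envelope calculation. It treats the ">" figures as lower bounds and assumes decimal terabytes (10^12 bytes), which the slide does not specify. Under those assumptions the rates come to roughly 11.6 million events per second and about 556 MB per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60

# Lower bounds taken from the slide; the real figures are stated as "greater than".
events_per_day = 1_000_000_000_000   # ">1 Trillion events per day"
bytes_per_hour = 2 * 10**12          # ">2TB data per hour", assuming decimal TB

events_per_second = events_per_day / SECONDS_PER_DAY
megabytes_per_second = bytes_per_hour / SECONDS_PER_HOUR / 10**6

print(f"{events_per_second:,.0f} events/s")
print(f"{megabytes_per_second:,.0f} MB/s")
```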
ARCHITECTURE
DRUID ARCHITECTURE
SEGMENT
▸ Highly optimized storage unit
▸ Highly compressed bitmap indexes
▸ 150MB - 700MB size
▸ Determines parallelism
▸ Read in memory
▸ No contention between reads and writes
▸ 10x - 75x storage space savings
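The bitmap indexes mentioned above can be illustrated with a toy example. This is a minimal sketch of the general idea, not Druid's implementation: it uses plain Python integers as uncompressed bitsets, whereas Druid stores compressed bitmaps. The rows and column names are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bitset:
    bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return dict(index)

def matching_rows(bitmap):
    """Expand a bitset back into the list of row ids it selects."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical rows; the column names are illustrative only.
rows = [
    {"country": "US", "device": "tv"},
    {"country": "BR", "device": "phone"},
    {"country": "US", "device": "phone"},
    {"country": "US", "device": "tv"},
]

country_index = build_bitmap_index(rows, "country")
device_index = build_bitmap_index(rows, "device")

# The filter country = 'US' AND device = 'tv' becomes one bitwise AND.
us_tv = country_index["US"] & device_index["tv"]
print(matching_rows(us_tv))
```

Because a filter reduces to bitwise operations over per-value bitmaps, only the rows that survive the filter need to be read for aggregation.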
DATA MODEL
▸ Roll ups
▸ Approximation algorithms
▸ Segment granularity
▸ Query granularity
▸ Metrics
▸ Bitmap type (concise vs roaring)
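Roll-up and query granularity can be sketched in a few lines. The example below is an illustration of the concept only, assuming an hourly query granularity and a single summed metric; the event fields (`ts`, `country`, `bytes`) are hypothetical and the function is not part of any Druid API.

```python
from collections import defaultdict
from datetime import datetime

def roll_up(events, dimensions, metric):
    """Truncate each timestamp to the hour (the assumed query granularity)
    and aggregate rows that share the same hour and dimension values."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        hour = event["ts"].replace(minute=0, second=0, microsecond=0)
        key = (hour,) + tuple(event[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]
    return dict(totals)

# Hypothetical raw events.
events = [
    {"ts": datetime(2018, 6, 1, 10, 5), "country": "US", "bytes": 100},
    {"ts": datetime(2018, 6, 1, 10, 40), "country": "US", "bytes": 250},
    {"ts": datetime(2018, 6, 1, 10, 59), "country": "BR", "bytes": 70},
    {"ts": datetime(2018, 6, 1, 11, 2), "country": "US", "bytes": 30},
]

rolled = roll_up(events, ["country"], "bytes")
for key, agg in sorted(rolled.items()):
    print(key, agg)
```

Four raw events collapse into three stored rows here. The coarser the query granularity and the fewer the dimensions, the more rows merge, which is where much of the storage saving comes from, at the cost of losing the ability to query individual events.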
QUERY OPTIMIZATION
▸ Threads
▸ Heap
▸ Horizontal scaling
▸ topN vs groupby
▸ datasketches
▸ splitting data sources for targeted queries
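The "topN vs groupby" point can be made concrete by comparing the two native query bodies for the same question: the top values of one dimension ranked by one metric. The sketch below only builds the JSON payloads and assumes a rolled-up `longSum` metric column; the datasource, dimension, and metric names are hypothetical. Druid's documentation describes topN as approximate and generally cheaper than an equivalent groupBy for this single-dimension ranking case.

```python
import json

def top_n_query(datasource, dimension, metric, threshold, interval):
    """Native topN query: rank one dimension by one metric."""
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "dimension": dimension,
        "metric": metric,
        "threshold": threshold,
        "granularity": "all",
        "aggregations": [{"type": "longSum", "name": metric, "fieldName": metric}],
        "intervals": [interval],
    }

def group_by_query(datasource, dimension, metric, threshold, interval):
    """Equivalent groupBy query: group, then order and limit the result."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "dimensions": [dimension],
        "granularity": "all",
        "aggregations": [{"type": "longSum", "name": metric, "fieldName": metric}],
        "limitSpec": {
            "type": "default",
            "limit": threshold,
            "columns": [{"dimension": metric, "direction": "descending"}],
        },
        "intervals": [interval],
    }

# Hypothetical names, for illustration only.
args = ("ad_events", "publisher", "impressions", 10, "2018-06-01/2018-06-02")
top_n = top_n_query(*args)
group_by = group_by_query(*args)
print(json.dumps(top_n, indent=2))
print(json.dumps(group_by, indent=2))
```

Either payload would be POSTed to a Druid Broker's native query endpoint. The groupBy form remains the option when several dimensions must be grouped together or exact results are required.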
JOIN THE COMMUNITY
Druid community site (current): http://druid.io/
Druid community site (new): https://druid.apache.org/
Imply distribution: https://imply.io/get-started
TRY THIS AT HOME




Uploaded by Rommel Garcia

