Druid Optimizations for Scaling Customer Facing Analytics at Conviva
Presenters
Amir Youssefi
Principal Software Engineer
Conviva
Pawas Ranjan
Engineering Manager
Conviva
Highlights
● Streaming Analytics at Conviva
● Before Druid
● Druid Usage and Challenges (Query Timeouts, Reliable Data Ingestion...)
● Solutions
● Outcomes (add a 9 to our reliability...)
Streaming Analytics at Conviva
Conviva: Video Streaming Analytics
Products
Conviva: Customers
Conviva: Stats
Before Druid
Before (2019 and earlier)
Data Pipeline
● Hadoop MR Batch Jobs (5m) with rollup
(Hourly and Daily)
● Spark Streaming (mini batches)
● Serving from HBase using Phoenix (SQL)
● SQL Query Gateway
Data Center Locations
● On-premise
● Cloud (AWS): Hot Backup
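The rollup step in the pre-Druid batch pipeline above (5-minute results rolled up to hourly and daily) can be illustrated with a minimal sketch. This is not Conviva's MapReduce code; the row layout `(timestamp, dimension, value)` and the function name `rollup_hourly` are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rollup_hourly(rows):
    """Sum 5-minute metric rows into one total per (hour, dimension).

    `rows` is an iterable of (timestamp, dimension, value) tuples.
    This layout is hypothetical, not taken from the talk.
    """
    totals = defaultdict(int)
    for ts, dimension, value in rows:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[(hour, dimension)] += value
    return dict(totals)


rows = [
    (datetime(2019, 1, 1, 10, 0, tzinfo=timezone.utc), "customer_a", 3),
    (datetime(2019, 1, 1, 10, 5, tzinfo=timezone.utc), "customer_a", 4),
    (datetime(2019, 1, 1, 11, 0, tzinfo=timezone.utc), "customer_a", 5),
]
hourly = rollup_hourly(rows)
```

A daily rollup would follow the same pattern, truncating to the start of the day instead of the hour.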
Druid (since 2019)
Druid since 2019
Data Pipeline
● Started at Druid 3.x
● Native Druid Query Gateway
● Hadoop and Spark for batch ingestion
● Spark Streaming (Real Time, mini batches)
● Akka (Scala) Streaming and Spark Streaming
● Elastic and Imply Clarity for query/log analysis
Data Center Locations
● On Premise
● Amazon AWS
● Google GCP
Druid Challenges
● Reliability and Performance (Query Timeouts)
● High Cardinality Measures
● Cost (especially on Cloud)
● Multi Tenancy and Data Locality
● Wide Rows (Metadata Performance)
● Multi Data Center/Cloud
● Orchestration (Docker -> K8s)
● Rapid Disaster Recovery
Solutions and Outcome
Analytics over Query Logs
Query Start/End Timestamps
Run Times Entity Distribution
Time-out Distribution
Detailed Study of Query Access Patterns, Timestamps, Time-outs, etc.
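The kind of query-log study described above can be sketched as follows. This is a minimal illustration, not the analysis used in the talk (which mentions Elastic and Imply Clarity); the record fields `start_ms`, `end_ms` and `entity`, the sample data, and the 30-second timeout are all assumptions.

```python
from collections import Counter
from statistics import median


def summarize(logs, timeout_ms):
    """Summarize run times and time-outs from query log records.

    Each record is a dict with hypothetical keys:
    start_ms, end_ms (epoch milliseconds) and entity (e.g. a customer id).
    """
    runtimes = [r["end_ms"] - r["start_ms"] for r in logs]
    # Treat any query that ran for at least timeout_ms as timed out.
    timed_out = [r for r, t in zip(logs, runtimes) if t >= timeout_ms]
    return {
        "median_ms": median(runtimes),
        "max_ms": max(runtimes),
        "timeout_rate": len(timed_out) / len(logs),
        "timeouts_by_entity": Counter(r["entity"] for r in timed_out),
    }


logs = [
    {"start_ms": 0, "end_ms": 120, "entity": "customer_a"},
    {"start_ms": 50, "end_ms": 350, "entity": "customer_a"},
    {"start_ms": 100, "end_ms": 30100, "entity": "customer_b"},
    {"start_ms": 200, "end_ms": 280, "entity": "customer_c"},
]
summary = summarize(logs, timeout_ms=30000)
```

Grouping time-outs by entity is one way to see whether a few tenants account for most of the slow queries, which ties in with the multi-tenancy and data-locality items in the deck.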
Challenges/Solutions (Reliability)
Reliability Issues
● Query Timeouts
● Reliable Ingestion & Query Speed Balance
● Query Performance (High Avg Time)
● Query Runtime Fluctuations
● Random Ingestion Task Failures
Solutions
● Druid Configuration & Tuning
● Data Locality, Dynamic Partitions, Multi Tenancy
● Tuning Brokers; Extra Tier 3 for recent Data and Queries
● Query/Context Updates
● Resource and Configuration Adjustment
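As an illustration of the "Query/Context Updates" item above, here is a minimal sketch of a native Druid timeseries query that sets the `timeout` and `priority` query context parameters. The datasource name, interval and values are placeholders; the talk does not say which context settings were changed.

```python
import json


def timeseries_query(datasource, interval, timeout_ms, priority=0):
    """Build a native Druid timeseries query body with an explicit context.

    `timeout` (milliseconds) and `priority` are standard Druid query
    context parameters; the values used below are illustrative only.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "hour",
        "aggregations": [{"type": "count", "name": "rows"}],
        "context": {"timeout": timeout_ms, "priority": priority},
    }


# "sessions" is a hypothetical datasource name.
query = timeseries_query(
    "sessions", "2021-01-01/2021-01-02", timeout_ms=30000, priority=10
)
body = json.dumps(query)
```

The resulting JSON body would be sent with an HTTP POST to a Broker or Router; a query gateway could apply different context values per query class.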
Challenges/Solutions
Challenges
● Cost (especially on Cloud)
● Querying High Cardinality Measures
● SQL Metadata Performance due to Wide Rows
● Rapid Disaster Recovery
● Real Time Cluster
Solutions
● Resource Optimizations; On-prem+GCP
● Created in-house Dimensional index
● Using Native Query instead of Druid SQL
● K8s+Helm Chart Improvements on GCP
● Supervisor Optimizations and disabling Historicals
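To make the "Using Native Query instead of Druid SQL" item concrete, the sketch below builds the same top-10 request in both forms. The datasource `sessions`, the dimension `country` and the metric `plays` are hypothetical; only the request shapes and the endpoint paths (`/druid/v2/` for native queries, `/druid/v2/sql/` for SQL) come from Druid itself.

```python
import json

NATIVE_PATH = "/druid/v2/"
SQL_PATH = "/druid/v2/sql/"


def native_topn():
    """Native topN query: query type and aggregators are stated explicitly."""
    return {
        "queryType": "topN",
        "dataSource": "sessions",      # hypothetical datasource
        "dimension": "country",        # hypothetical dimension
        "metric": "plays",
        "threshold": 10,
        "granularity": "all",
        "intervals": ["2021-01-01/2021-01-02"],
        "aggregations": [
            {"type": "longSum", "name": "plays", "fieldName": "plays"}
        ],
    }


def sql_topn():
    """Roughly equivalent Druid SQL request; the Broker plans it into a native query."""
    return {
        "query": (
            'SELECT "country", SUM("plays") AS "plays" FROM "sessions" '
            "WHERE __time >= TIMESTAMP '2021-01-01 00:00:00' "
            "AND __time < TIMESTAMP '2021-01-02 00:00:00' "
            "GROUP BY 1 ORDER BY 2 DESC LIMIT 10"
        )
    }


requests_to_send = [
    (NATIVE_PATH, json.dumps(native_topn())),
    (SQL_PATH, json.dumps(sql_topn())),
]
```

With the native form, the gateway controls the query type and aggregators directly rather than relying on the SQL planner's translation. The talk does not state which of these considerations drove the choice.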
Outcomes
Before/After: Response Times & Query Timeout
Additional 9 added to reliability...
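"Adding a 9" means cutting the failure rate by a factor of ten, for example going from 99.9% to 99.99% of queries succeeding. A small sketch of that arithmetic follows; the query counts are made up for illustration and are not figures from the talk.

```python
import math


def nines(successes, total):
    """Return the number of 'nines' of reliability, i.e. -log10(failure rate)."""
    failures = total - successes
    if failures == 0:
        return math.inf  # no observed failures
    return -math.log10(failures / total)


# Illustrative counts only: 10 failed queries per 10,000 versus 1 per 10,000.
before = nines(9990, 10000)   # about three nines (99.9%)
after = nines(9999, 10000)    # about four nines (99.99%)
```

Each additional nine therefore corresponds to ten times fewer timed-out or failed queries for the same traffic.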
Recent days at Query Gateway
Questions?
Thank you
Contact: amir@conviva.com, pranjan@conviva.com
Special thanks to Ashrut Tewari and Haijie Wu

Editor's Notes

  • #20: Verbally describe each story (optionally add more context and screenshots/charts in final presentation)
  • #21: Verbally describe each story (optionally add more context and screenshots/charts in final presentation)