Analytics over terabytes
April 15, 2020
Swapnesh Gandhi
Sr. Software Engineer at Twitter
@RealSwapneshG
MoPub
MoPub Analytics
Infrastructure
Lookups
Monitoring
Tiering
Experimenting
Coordinator issues
Security and retention
MoPub
Real-time ad exchange for mobile publishers
> 40k Publishers
> 200 Bidders
> 200 TB/day of raw data; 1.7 TB/day of aggregated data
> 700 TB of data in Druid
MoPub Analytics
Metamarkets
Druid vs other solutions
Significant to our business
First production cluster at Twitter; helping wider adoption
Infrastructure
On-prem data center using Apache Mesos
Component    # of nodes  Mesos      CPUs  RAM  Disk   SSD
Broker       12          Shared     32    130  15 GB
Coordinator  1           Shared     32    64   10 GB
Router       8           Shared     16    20   15 GB
Historicals  700         Dedicated  80    373  15 GB  4 TB (NVMe)
Infrastructure
Historicals
Host 13 months of data in the cluster
Fast tier
Most recent 2 weeks of data.
1:1 SSD to RAM ratio to achieve low latency
2 replicas
Slow tier
All data except the most recent 2 weeks
5:1 SSD to RAM ratio
2 replicas
Node size vs cluster size
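The tier split above could be expressed as Druid coordinator load rules, which are matched top to bottom per segment. The following is a minimal sketch, not the rules the team actually used: the tier names "fast" and "slow" are assumptions, and the final dropForever rule is an assumed way to cap the cluster at 13 months.

```python
import json

# Hypothetical tier names; each Historical would declare its tier in its own config.
FAST_TIER = "fast"
SLOW_TIER = "slow"


def build_load_rules():
    """Return retention rules in evaluation order (first match wins)."""
    return [
        # Most recent 2 weeks: 2 replicas on the fast tier.
        {"type": "loadByPeriod", "period": "P2W",
         "tieredReplicants": {FAST_TIER: 2}},
        # Anything older, up to 13 months: 2 replicas on the slow tier.
        {"type": "loadByPeriod", "period": "P13M",
         "tieredReplicants": {SLOW_TIER: 2}},
        # Assumption: segments older than 13 months are dropped from the cluster.
        {"type": "dropForever"},
    ]


rules = build_load_rules()
print(json.dumps(rules, indent=2))
```

Because the first matching rule applies, a recent segment lands only on the fast tier and moves to the slow tier once it ages past two weeks.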
Lookups
Query time lookups
Data
Id -> name mapping
Many to one mapping
About 15 total lookups
4 large lookups (> 8M rows, GBs of data)
GC Issues during lookup reloads
G1 collector
Incremental load
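A conceptual sketch of the two ideas on this slide: resolving a many-to-one ID -> name mapping at query time, and refreshing the mapping incrementally instead of rebuilding it. This is plain Python for illustration, not Druid's lookup implementation. The IDs, names, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical many-to-one lookup: several IDs map to the same name.
lookup = {"u1": "App A", "u2": "App A", "u3": "App B"}


def apply_delta(table, upserts, deletes=()):
    """Apply only the changed entries in place rather than rebuilding the map."""
    for key in deletes:
        table.pop(key, None)
    table.update(upserts)
    return table


def group_by_name(rows, table, missing="unknown"):
    """Resolve each ID to its name at query time and sum the metric per name."""
    totals = defaultdict(int)
    for unit_id, impressions in rows:
        totals[table.get(unit_id, missing)] += impressions
    return dict(totals)


rows = [("u1", 10), ("u2", 5), ("u3", 7), ("u4", 1)]
before = group_by_name(rows, lookup)
apply_delta(lookup, {"u4": "App B"})
after = group_by_name(rows, lookup)
print(before)
print(after)
```

The stored rows never change. Only the mapping does, so a renamed or newly added ID shows up in results as soon as the lookup is refreshed. Touching only the changed entries also avoids allocating a second multi-gigabyte map on each reload, which is presumably why incremental loading eased the GC pressure.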
Monitoring
Internal monitoring and alerting system
Good for finding issues
Track simple metrics such as CPU, Memory, GC
Latency
Not good for finding exact cause
Keep evolving
Monitoring
Imply Clarity
Druid cluster
Query latency
Broker queries
Historicals
Users
Usage of the platform
Tune configs
Tiering
Think in terms of use cases
Isolation vs shared resources
20% saved on infra costs
Experimenting
Performance tests
Running A/B tests in the cluster
Coordinator
druid.coordinator.loadqueuepeon.type
The default, curator, is single-threaded
http is multi-threaded
Security & retention
mTLS
Stripping dimensions after 30 days
Manage retention through Druid kill tasks
Deep storage backups
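A Druid kill task permanently deletes segments that are already marked unused within a given interval. The sketch below builds such a task payload for data older than the 13-month window mentioned earlier. The datasource name, the start date, and both helper functions are assumptions for illustration. How the team actually scheduled its kill tasks is not stated in the slides.

```python
import json
from datetime import date


def months_before(day, months):
    """Return the first day of the month that is `months` months before `day`."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def build_kill_task(datasource, today, retain_months=13, start=date(2000, 1, 1)):
    """Build a kill task covering everything before the retention cutoff."""
    cutoff = months_before(today, retain_months)
    return {
        "type": "kill",
        "dataSource": datasource,
        # ISO-8601 interval: start/end, end exclusive.
        "interval": f"{start.isoformat()}/{cutoff.isoformat()}",
    }


# "mopub_events" is a hypothetical datasource name.
task = build_kill_task("mopub_events", date(2020, 4, 15))
print(json.dumps(task))
```

Rounding the cutoff down to the first of the month keeps slightly more than 13 months, which errs on the side of not deleting data too early.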
Summary
Lookups
Monitoring
Tiering
Experimenting
Coordinator issues
Security and retention
Thank you.
Time for questions
Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
Analytics over Terabytes of Data at Twitter