Real-time intelligence to combat ad fraud
TrafficGuard’s winning partnership with Druid & Imply
April 2020
Raigon Jolly, Head of Analytics & Data Science, TrafficGuard
Virtual Druid Summit - September 2020
What is TrafficGuard?
Measurement, verification and fraud prevention for digital advertising.
● Measurement
● Verification
● Invalid Traffic Prevention
● Reporting
What is TrafficGuard?
● 55% of the team comprises data scientists, analysts and engineers
● 13+ billion* impressions, clicks and events invalidated
● 5+ years of fighting ad fraud
● Listed on the Australian Securities Exchange (ASX: AV1)
● Global business headquartered in Perth, Australia
The challenging digital advertising landscape
1. Journey: Impression → Click → Conversion Event → Post-Conversion Event
2. Scale
3. Real-time nature
Customer Reporting
Analytical layer
● Granular, log-level insight differentiates us from competitors who only provide aggregated data to clients
● 14+ reporting subsections in the product
● Some subsections run 18 different queries, with time-period comparisons
● Peak concurrency: 1,000 queries per second
● Low latency
● Streamlined development with SQL, giving a faster time to market
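The time-period comparisons above can be expressed as one Druid SQL query using aggregate `FILTER` clauses. The sketch below is a hypothetical example, not TrafficGuard's actual query: the datasource `traffic_events`, the columns `campaign_id` and `invalid_events`, and the helpers `ts_param` and `build_comparison_payload` are all assumptions. It only builds the JSON body for Druid's SQL HTTP endpoint (`/druid/v2/sql`), using `?` dynamic parameters and a `sqlQueryId` context value prefixed with the product area, so a slow query can later be traced back to the report that issued it.

```python
import json
import uuid
from datetime import datetime

# Hypothetical datasource and columns; substitute your own schema.
COMPARISON_SQL = """
SELECT
  campaign_id,
  SUM(invalid_events) FILTER (WHERE __time >= ? AND __time < ?) AS current_invalid,
  SUM(invalid_events) FILTER (WHERE __time >= ? AND __time < ?) AS previous_invalid
FROM traffic_events
WHERE __time >= ? AND __time < ?
GROUP BY campaign_id
""".strip()


def ts_param(value: datetime) -> dict:
    """A Druid SQL dynamic parameter holding a timestamp."""
    return {"type": "TIMESTAMP", "value": value.strftime("%Y-%m-%d %H:%M:%S")}


def build_comparison_payload(product_area: str, start: datetime, end: datetime) -> dict:
    """Compare [start, end) with the preceding window of the same length."""
    prev_start = start - (end - start)
    return {
        "query": COMPARISON_SQL,
        # One value per '?' placeholder, in order of appearance.
        "parameters": [
            ts_param(start), ts_param(end),          # current window
            ts_param(prev_start), ts_param(start),   # previous window
            ts_param(prev_start), ts_param(end),     # outer WHERE spans both windows
        ],
        # Prefixing the query ID with the product area makes slow queries traceable.
        "context": {"sqlQueryId": f"{product_area}-{uuid.uuid4()}"},
    }
```

For example, `build_comparison_payload("overview", datetime(2020, 9, 8), datetime(2020, 9, 15))` compares 8 to 15 September with 1 to 8 September; the resulting dict would be sent as JSON in a POST to the `/druid/v2/sql` endpoint. Restricting the outer `WHERE` to the span of both windows lets Druid skip segments outside the comparison range.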
Query Optimisation
● Separate datasets for larger clients, operational reporting and unified reporting
● Govern KPIs in Pivot, with role-based access control and versioned definitions
● Precalculate derived values into one field
● Define the difference between NULL (e.g. “(not set)”) and empty fields
● Define data retention and multi-tenancy early on
● Remove CASE statements, TIME_EXTRACT and REGEX functions from the GROUP BY clause
Learnings
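A minimal sketch of moving query-time work upstream, assuming events arrive as Python dicts before ingestion. The field names (`timestamp_ms`, `user_agent`, `publisher`, `channel`), the OS rules and the `(not set)`/`(empty)` labels are hypothetical. With these fields precalculated, `GROUP BY` can run on plain columns instead of `TIME_EXTRACT`, `REGEXP_EXTRACT` or `CASE` expressions. Druid can also do similar work at ingestion time with `transformSpec` expressions; this version does it upstream.

```python
import re
from datetime import datetime, timezone
from typing import Optional

NOT_SET = "(not set)"  # hypothetical label for NULL / missing values
EMPTY = "(empty)"      # hypothetical label for values that are present but empty

# Hypothetical rules, checked in order. Android user agents also contain "Linux"
# and iPhone user agents contain "Mac OS X", so the more specific rules come first.
OS_RULES = [
    ("Android", re.compile(r"Android")),
    ("iOS", re.compile(r"iPhone|iPad")),
    ("Windows", re.compile(r"Windows NT")),
    ("macOS", re.compile(r"Mac OS X")),
    ("Linux", re.compile(r"Linux")),
]


def label(value: Optional[str]) -> str:
    """Keep NULL/missing and empty strings distinguishable."""
    if value is None:
        return NOT_SET
    if value == "":
        return EMPTY
    return value


def os_family(user_agent: Optional[str]) -> str:
    if not user_agent:
        return label(user_agent)
    for name, pattern in OS_RULES:
        if pattern.search(user_agent):
            return name
    return "Other"


def enrich(event: dict) -> dict:
    """Return a copy of a raw event with query-time work moved upstream."""
    out = dict(event)
    ts = datetime.fromtimestamp(event["timestamp_ms"] / 1000, tz=timezone.utc)
    out["hour_of_day"] = ts.hour                           # instead of TIME_EXTRACT in GROUP BY
    out["os_family"] = os_family(event.get("user_agent"))  # instead of REGEX/CASE in GROUP BY
    # Two dimensions precalculated into one field.
    out["publisher_channel"] = f'{label(event.get("publisher"))}|{label(event.get("channel"))}'
    return out
```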
Query Optimisation
● Have a sound data dictionary and governance
● Keep field names descriptive and maintain lineage
● Have an adjustment process for handling immutability
● Use Clarity as a proactive monitoring system, alongside AWS EC2 CloudWatch
● If dimensions are only used for COUNT DISTINCT calculations, use theta sketch measures and remove the dimensions
● Maintain a performance history of common queries
Learnings
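Theta sketches estimate distinct counts from a small, mergeable summary. That is why a high-cardinality dimension used only for `COUNT DISTINCT` can be replaced by a sketch metric (in Druid, the `thetaSketch` aggregator from the `druid-datasketches` extension) and dropped, which improves rollup. The class below is a simplified K-Minimum-Values sketch, the idea theta sketches build on, written for illustration only; it is not the Apache DataSketches implementation, and `KMVSketch` is a hypothetical name.

```python
import hashlib
import heapq


class KMVSketch:
    """Simplified K-Minimum-Values distinct-count sketch (illustrative only)."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._heap = []     # negated hashes, so -self._heap[0] is the largest kept
        self._kept = set()  # the same hashes, for O(1) duplicate checks

    @staticmethod
    def _hash(item: str) -> float:
        # Map an item to a pseudo-uniform value in [0, 1).
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") / 2**64

    def _add_hash(self, h: float) -> None:
        if h in self._kept:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def update(self, item: str) -> None:
        self._add_hash(self._hash(item))

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct items: exact
        theta = -self._heap[0]             # k-th smallest hash
        return (self.k - 1) / theta

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        # The k smallest hashes of the union are among the two kept sets.
        merged = KMVSketch(min(self.k, other.k))
        for h in self._kept | other._kept:
            merged._add_hash(h)
        return merged
```

Because merging needs only the retained hashes, per-segment or per-hour sketches can be combined at query time without the raw IDs; the relative error is roughly 1/√k.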
Some useful slices
● SQL query ID: points to the product area
● Identity: who? What type?
● Dimensions: commonly used
● Number of metrics: commonly used
● Number of complex metrics: precalculate at ingestion/upstream
● Number of dimensions: precalculate at ingestion/upstream
● Duration
● Type
● P98 latency: cluster issues, proactive monitoring, leading indicator
Know thy queries using Clarity
Learnings
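As an illustration of tracking P98 latency per product area, the sketch below assumes query-log records exported as dicts with a `sql_query_id` whose prefix names the product area (for example `overview-<uuid>`) and a `duration_ms` value. That record shape, the naming convention and the 1,000 ms threshold are assumptions, not Clarity's schema. It uses the nearest-rank definition of a percentile.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def p98_by_product_area(records):
    """Group query durations by the product-area prefix of the SQL query ID."""
    durations = defaultdict(list)
    for record in records:
        area = record["sql_query_id"].split("-", 1)[0]
        durations[area].append(record["duration_ms"])
    return {area: nearest_rank_percentile(d, 98) for area, d in durations.items()}


def flag_slow_areas(p98s, threshold_ms=1000):
    """Product areas whose P98 latency exceeds a (hypothetical) threshold."""
    return {area: ms for area, ms in p98s.items() if ms > threshold_ms}
```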
Operational Data Store
Real-time detection operations layer
Our real-time operational data store provides:
● Operational analytics visibility across the business
● Ability to quickly identify data quality issues
● Data profiling (nulls, missing values, unique counts, spikes and drops)
● Ability to test integrations in real time to ensure TrafficGuard is receiving a correct and complete signal
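The data-profiling checks listed above can be sketched as two small functions, assuming each batch of events is a list of dicts and per-window event counts are already available. The function names, the field names in the usage below and the 50% tolerance are hypothetical and for illustration only.

```python
from statistics import mean


def profile(rows, fields):
    """Count missing keys, explicit nulls and distinct non-null values per field."""
    report = {}
    for field in fields:
        present = [row[field] for row in rows if field in row]
        report[field] = {
            "missing": len(rows) - len(present),
            "null": sum(1 for value in present if value is None),
            "unique": len({value for value in present if value is not None}),
        }
    return report


def spike_or_drop(history, current, tolerance=0.5):
    """Compare the latest window's count to the mean of earlier windows.

    history must be non-empty; tolerance is a fractional change (0.5 = 50%).
    """
    baseline = mean(history)
    if baseline == 0:
        return "spike" if current > 0 else "ok"
    change = (current - baseline) / baseline
    if change > tolerance:
        return "spike"
    if change < -tolerance:
        return "drop"
    return "ok"
```

Keeping "missing" (key absent) separate from "null" (key present, value empty) mirrors the NULL versus empty distinction from the query-optimisation learnings, and tends to point at different integration faults.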
Log, Threat hunting & Prototyping
Analytical layer
TrafficGuard Feature Store with Druid aggregated data
Streaming Layer
Source: VoltDB
Ease of management, more economical
One solution across the data lifecycle
TrafficGuard Partnership with Imply and Druid
● Multi-year engagement
● Our team can focus on the last mile
● Imply takes care of rolling updates to Druid, Clarity and Pivot
● Consultation, guidance and proactive support from the Imply team keep our operational overhead to a minimum
With Druid, TrafficGuard has achieved...
● Page loads under 5 seconds at high concurrency
● 70% of analytics powered by Druid
● High dimensionality and cardinality: an asset, not a challenge
● Immediate and significant cost savings
Time for questions
@TrafficGuardAI
Thank you!
Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
Register Now for Druid Summit
November 2-4, 2020
San Francisco, CA
druidsummit.org

How TrafficGuard uses Druid to Fight Ad Fraud and Bots
