PROBABILISTIC DATA
STRUCTURES IN REAL LIFE
Valentin Bazarevsky
WHO ARE THEY?
Bloom Filter
LogLog Family
MinHash
BUSINESS CASE:
ESTIMATE YOUR AUDIENCE
SEGMENT BUILDER
15 TB of transactional data
4h SLA
POSSIBLE SOLUTIONS
Brute force (15 TB of transactional data)
Sampling (1% of users => 1.2 MB / b.o.)
Magic tool (?!)
Estimator
HyperLogLog can estimate the cardinality of sets with more than 1 000 000 000
unique elements with about 1% error, using only about 4 KB of memory
50 000 000 basic operations
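To make the estimator concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration under stated assumptions, not the implementation used in the talk: the SHA-1-based 64-bit hash, the default of 2^12 registers (about 3 KB at 6 bits per register, standard error around 1.04 / sqrt(4096), i.e. roughly 1.6%), and all class and method names are choices made here.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog; assumes p >= 7 so the alpha constant below applies."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumption)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item):
        # Assumption: first 8 bytes of SHA-1 serve as a 64-bit hash.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash64(item)
        rest_bits = 64 - self.p
        index = h >> rest_bits                   # top p bits pick the register
        rest = h & ((1 << rest_bits) - 1)        # remaining bits
        # Rank = number of leading zeros in the remaining bits, plus one.
        rank = rest_bits - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def union(self, other):
        # Union is a register-wise maximum; both sketches must share p.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return m * math.log(m / zeros)
        return raw


segment_a = HyperLogLog()
segment_b = HyperLogLog()
for user_id in range(50_000):
    segment_a.add(user_id)
for user_id in range(25_000, 75_000):
    segment_b.add(user_id)

print(round(segment_a.cardinality()))                    # close to 50 000
print(round(segment_a.union(segment_b).cardinality()))   # close to 75 000
```

Only `add`, `union`, and `cardinality` are available on such a sketch, which is the limitation the next slide runs into.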
OOPS…
Supports only Unions
But we need Intersections, Subtractions, and NOT operators
HYPERLOGLOG INTUITION
The first 4 bits of each hash select a register a[i]; the register keeps the largest number of leading zeros seen in the remaining bits
00101010101010001111010101101 => a[2] = 0
10010101010100101010101001011 => a[9] = 1
00000101010100101010101110101 => a[0] = 1
01010101010100100101010101010 => a[5] = 1
01010000000000000000000000010 => a[5] = 23
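The register updates on this slide can be reproduced with a few lines of Python. This is a toy sketch: the function name and the dict-based register array are hypothetical, and the slide stores the raw leading-zero count, whereas common HyperLogLog implementations store that count plus one.

```python
def update_register(registers, hash_bits, index_bits=4):
    """Apply one hash (given as a bit string) to the register array."""
    index = int(hash_bits[:index_bits], 2)       # first bits pick the register
    rest = hash_bits[index_bits:]
    leading_zeros = len(rest) - len(rest.lstrip("0"))
    # Keep the largest leading-zero count seen so far for this register.
    registers[index] = max(registers.get(index, 0), leading_zeros)
    return index, leading_zeros


registers = {}
slide_hashes = [
    "0010" + "1010101010001111010101101",
    "1001" + "0101010100101010101001011",
    "0000" + "0101010100101010101110101",
    "0101" + "0101010100100101010101010",
    "0101" + "0" * 23 + "10",                    # long run of zeros: a rare event
]
for hash_bits in slide_hashes:
    update_register(registers, hash_bits)

print(registers)  # {2: 0, 9: 1, 0: 1, 5: 23}
```

Register 5 is hit twice and keeps the larger value, 23. A long run of leading zeros is unlikely unless many distinct elements have been hashed, which is what the estimate is built on.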
INCLUSION-EXCLUSION PRINCIPLE
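For two sets, the inclusion-exclusion principle gives |A ∩ B| = |A| + |B| - |A ∪ B|, so an intersection can be derived from three union-only estimates. A minimal sketch follows; the function name and the clamp at zero are assumptions added here.

```python
def intersection_size(card_a, card_b, card_union):
    """Inclusion-exclusion for two sets: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    # Clamp at zero: with noisy estimates the raw difference can go negative.
    return max(0.0, card_a + card_b - card_union)


print(intersection_size(100, 80, 150))  # 30
```

The absolute error of each input carries over to the result: with about 1% error on a union of 10^9 elements, the difference can be off by roughly 10^7, which swamps small intersections.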
MINHASH
Store only the x (8192) smallest hashes of each set
Jaccard Distance
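A minimal bottom-k sketch in Python, as one way to read "store only the x smallest hashes". The SHA-1-based hash, the function names, and the way the intersection size is derived (Jaccard estimate times a union cardinality, for example from HyperLogLog) are assumptions made here, not details confirmed by the slides. The sketch estimates the Jaccard similarity J = |A ∩ B| / |A ∪ B|; the Jaccard distance is 1 - J.

```python
import hashlib


def hash64(item):
    # Assumption: first 8 bytes of SHA-1 serve as a 64-bit hash.
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bottom_k_sketch(items, k=8192):
    """Keep only the k smallest distinct hash values of a set."""
    return sorted({hash64(item) for item in items})[:k]


def jaccard_estimate(sketch_a, sketch_b, k=8192):
    """Estimate |A ∩ B| / |A ∪ B| from two bottom-k sketches built with the same k."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # The k smallest hashes of the merged sketches form the sketch of A ∪ B.
    union_sketch = sorted(set_a | set_b)[:k]
    if not union_sketch:
        return 0.0
    shared = sum(1 for h in union_sketch if h in set_a and h in set_b)
    return shared / len(union_sketch)


def intersection_estimate(jaccard, union_cardinality):
    """|A ∩ B| ≈ J(A, B) * |A ∪ B|."""
    return jaccard * union_cardinality


a = bottom_k_sketch([1, 2, 3])
b = bottom_k_sketch([2, 3, 4])
print(jaccard_estimate(a, b))  # 0.5, exact because both sets are smaller than k
```

Because the merged sketch is itself a valid sketch of the union, the same structure supports unions as well as similarity estimates.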
UNION OF INTERSECTIONS
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A - B - C = A - (B ∪ C)
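Both rewrites are standard set identities: the distributive law A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C), and A - B - C = A - (B ∪ C). They move unions to the inside of the expression, where union-friendly sketches can evaluate them directly. A quick sanity check with exact Python sets (the helper names and the random test data are made up for illustration):

```python
import random


def identities_hold(a, b, c):
    """Check both rewrite rules on concrete sets."""
    distributive = (a | (b & c)) == ((a | b) & (a | c))
    subtraction = (a - b - c) == (a - (b | c))
    return distributive and subtraction


rng = random.Random(0)


def random_set():
    # Small universe so that overlaps between sets are common.
    return {rng.randrange(50) for _ in range(20)}


checks = [identities_hold(random_set(), random_set(), random_set())
          for _ in range(1000)]
print(all(checks))  # True
```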
NOT OPERATOR
Subtraction
I WANT EVERYONE EXCEPT…
A and not B
Not A and Not B
CORNER CASES
|(A ∪ not(B)) ∩ C| => |A ∩ C|
|A ∪ not(B)| = |Everything| - |B| + |A ∩ B|
|A ∩ not(B)| => |A| - |A ∩ B|
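The last two lines reduce NOT to cardinalities that can be estimated without materializing a complement, and "Not A and Not B" follows the same pattern via De Morgan: |not(A) ∩ not(B)| = |Everything| - |A ∪ B|. A small sketch with hypothetical function names, checked against exact sets; the first line on the slide is left out because it does not hold as a general set identity.

```python
def a_or_not_b(card_everything, card_b, card_a_and_b):
    """|A ∪ not(B)| = |Everything| - |B| + |A ∩ B|"""
    return card_everything - card_b + card_a_and_b


def a_and_not_b(card_a, card_a_and_b):
    """|A ∩ not(B)| = |A| - |A ∩ B|"""
    return card_a - card_a_and_b


def not_a_and_not_b(card_everything, card_a_or_b):
    """|not(A) ∩ not(B)| = |Everything| - |A ∪ B| (De Morgan)"""
    return card_everything - card_a_or_b


# Exact sets stand in for sketch estimates here.
everything = set(range(100))
a = set(range(0, 40))
b = set(range(30, 70))
not_a, not_b = everything - a, everything - b

print(a_or_not_b(len(everything), len(b), len(a & b)) == len(a | not_b))    # True
print(a_and_not_b(len(a), len(a & b)) == len(a & not_b))                    # True
print(not_a_and_not_b(len(everything), len(a | b)) == len(not_a & not_b))   # True
```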
ARCHITECTURE
ERROR RATE
Median = 5%
75th percentile = 8%