Hayden Marchant, Software Engineer
haydenm@wix.com
Bloom Filters
Agenda
A Bloom what?
Why would I want one?
Practical examples
How it works
Next step
A Bloom-what?
Just a Data Structure
‘Is an element in a set or not’
Probabilistic
Definitely not in set
OR
Probably in set
Why would I want one?
Why not just store all used keys?
SIMPLE MATHS
Key size: 100 bytes
# elements: 10 million
Memory required: 100 bytes * 10M = 1 GB
Bloom Filters: Tiny footprint
~10 bits per element
>98% reduction in space
SIMPLE MATHS (PART 2)
Size per element: 10 bits
# elements: 10 million
Memory required: 10 bits * 10M = 100M bits ~= 12.5 MB
Footprints:
Traditional = 1 GB
Bloom Filter ~= 12.5 MB
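
The two footprints can be checked with a few lines of arithmetic. This is only a sketch of the slide's numbers: it ignores container overhead for the raw keys, and the function names are made up for illustration.

```python
# Figures taken from the slides.
NUM_ELEMENTS = 10_000_000      # 10 million keys
KEY_SIZE_BYTES = 100           # size of one stored key
BLOOM_BITS_PER_ELEMENT = 10    # approximate Bloom filter cost per element


def raw_key_bytes(n, key_size):
    """Bytes needed to store every key verbatim (no container overhead)."""
    return n * key_size


def bloom_bytes(n, bits_per_element):
    """Bytes needed for a bit array of n * bits_per_element bits, rounded up."""
    return (n * bits_per_element + 7) // 8


raw = raw_key_bytes(NUM_ELEMENTS, KEY_SIZE_BYTES)
bloom = bloom_bytes(NUM_ELEMENTS, BLOOM_BITS_PER_ELEMENT)
reduction = 1 - bloom / raw

print(f"raw keys:     {raw / 1e9:.2f} GB")
print(f"bloom filter: {bloom / 1e6:.1f} MB")
print(f"reduction:    {reduction:.2%}")
```

With these inputs the filter needs 12.5 MB, a reduction of 98.75% compared to storing the keys.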
Practical Examples
Akamai
Prevent caching of One-Hit-Wonders
- About 75% of requested web objects are only ever requested once
- These are called One-Hit-Wonders
- Use Bloom Filters to keep one-hit-wonders out of the Disk Cache
1. Saves disk access for these 75% of requests
2. Reduces workload
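
One way to keep one-hit-wonders off disk is to cache an object only on its second request, using a Bloom filter to remember what has been requested once. The sketch below assumes that policy; the slides do not spell out the mechanism. The `BloomFilter` class, `handle_request`, the `fetch` callback and the in-memory `disk_cache` dict are hypothetical stand-ins, not Akamai's code.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative filter: one byte per slot, k salted SHA-256 hashes."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.slots = bytearray(m)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.slots[pos] = 1

    def might_contain(self, item):
        return all(self.slots[pos] for pos in self._positions(item))


seen_once = BloomFilter(m=10_000, k=3)  # URLs requested at least once
disk_cache = {}                         # stand-in for the real disk cache


def handle_request(url, fetch):
    """Serve url, writing it to the cache only from the second request on."""
    if url in disk_cache:
        return disk_cache[url]
    body = fetch(url)
    if seen_once.might_contain(url):
        disk_cache[url] = body          # seen before: worth a disk write
    else:
        seen_once.add(url)              # first request: remember, skip the write
    return body
```

A false positive here only means an object is occasionally cached on its first request, which is harmless.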
Medium
Avoid recommending articles a user has already read
- Add each recommended article to the Bloom Filter under the key $userid-$articleid
- Before recommending, check the Bloom Filter for the (user, article) key
- If it is not there, recommend the article and then add the key to the Bloom Filter
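
The check-then-add flow above can be sketched as follows. The `BloomFilter` class and the `recommend` function are illustrative assumptions, not Medium's implementation; only the `$userid-$articleid` key format comes from the slides.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative filter: one byte per slot, k salted SHA-256 hashes."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.slots = bytearray(m)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.slots[pos] = 1

    def might_contain(self, item):
        return all(self.slots[pos] for pos in self._positions(item))


def recommend(user_id, candidate_ids, seen):
    """Return the candidates not yet recorded for this user, recording them."""
    picks = []
    for article_id in candidate_ids:
        key = f"{user_id}-{article_id}"
        if seen.might_contain(key):
            continue                    # probably recommended already: skip
        picks.append(article_id)
        seen.add(key)                   # remember the recommendation
    return picks


seen = BloomFilter(m=10_000, k=3)
first = recommend("u1", ["a1", "a2", "a3"], seen)
second = recommend("u1", ["a1", "a2", "a3"], seen)
```

The second call returns an empty list, because a Bloom filter never forgets a key it was given. The price is that a rare false positive hides an article the user has not actually seen.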
Cassandra
Reduce disk lookups for non-existent rows in file storage
- Each shard is responsible for a range of data
- Each shard is stored in immutable shard-files
- A Bloom Filter is created on the server for each shard-file
- A shard-file is not accessed if its Bloom Filter returns false for the key
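
A rough sketch of that read path, with an in-memory dict standing in for each immutable shard-file. `ShardFile`, `lookup` and the `disk_reads` counter are hypothetical names used only to show where the filter saves a disk access; this is not Cassandra's code.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative filter: one byte per slot, k salted SHA-256 hashes."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.slots = bytearray(m)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.slots[pos] = 1

    def might_contain(self, item):
        return all(self.slots[pos] for pos in self._positions(item))


class ShardFile:
    """Stand-in for an immutable on-disk file plus its in-memory filter."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.filter = BloomFilter(m=1024, k=3)
        for key in self._rows:
            self.filter.add(key)        # built once, since the file never changes
        self.disk_reads = 0

    def read(self, key):
        self.disk_reads += 1            # pretend this is an expensive disk seek
        return self._rows.get(key)


def lookup(shard_files, key):
    """Check each file's filter first and touch the file only on 'maybe'."""
    for shard_file in shard_files:
        if not shard_file.filter.might_contain(key):
            continue                    # definitely absent: no disk access
        value = shard_file.read(key)
        if value is not None:
            return value
    return None
```

Because the files are immutable, each filter is built once and never needs deletes, which suits a plain Bloom filter.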
Chrome
Identifying Malicious URLs
- Store malicious URLs in a Bloom Filter
- Run an in-depth check on a positive response
How it works
Basics of Bloom Filter
● Array of m bits
○ initially all set to 0
● k hash functions defined
○ each maps (hashes) a set element to one of the m array positions
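
A minimal sketch of these two ingredients in Python. The k hash functions are simulated by salting SHA-256 with the hash index, which is an assumption made for illustration. Real libraries usually use faster non-cryptographic hashes.

```python
import hashlib


class BloomFilter:
    """An array of m bits plus k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)   # m bits, all initially 0

    def _positions(self, item):
        """Map item to k array positions, one per (simulated) hash function."""
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item):
        """Set the k bits that item hashes to."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely not in set; True means probably in set."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(m=1000, k=3)
bf.add("cat")
print(bf.might_contain("cat"))   # True: an added element is always found
```

Bits are only ever switched on, so an element that was added can never be reported as missing.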
Bloom Filter in slow motion
m = 20, k = 3, set = { }
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m = 20, k = 3, set = { cat }
0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
m = 20, k = 3, set = { cat, dog }
0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 1
m = 20, k = 3, set = { cat, dog, mouse }
0 1 1 0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1
m = 20, k = 3, set = { cat, dog, mouse }, query: owl
0 1 1 0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1
owl is definitely NOT in set
m = 20, k = 3, set = { cat, dog, mouse }, query: rat
0 1 1 0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1
rat might be in set
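
The walkthrough can be replayed with the hash positions fixed by hand. The positions for cat, dog and mouse are read off the bit arrays above (counting from 0 on the left). The positions for owl and rat are not shown on the slides and are assumed here purely for illustration.

```python
M = 20  # number of bits; each word maps to k = 3 positions

POSITIONS = {
    # Read off the slides (0-indexed from the left).
    "cat": (2, 8, 17),
    "dog": (4, 10, 19),
    "mouse": (1, 11, 16),
    # Assumed for illustration; the slides do not show these.
    "owl": (1, 5, 16),
    "rat": (2, 11, 19),
}

bits = [0] * M


def add(word):
    """Set the three bits the word hashes to."""
    for pos in POSITIONS[word]:
        bits[pos] = 1


def might_contain(word):
    """True only if every one of the word's bits is set."""
    return all(bits[pos] for pos in POSITIONS[word])


for word in ("cat", "dog", "mouse"):
    add(word)

print(" ".join(map(str, bits)))
print("owl:", might_contain("owl"))   # False: bit 5 is still 0
print("rat:", might_contain("rat"))   # True, although rat was never added
```

With these assumed positions, rat is a false positive: each of its three bits was set by a different element.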
Constant memory space
Extremely fast processing
Low error rate
The maths ‘Under-the-hood’
PROBABILITIES 101
1. Probability that a slot is not selected by a single hash function
2. Probability that a slot is not selected by any of the k hashes of one element
3. Probability that a slot is not selected by any of the k hashes of n elements
4. Probability that a slot is selected by at least one of the k hashes of n elements
5. Probability that all k slots of a query are selected after n elements === False positive
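
One common way to write these five steps, assuming each hash function picks one of the m slots uniformly and independently (the usual textbook assumption, not stated on the slides):

```latex
\begin{align*}
\text{(1)}\quad & P(\text{slot not selected by one hash}) = 1 - \frac{1}{m} \\
\text{(2)}\quad & P(\text{slot not selected by any of } k \text{ hashes}) = \left(1 - \frac{1}{m}\right)^{k} \\
\text{(3)}\quad & P(\text{slot not selected after } n \text{ elements}) = \left(1 - \frac{1}{m}\right)^{kn} \\
\text{(4)}\quad & P(\text{slot selected after } n \text{ elements}) = 1 - \left(1 - \frac{1}{m}\right)^{kn} \\
\text{(5)}\quad & P(\text{false positive}) \approx \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}
\end{align*}
```

Step 5 treats the k queried slots as independent, which is why it is an approximation. The last form uses $(1 - 1/m)^{m} \approx e^{-1}$ for large m.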
GETTING A LITTLE
TRICKIER
If you want to read more on the maths, take a look at
https://en.wikipedia.org/wiki/Bloom_filter
A Simple Calculator
See: https://hur.st/bloomfilter
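
What such a calculator computes can be sketched in a few lines. The formulas are the standard textbook ones for the optimal bit count m and hash count k given n elements and a target false-positive rate p. They are not taken from the slides, and the function names are made up for illustration.

```python
import math


def optimal_size(n, p):
    """Return (m, k): bits and hash functions for n elements at target rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


def false_positive_rate(n, m, k):
    """Approximate false-positive rate, (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


n = 10_000_000                      # the 10 million elements from the slides
m, k = optimal_size(n, p=0.01)      # aim for 1% false positives
print(f"bits per element: {m / n:.2f}, hash functions: {k}")

# The slide's budget of 10 bits per element, with 7 hash functions:
rate = false_positive_rate(n, m=10 * n, k=7)
print(f"false-positive rate at 10 bits/element: {rate:.2%}")
```

A 1% target needs about 9.6 bits per element, so the slide's figure of roughly 10 bits per element corresponds to a false-positive rate a little under 1%.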
Next step
Bloom Filters in Code
● Libraries exist in most languages
○ https://github.com/alexandrnikitin/bloom-filter-scala
○ https://www.npmjs.com/package/bloom-filters
● Distributed Bloom Filters in Redis
○ Perfect for multiple instances of a service sharing a single Bloom filter
Now, go think how using Bloom Filters in your services
could reduce payload and speed things up
Other probabilistic structures
● Counting Bloom Filters
○ Allow deletes
● TopK
○ Keep track of the top K counts of elements
● Count-Min Sketch
○ Frequency table of elements
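
To illustrate the first item in the list above, a counting Bloom filter replaces each bit with a small counter so that an element can be removed again. This is a minimal sketch with hypothetical names, using the same salted SHA-256 trick as a stand-in for k hash functions.

```python
import hashlib


class CountingBloomFilter:
    """Like a Bloom filter, but each slot is a counter instead of a bit."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counts = [0] * m

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.counts[pos] += 1

    def might_contain(self, item):
        return all(self.counts[pos] > 0 for pos in self._positions(item))

    def remove(self, item):
        """Undo one earlier add; only safe for items that were really added."""
        if not self.might_contain(item):
            return False
        for pos in self._positions(item):
            self.counts[pos] -= 1
        return True


cbf = CountingBloomFilter(m=1000, k=3)
cbf.add("cat")
cbf.add("dog")
cbf.remove("cat")
print(cbf.might_contain("dog"))   # True: removing cat does not erase dog
```

The counters cost more memory than single bits, and removing an element that was never added can corrupt the filter, so deletes need care.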
Q&A

Introduction to Bloom Filters