Introduction
Cuckoo Filter
Analysis
Conclusion
Cuckoo Filter: practically better than Bloom
AED 2013/2014 Presentation
Alessandro Lenzi
Master in Computer Science and Networking
Università di Pisa and Scuola Superiore Sant’Anna
February 23, 2015
From Cuckoo Filter: Practically Better than Bloom
by Fan, Andersen, Kaminsky and Mitzenmacher [6]
Alessandro Lenzi Cuckoo Filter: practically better than Bloom
Table of contents
1 Introduction
Bloom Filters and extensions
2 Cuckoo Filter
3 Analysis
Insertion failure probability
Space analysis
Comparison with Bloom Filters
Benchmark
Section 1
Introduction
Lookup, Classification and approximate set membership
Approximate set membership tests are often used for high-speed packet processing
They rule out elements that are not in the set quickly and in little space
Both Lookup and Monitor require fast membership tests
Application of approximate set membership
Search on Prefix Lengths using Bloom Filters [1]
Associate a Bloom Filter with each possible prefix length
Perform parallel membership queries on all possible lengths
BUFFALO [2]
The entire lookup runs in fast memory, at the expense of slightly longer paths in the worst case
Build a Bloom Filter BF(h) for each next hop h
Subsection 1
Bloom Filters and extensions
Bloom Filters
The best-known approximate set membership data structure.
Represents each item in the set with about $1.44 \log_2(1/\varepsilon)$ bits, close to the information-theoretic minimum of $\log_2(1/\varepsilon)$ bits.
The false positive probability ε can be made arbitrarily small at the expense of space.
How Bloom Filters Work
The data structure is a bit array BF of size n
When x is inserted, $BF[h_i(x)]$ is set to 1 for every $1 \le i \le k$
Lookup(x) returns true when
$BF[h_1(x)] = 1 \land BF[h_2(x)] = 1 \land \dots \land BF[h_k(x)] = 1$.
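The insert and lookup rules on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `BloomFilter`, the use of salted SHA-256 to obtain the k hash functions, and the one-byte-per-bit storage are assumptions made for readability, not something the slides prescribe.

```python
import hashlib


class BloomFilter:
    """Bit array BF of n bits probed by k hash functions (illustrative sketch)."""

    def __init__(self, n_bits: int, k: int):
        self.n = n_bits
        self.k = k
        self.bits = bytearray(n_bits)  # one byte per bit, chosen for clarity only

    def _positions(self, item: str):
        # Assumption: h_i(x) is obtained by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def insert(self, item: str) -> None:
        # Set BF[h_i(x)] for every i in 1..k.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def lookup(self, item: str) -> bool:
        # True only if all k probed bits are set; this may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))
```

An inserted item is always reported as present, while an item that was never inserted is usually, but not always, reported as absent.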
Bloom Filter limitations
Classical Bloom Filters do not allow deleting elements from the set without introducing false negatives.
Counting Bloom Filters allow deletions by replacing each bit with a c-bit counter, which multiplies the space by c.
A lookup query can cause up to k cache misses, impairing lookup time
Blocked Bloom Filters provide spatial locality by forcing all k hashes of an element x to fall on the same cache line.
The number of cache misses falls to 1 per lookup query, at the cost of a higher false positive rate.
It is not possible to list the elements in the set using only the Bloom Filter data structure
Another structure is needed to store the elements, either to check them in case of a false positive or to retrieve the set.
Invertible Bloom Lookup Tables [7] allow deletion and listing of elements (listing succeeds w.h.p.).
Section 2
Cuckoo Filter
Cuckoo Hashing [3] [4]
Exploits the power of two choices to provide O(1) worst-case lookup of an element in a hash table, rather than O(1) in expectation.
An element x can be stored either in $T[h_1(x)]$ or in $T[h_2(x)]$.
If both positions are occupied:
Select a victim v between $T[h_1(x)]$ and $T[h_2(x)]$ and store x in its place
Try to store v in its other position, $T[h_1(v)]$ or $T[h_2(v)]$; if that is occupied too, repeat the procedure as done for x.
If a threshold n on the maximum number of moves is reached, rebuild the hash table from scratch.
Further improvements come from using buckets that hold several elements in place of single-element slots.
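The eviction loop described on this slide can be sketched as follows. The class name `CuckooHashTable`, the salted SHA-256 hash functions and the `max_moves` parameter (playing the role of the threshold) are assumptions for illustration; the rebuild step is left to the caller.

```python
import hashlib
import random


def _h(salt: int, key: str, size: int) -> int:
    # Assumption: h1 and h2 are SHA-256 with two different salts.
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


class CuckooHashTable:
    def __init__(self, size: int = 16, max_moves: int = 32):
        self.size = size
        self.max_moves = max_moves
        self.slots = [None] * size  # one element per slot (no buckets)

    def _candidates(self, key: str):
        return _h(1, key, self.size), _h(2, key, self.size)

    def lookup(self, key: str) -> bool:
        # At most two probes, hence O(1) in the worst case.
        return any(self.slots[p] == key for p in self._candidates(key))

    def insert(self, key: str) -> bool:
        a, b = self._candidates(key)
        for pos in (a, b):
            if self.slots[pos] is None:
                self.slots[pos] = key
                return True
        pos = random.choice((a, b))
        for _ in range(self.max_moves):
            # Put the element in hand into the slot; the old occupant is the victim.
            key, self.slots[pos] = self.slots[pos], key
            a, b = self._candidates(key)
            pos = b if pos == a else a  # the victim's other position
            if self.slots[pos] is None:
                self.slots[pos] = key
                return True
        # Threshold reached: the element still in hand is not stored, so the
        # caller should rebuild the table (including that element).
        return False
```

Returning `False` instead of rebuilding keeps the sketch short; a complete implementation would rehash every stored element together with the one left in hand.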
Cuckoo Filter - The idea
Intuition
Cuckoo Hashing can check set membership in O(1), but it is not as space-efficient as approximate set membership data structures.
For set membership queries, store a fingerprint rather than (key, value) pairs!
Partial-key Cuckoo Hashing
Storing only fingerprints makes it impossible to recompute an element's alternative location from its key when it has to be relocated.
Partial-key cuckoo hashing derives the alternative location from x's fingerprint f and the current position i, which is either $h_1(x)$ or $h_2(x)$:
$h_1(x) = \mathrm{hash}(x)$, $h_2(x) = h_1(x) \oplus \mathrm{hash}(f)$
Because XOR is its own inverse, $h_1(x) = h_2(x) \oplus \mathrm{hash}(f)$ as well, so the other location is always $i \oplus \mathrm{hash}(f)$.
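A small sketch makes the symmetry of the XOR rule concrete: starting from either bucket, the same function yields the other one, using only the fingerprint. The constants `M` and `F_BITS`, the helper names and the choice of SHA-256 are illustrative assumptions; the number of buckets is taken to be a power of two so that the XOR result stays in range.

```python
import hashlib

M = 1 << 10   # number of buckets (assumed to be a power of two)
F_BITS = 8    # fingerprint length in bits (illustrative value)


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def fingerprint(x: str) -> int:
    # Assumption: the fingerprint comes from an independent, salted hash of x.
    return _hash(b"fp:" + x.encode()) % (1 << F_BITS)


def first_bucket(x: str) -> int:
    return _hash(x.encode()) % M


def alt_bucket(i: int, fp: int) -> int:
    # Works in both directions: alt_bucket(alt_bucket(i, fp), fp) == i.
    return i ^ (_hash(fp.to_bytes(2, "big")) % M)


i = first_bucket("192.168.0.1")
fp = fingerprint("192.168.0.1")
j = alt_bucket(i, fp)
assert alt_bucket(j, fp) == i  # the original key is never needed again
```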
Cuckoo Filter - the data structure
m is the number of buckets
b is the number of entries (fingerprints) for each bucket
f is the number of bits used to represent a fingerprint
Cuckoo Filter - Insert
The fingerprint of x, $f_x = \mathrm{fingerprint}(x)$, can be stored in bucket $i = \mathrm{hash}(x)$ or in bucket $j = i \oplus \mathrm{hash}(f_x)$ if either has fewer than b entries. The fingerprint is hashed before the XOR to spread colliding elements across the table.
If both are full, a victim $f_v$ is picked at random among the 2b fingerprints stored in T[i] and T[j]
Replace $f_v$ in its bucket $h \in \{i, j\}$ with $f_x$
Try to insert $f_v$ in its alternative bucket $h \oplus \mathrm{hash}(f_v)$; if that bucket is full, kick out another fingerprint and repeat, for at most maxKicks relocations.
Copies of an element, or distinct elements with the same fingerprint and the same candidate buckets, can be inserted at most 2b times.
Adding counters to the entries raises this limit.
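The insert procedure can be sketched as below. The names `CuckooFilter`, `_fingerprint`, `_alt` and `max_kicks`, the SHA-256 hashing, and the requirement that m be a power of two are assumptions of this sketch rather than details given on the slides.

```python
import hashlib
import random


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, m: int = 64, b: int = 4, f: int = 8, max_kicks: int = 500):
        assert m & (m - 1) == 0, "sketch assumes m is a power of two"
        self.m, self.b, self.f, self.max_kicks = m, b, f, max_kicks
        self.T = [[] for _ in range(m)]  # m buckets, up to b fingerprints each

    def _fingerprint(self, x: str) -> int:
        return _hash(b"fp:" + x.encode()) % (1 << self.f)

    def _alt(self, i: int, fp: int) -> int:
        return i ^ (_hash(fp.to_bytes(8, "big")) % self.m)

    def insert(self, x: str) -> bool:
        fp = self._fingerprint(x)
        i = _hash(x.encode()) % self.m
        j = self._alt(i, fp)
        for bucket in (i, j):
            if len(self.T[bucket]) < self.b:
                self.T[bucket].append(fp)
                return True
        # Both buckets are full: pick one of the 2b stored fingerprints at random.
        h = random.choice((i, j))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.b)
            fp, self.T[h][slot] = self.T[h][slot], fp  # fp now holds the victim
            h = self._alt(h, fp)
            if len(self.T[h]) < self.b:
                self.T[h].append(fp)
                return True
        return False  # too many relocations; the last victim is not stored


cf = CuckooFilter(b=4, max_kicks=50)
outcomes = [cf.insert("same item") for _ in range(2 * cf.b + 1)]
assert outcomes[-1] is False  # copy number 2b + 1 can never be placed
```

The usage lines illustrate the limit on copies: once the two candidate buckets hold only copies of the same fingerprint, every kick just moves a copy between them.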
Lookup and Delete Operations
Lookup
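Given $f_x = \mathrm{fingerprint}(x)$, check for it in buckets $i = \mathrm{hash}(x)$ and $j = i \oplus \mathrm{hash}(f_x)$
The operation compares at most 2b fingerprints, so its cost is O(b), with b usually small.
Delete
Deleting x removes one instance of $f_x$ from bucket i or bucket j
Removing a single instance avoids the false deletion problem: another element that shares the fingerprint and buckets is still reported as present.
However, x must have been inserted previously; otherwise the deletion may remove the fingerprint of a different element.
The cost of the operation is also O(b).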
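Lookup and delete can be sketched on a table represented as a list of buckets. The constants, the helper names and the simplified `add` function (which places a fingerprint without any eviction) are assumptions that keep the example self-contained.

```python
import hashlib

M, B, F = 64, 4, 8  # buckets, entries per bucket, fingerprint bits (illustrative)


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def new_table():
    return [[] for _ in range(M)]


def _locate(x: str):
    fp = _hash(b"fp:" + x.encode()) % (1 << F)
    i = _hash(x.encode()) % M
    j = i ^ (_hash(fp.to_bytes(8, "big")) % M)
    return fp, i, j


def add(table, x: str) -> bool:
    # Simplified insert without eviction, just enough to exercise the sketch.
    fp, i, j = _locate(x)
    for bucket in (i, j):
        if len(table[bucket]) < B:
            table[bucket].append(fp)
            return True
    return False


def lookup(table, x: str) -> bool:
    # Scans at most 2b stored fingerprints.
    fp, i, j = _locate(x)
    return fp in table[i] or fp in table[j]


def delete(table, x: str) -> bool:
    fp, i, j = _locate(x)
    for bucket in (i, j):
        if fp in table[bucket]:
            table[bucket].remove(fp)  # removes a single instance only
            return True
    return False


T = new_table()
add(T, "a")
add(T, "a")
delete(T, "a")
assert lookup(T, "a")      # the second copy is still there
delete(T, "a")
assert not lookup(T, "a")  # both copies removed
```

Because only one instance is removed per call, an element inserted twice survives a single deletion, which mirrors the behaviour described for elements that share a fingerprint.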
Section 3
Analysis
Subsection 1
Insertion failure probability
Insertion failure
Insertion failure in cuckoo hashing [5]
In CH, insert(x) can fail only if the sequence $x_1 = x, x_2 = v, \dots, x_p$ of elements removed and then reinserted in the hash table contains a repetition: $\exists\, i \ne j \in [1, p] : x_i = x_j$.
Insertion failure in cuckoo filter
A similar condition holds for CF.
However, insert(x) always fails when x is the (2b + 1)-th element with the same candidate buckets i and j, since the two buckets hold only 2b entries.
The probability of this event dominates the insertion failure probability.
Insertion Failure Probability (1)
CH's insertion failure analysis cannot be used for CF, because:
Candidate positions for an element are not independent: given $i = \mathrm{hash}(x)$, j can take at most $2^f$ different values, so it is often chosen from only a subset of the positions.
For large f and m not too big, partial-key cuckoo hashing behaves similarly to standard cuckoo hashing.
Otherwise the insertion failure probability increases.
A lower bound on the insertion failure probability can be obtained by calculating $P\{|S_{i,j}| = 2b + 1\}$, where $S_{i,j} = \{x : x\text{'s candidate buckets are } i \text{ and } j\}$.
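The dependence between the two candidate positions can be checked by enumeration: for a fixed bucket i, the partner bucket is determined by the fingerprint alone, so it can take no more values than there are fingerprints. The function names and the SHA-256 hash are assumptions of this sketch.

```python
import hashlib


def alt_bucket(i: int, fp: int, m: int) -> int:
    h = int.from_bytes(hashlib.sha256(fp.to_bytes(8, "big")).digest()[:8], "big")
    return i ^ (h % m)  # m is assumed to be a power of two


def reachable_buckets(i: int, f: int, m: int) -> set:
    # Every bucket that can be paired with bucket i, over all f-bit fingerprints.
    return {alt_bucket(i, fp, m) for fp in range(1 << f)}


partners = reachable_buckets(i=0, f=4, m=1 << 16)
assert len(partners) <= 2 ** 4  # at most 16 of the 65536 buckets
```

With a short fingerprint and a large table, only a tiny fraction of the buckets can ever serve as the alternative to a given bucket, which is why the standard cuckoo hashing analysis does not carry over directly.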
Insertion failure probability (2)
Since fx length will be much smaller than its hash one, we will
assume no collisions on fingerprint’s hashing.
For two elements x and y to collide they must:
1 Have the same fingerprint ( fx = fy ), which occurs with
probability 1
2f and
2 Have i or j in common, happens with probability 2
m
Thus P{|Si,j | = q} = n
q ( 2
2f m
)q−1
Alessandro Lenzi Cuckoo Filter: practically better than Bloom
Introduction
Cuckoo Filter
Analysis
Conclusion
Insertion failure probability
Space analysis
Comparison with Bloom Filters
Insertion failure probability (2)
Since fx length will be much smaller than its hash one, we will
assume no collisions on fingerprint’s hashing.
For two elements x and y to collide they must:
1 Have the same fingerprint ( fx = fy ), which occurs with
probability 1
2f and
2 Have i or j in common, happens with probability 2
m
Thus P{|Si,j | = q} = n
q ( 2
2f m
)q−1
Insertion failure probability is at least n
2b+1 ( 2
2f m
)2b = Ω n
4bf
for m = cn
Alessandro Lenzi Cuckoo Filter: practically better than Bloom
Introduction
Cuckoo Filter
Analysis
Conclusion
Insertion failure probability
Space analysis
Comparison with Bloom Filters
Insertion failure probability (2)
Since the fingerprint $f_x$ is much shorter than its hash, we assume no collisions when hashing fingerprints.
For two elements x and y to collide they must:
1 Have the same fingerprint ($f_x = f_y$), which occurs with probability $\frac{1}{2^f}$, and
2 Have i or j in common, which happens with probability $\frac{2}{m}$.
Thus $P\{|S_{i,j}| = q\} = \binom{n}{q} \left(\frac{2}{2^f m}\right)^{q-1}$
The insertion failure probability is at least $\binom{n}{2b+1} \left(\frac{2}{2^f m}\right)^{2b} = \Omega\left(\frac{n}{4^{bf}}\right)$ for $m = cn$.
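To get a feeling for the size of this bound, the expression can be evaluated numerically. The sketch below only transcribes the formula from the slide; the function name and the sample parameters are hypothetical. Strictly speaking the expression counts the expected number of groups of 2b+1 colliding items, so for very small f it can exceed 1.

```python
from math import comb


def failure_lower_bound(n: int, b: int, f: int, c: float = 1.0) -> float:
    """Evaluate C(n, 2b+1) * (2 / (2^f * m))^(2b) with m = c * n buckets."""
    m = c * n
    q = 2 * b + 1                     # items needed to overflow two buckets of size b
    return comb(n, q) * (2.0 / (2 ** f * m)) ** (q - 1)


if __name__ == "__main__":
    n = 10 ** 6
    for f in (2, 4, 8, 12):
        # The value shrinks roughly like n / 4^(b*f) as f grows.
        print(f, failure_lower_bound(n, b=4, f=f))
```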
Bits per element
Minimum Fingerprint Size
To keep the lower bound $\Omega(n / 4^{bf})$ on the insertion failure probability from growing with n, we need $4^{bf} = \Omega(n)$, or, in other words, $f = \Omega(\log n / b)$.
Comparison with Bloom Filter
In CF $\Omega(\log n / b)$ bits are used to represent an element, against the $1.44 \log_2(1/\varepsilon) = O(1)$ bits (independent of n) required by a Bloom Filter.
CF is asymptotically worse, but in practice the denominator b saves the day!
With ε < 3%, the number of bits required to represent an element with CF is less than the number required by BF!
In practice, for b large enough, f can be treated as a reasonably sized constant.
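The step from the failure bound to the minimum fingerprint size can be spelled out as follows. This is a sketch of the reasoning, with c' standing for an unspecified positive constant that is not named in the slides.

```latex
\begin{align*}
\frac{n}{4^{bf}} = O(1)
  &\iff 4^{bf} \ge c' n \quad \text{for some constant } c' > 0 \\
  &\iff 2bf \ge \log_2 n + \log_2 c' \\
  &\iff f \ge \frac{\log_2 n + \log_2 c'}{2b}
   \quad\Longrightarrow\quad f = \Omega\left(\frac{\log n}{b}\right).
\end{align*}
```

The bucket size b divides the logarithm, which is why a moderate b keeps the required fingerprint short even for large n.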
Subsection 2
Space analysis
Amortized space cost
False Positive Probability
A false positive occurs when searching for x if either bucket i or j contains $f_y = f_x$.
$P\{\text{false positive for } x\} = 1 - P\{\text{all different from } f_x\} = 1 - P\{\text{an element is different from } f_x\}^{2b} = 1 - \left(1 - \frac{1}{2^f}\right)^{2b} \approx \frac{2b}{2^f}$
If we want a certain error probability ε, given b:
$f \ge \log_2(2b/\varepsilon) = \log_2(2b) + \log_2(1/\varepsilon)$
Finally, the cost for representing a single element, or amortized space cost C, is defined as:
$C = \frac{\text{table size}}{\text{items stored}} = \frac{f \times b \times m}{n} = \frac{f}{\alpha} \le \frac{\log_2(2b) + \log_2(1/\varepsilon)}{\alpha}$
with $\alpha = \frac{n}{b \times m}$ the load factor.
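The three formulas above translate directly into code. The following is a minimal sketch with hypothetical function names; it rounds f up to a whole number of bits, which the slide formula leaves implicit.

```python
from math import ceil, log2


def false_positive_rate(f: int, b: int) -> float:
    """Exact expression 1 - (1 - 1/2^f)^(2b) for two buckets of b entries each."""
    return 1.0 - (1.0 - 2.0 ** -f) ** (2 * b)


def min_fingerprint_bits(eps: float, b: int) -> int:
    """Smallest integer f satisfying f >= log2(2b / eps)."""
    return ceil(log2(2 * b / eps))


def amortized_bits_per_item(f: int, alpha: float) -> float:
    """C = f / alpha, where alpha = n / (b * m) is the load factor."""
    return f / alpha


if __name__ == "__main__":
    b, eps, alpha = 4, 0.01, 0.95     # alpha is an assumed load factor
    f = min_fingerprint_bits(eps, b)
    print(f, false_positive_rate(f, b), amortized_bits_per_item(f, alpha))
```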
Space Optimization: Buckets semi sorting
All operations are insensitive to the relative ordering of elements in the buckets.
Since the number of sorted sequences is $\binom{2^f + b - 1}{b}$, a table containing all of them can be pre-computed.
Buckets can be replaced with an offset into the table, achieving some compression.
If such a table is kept in fast memory, the penalty of the additional indirection is negligible.
Example with b = 4 and f = 4
In this case an uncompressed bucket occupies 16 bits. However, since there are at most 3876 sorted buckets, each can be represented with only 12 bits, saving one bit per element.
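The counting argument and the table lookup can be checked with a few lines of code. This is an illustrative sketch only: the table layout and the helper names are assumptions, not the encoding used by the authors of [6].

```python
from itertools import combinations_with_replacement
from math import comb


def sorted_bucket_count(f: int, b: int) -> int:
    """Number of sorted sequences of b fingerprints of f bits: C(2^f + b - 1, b)."""
    return comb(2 ** f + b - 1, b)


def index_bits(count: int) -> int:
    """Bits needed to address `count` table entries."""
    return max(1, (count - 1).bit_length())


def build_tables(f: int, b: int):
    """Pre-compute the decode list (index -> sorted bucket) and its inverse."""
    decode = list(combinations_with_replacement(range(2 ** f), b))
    encode = {bucket: idx for idx, bucket in enumerate(decode)}
    return encode, decode


def compress(bucket, encode) -> int:
    """Replace a bucket by the table offset of its sorted form."""
    return encode[tuple(sorted(bucket))]


if __name__ == "__main__":
    f, b = 4, 4
    encode, decode = build_tables(f, b)
    idx = compress([9, 3, 15, 3], encode)
    print(sorted_bucket_count(f, b), index_bits(len(decode)), idx, decode[idx])
```

For b = 4 and f = 4 the index needs 12 bits instead of the 16 bits of the raw bucket, that is 4 bits saved per bucket or one bit per element.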
Subsection 3
Comparison with Bloom Filters
Space Occupancy
Load Factor
Crucial for the space efficiency of the cuckoo filter.
α increases with b, and b = 4 is already enough to reach close-to-optimal load factors.
Since $C \le \frac{\log_2(1/\varepsilon) + \log_2(2b)}{\alpha}$, we can see that for typical scenarios $C < 1.44 \log_2(1/\varepsilon)$, making CF more space efficient than BF. The bits-per-item cost decreases further with the semi-sorting optimization.
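The two bounds can be compared numerically for a given ε. The sketch below is only an illustration: the load factor α = 0.95 for b = 4 is an assumed value, and the semi-sorting saving is modelled as exactly one bit per item, as in the b = 4, f = 4 example.

```python
from math import log2


def cf_bits_per_item(eps: float, b: int = 4, alpha: float = 0.95,
                     semi_sort: bool = False) -> float:
    """Upper bound (log2(1/eps) + log2(2b)) / alpha, minus one bit if semi-sorted."""
    bits = log2(1.0 / eps) + log2(2 * b)
    if semi_sort:
        bits -= 1.0                   # assumed saving of one bit per item
    return bits / alpha


def bf_bits_per_item(eps: float) -> float:
    """Space-optimized Bloom filter: 1.44 * log2(1/eps) bits per item."""
    return 1.44 * log2(1.0 / eps)


if __name__ == "__main__":
    for eps in (0.05, 0.01, 0.001, 0.0001):
        print(eps,
              round(cf_bits_per_item(eps), 2),
              round(cf_bits_per_item(eps, semi_sort=True), 2),
              round(bf_bits_per_item(eps), 2))
```

Under these assumptions the cuckoo filter loses at large ε and wins as ε shrinks, because the fixed $\log_2(2b)$ overhead is amortized while the Bloom filter pays the 1.44 factor on every bit.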
ε, space efficiency and lookup cost
With empirical (close to optimal) load factors, the cuckoo filter is more space efficient than Bloom filters for target error rates smaller than 3%.
In terms of cache misses, CF outperforms BF for meaningful values of ε:
In BF, positive queries incur k > 2 misses for ε < 25%; negative queries have on average the same cost as in CF.
In both cases, maintaining a fixed ε might require expansion as the set grows.
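The 25% threshold follows from the number of hash functions of a space-optimized Bloom filter, $k = \log_2(1/\varepsilon)$, compared with the two buckets a cuckoo filter lookup reads. A small sketch, with hypothetical names:

```python
from math import log2

CF_MAX_BUCKET_READS = 2   # a CF lookup inspects at most buckets i and j


def bloom_optimal_k(eps: float) -> float:
    """Number of hash functions (bit probes) of a space-optimized Bloom filter."""
    return log2(1.0 / eps)


def bloom_positive_costs_more(eps: float) -> bool:
    """True when a positive BF query probes more locations than a CF lookup."""
    return bloom_optimal_k(eps) > CF_MAX_BUCKET_READS


if __name__ == "__main__":
    for eps in (0.5, 0.25, 0.03, 0.001):
        print(eps, bloom_optimal_k(eps), bloom_positive_costs_more(eps))
```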
Subsubsection 1
Benchmark
Experiment
Random 64-bit elements, hashed with a 64-bit hash
Filters configured to have the same space occupancy of 192 MB
Metrics
Space efficiency, achieved false positive rate, construction rate
Insert, Delete and Lookup throughput
Cuckoo Filter
bits per item: 12.60
ε: 0.19%
The ss-CF optimization allows a lower ε (0.09%), as one more fingerprint bit can be encoded.
CF construction rate: $5 \times 10^6$ keys/sec; ss-CF construction rate: $3.13 \times 10^6$ keys/sec
Bloom Filter
bits per item: 13
ε: 0.19%
BF construction rate: $3.91 \times 10^6$ keys/sec
The Blocked Bloom Filter optimization has a construction rate of $7.64 \times 10^6$ keys/sec
The semi-sorted implementation of the Cuckoo Filter outperforms all competitors in all metrics, except construction rate.
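The reported figures are consistent with the formulas of the space analysis. The check below is a sketch under stated assumptions: fingerprints of 12 bits for CF and 13 bits for ss-CF, b = 4 and a load factor of about 0.955 are hypothetical parameters, chosen here because they reproduce the numbers on the slide; they are not stated in the slides themselves.

```python
from math import log


def cf_false_positive_rate(f: int, b: int) -> float:
    """1 - (1 - 1/2^f)^(2b), the cuckoo filter false positive probability."""
    return 1.0 - (1.0 - 2.0 ** -f) ** (2 * b)


def bf_false_positive_rate(bits_per_item: float) -> float:
    """Space-optimized Bloom filter: eps = 2^(-k) with k = bits_per_item * ln 2."""
    return 2.0 ** (-bits_per_item * log(2))


if __name__ == "__main__":
    b, alpha = 4, 0.955               # assumed bucket size and load factor
    print("CF    bits/item:", 12 / alpha)
    print("CF    eps      :", cf_false_positive_rate(12, b))
    print("ss-CF eps      :", cf_false_positive_rate(13, b))
    print("BF    eps      :", bf_false_positive_rate(13))
```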
Lookup performance
The experiment is characterized by the fraction p of queried items that belong to the set.
CF performance drops for p in the range 50%-75% due to branch misprediction.
ss-CF is slower due to decoding, but still outperforms BF.
Both are stable regardless of the load factor.
Insert and Delete Performance
Insert performance is strongly affected by the load factor: throughput for CF and its variant ss-CF decreases as α increases.
CF delete performance significantly outperforms all Bloom Filter based alternatives. ss-CF is slower. Both maintain constant performance as α changes.
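Why insertion slows down at high load while deletion does not can be seen in a toy implementation. The class below is a minimal sketch, not the benchmarked implementation: it assumes the XOR-based partial-key scheme of [6], uses SHA-256 as a stand-in hash, requires m to be a power of two, and counts relocations ("kicks") so their growth with α can be observed.

```python
import hashlib
import random


class CuckooFilter:
    """Toy cuckoo filter: m buckets (power of two), b slots each, f-bit fingerprints."""

    def __init__(self, m: int, b: int = 4, f: int = 12, max_kicks: int = 500, seed: int = 0):
        assert m > 0 and m & (m - 1) == 0, "m must be a power of two"
        self.m, self.b, self.f, self.max_kicks = m, b, f, max_kicks
        self.buckets = [[] for _ in range(m)]
        self.rng = random.Random(seed)
        self.kicks = 0                # total relocations performed so far

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt(self, i: int, fp: int) -> int:
        # Partial-key hashing: the other bucket depends only on i and the fingerprint.
        return (i ^ self._hash(fp.to_bytes(8, "big"))) % self.m

    def _positions(self, x: bytes):
        fp = self._hash(b"fp" + x) % (1 << self.f)
        i = self._hash(x) % self.m
        return fp, i, self._alt(i, fp)

    def insert(self, x: bytes) -> bool:
        fp, i, j = self._positions(x)
        for k in (i, j):
            if len(self.buckets[k]) < self.b:
                self.buckets[k].append(fp)
                return True
        k = self.rng.choice((i, j))   # both buckets full: start relocating
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.b)
            fp, self.buckets[k][slot] = self.buckets[k][slot], fp  # evict a victim
            self.kicks += 1
            k = self._alt(k, fp)      # the victim moves to its other bucket
            if len(self.buckets[k]) < self.b:
                self.buckets[k].append(fp)
                return True
        return False                  # table considered full; the last victim is dropped

    def lookup(self, x: bytes) -> bool:
        fp, i, j = self._positions(x)
        return fp in self.buckets[i] or fp in self.buckets[j]

    def delete(self, x: bytes) -> bool:
        # Only two buckets are inspected, whatever the load factor.
        fp, i, j = self._positions(x)
        for k in (i, j):
            if fp in self.buckets[k]:
                self.buckets[k].remove(fp)
                return True
        return False


if __name__ == "__main__":
    cf = CuckooFilter(m=1024)
    for n in range(3800):             # fill towards a high load factor
        before = cf.kicks
        ok = cf.insert(str(n).encode())
        if n % 500 == 0 or not ok:
            print(n, ok, cf.kicks - before)
```

Lookup and delete always touch at most two buckets, which matches the constant performance across α; insert may trigger a chain of relocations whose length grows as free slots become rare.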
Section 4
Conclusion
Conclusion
The Cuckoo Filter and its semi-sorted optimization are viable replacements for the Bloom Filter at small false positive rates.
Space efficiency and the presence of a delete operation make it viable for dynamic and highly changing scenarios (e.g. sensor networks).
The high lookup performance makes them viable for network applications requiring high-speed approximate set membership.
[1] Dharmapurikar, Krishnamurthy and Taylor. Longest Prefix Matching Using Bloom Filters. SIGCOMM 2003.
[2] Yu, Fabrikant and Rexford. BUFFALO: Bloom Filter Forwarding Architecture for Large Organizations. CoNEXT '09, December 1-4, 2009, Rome, Italy.
[3] Pagh. Cuckoo Hashing for Undergraduates.
[4] Pagh and Rodler. Cuckoo Hashing. Proceedings of the European Symposium on Algorithms, 2001.
[5] Chen. An Overview of Cuckoo Hashing.
[6] Fan, Andersen, Kaminsky and Mitzenmacher. Cuckoo Filter: Practically Better Than Bloom. CoNEXT '14, December 2-5, 2014, Sydney, Australia.
[7] Goodrich and Mitzenmacher. Invertible Bloom Lookup Tables. arXiv, 3 May 2011.
[8] Broder and Mitzenmacher. Network Applications of Bloom Filters: A Survey.
[9] Deri. High-Speed Dynamic Packet Filtering.
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Foundation to blockchain - A guide to Blockchain Tech
Geodesy 1.pptx...............................................
Artificial Intelligence
Current and future trends in Computer Vision.pptx
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf

Cuckoo Filter: Practically Better than Bloom

  • 1. Introduction Cuckoo Filter Analysis Conclusion Cuckoo Filter: practically better than Bloom AED 2013/2014 Presentation Alessandro Lenzi Master in Computer Science and Networking Università di Pisa and Scuola Superiore Sant’Anna February 23, 2015 From Cuckoo Filter: Practically Better than Bloom by Fan, Andersen, Kaminsky and Mitzenmacher [6] Alessandro Lenzi Cuckoo Filter: practically better than Bloom
  • 2. Table of contents: 1. Introduction (Bloom Filters and extensions); 2. Cuckoo Filter; 3. Analysis (Insertion failure probability; Space analysis; Comparison with Bloom Filters; Benchmark).
  • 3. Section 1: Introduction
  • 4-5. Lookup, Classification and approximate set membership. Approximate set membership tests are often used for high-speed packet processing. They rule out elements that are not in the set quickly and in little space. Both Lookup and Monitor require fast membership tests.
  • 6-10. Application of approximate set membership. Search on prefix lengths using Bloom Filters [1]: associate a Bloom Filter with each possible prefix length and perform parallel membership queries on all possible lengths. BUFFALO [2]: the entire lookup is performed in fast memory, at the expense of slightly longer paths in the worst case; a Bloom Filter BF(h) is built for each next hop h.
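The prefix-length search idea can be sketched as follows. This is only an illustration under stated assumptions: the `Bloom` class, the `routes` table and the function `longest_prefix_match` are hypothetical names, the sequential loop stands in for the parallel queries mentioned on the slide, and an exact dictionary is used to confirm candidates because a Bloom Filter can return false positives.

```python
import hashlib
import ipaddress


class Bloom:
    """Tiny Bloom Filter used as a building block (illustrative only)."""

    def __init__(self, n: int = 4096, k: int = 3) -> None:
        self.n, self.k, self.bits = n, k, bytearray(n)

    def _positions(self, key: str):
        # Assumption: the k hash functions are SHA-256 salted with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))


# Hypothetical routing table: prefix -> next hop.
routes = {"10.0.0.0/8": "A", "10.1.0.0/16": "B"}

# One Bloom Filter per prefix length that occurs in the table.
filters = {}
for prefix in routes:
    length = ipaddress.ip_network(prefix).prefixlen
    filters.setdefault(length, Bloom()).add(prefix)


def longest_prefix_match(addr: str):
    """Query every per-length filter, then confirm candidates longest first."""
    candidates = []
    for length, bf in filters.items():  # conceptually done in parallel
        net = str(ipaddress.ip_network(f"{addr}/{length}", strict=False))
        if net in bf:
            candidates.append((length, net))
    for _, net in sorted(candidates, reverse=True):
        if net in routes:  # exact check filters out false positives
            return routes[net]
    return None


print(longest_prefix_match("10.1.2.3"))
```

The filters only narrow down which prefix lengths are worth an exact lookup; correctness still rests on the exact table.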
  • 11. Subsection 1: Bloom Filters and extensions
  • 12-16. Bloom Filters. The most famous approximate set membership data structure. It represents each item in the set with 1.44 log_2(1/ε) bits, very close to the information-theoretic minimum of log_2(1/ε). The false positive probability ε can be made arbitrarily small at the expense of space efficiency. How Bloom Filters work: the data structure is a bit array BF of size n. When x is inserted, BF[h_i(x)] is set for 1 ≤ i ≤ k. Lookup(x) returns true when BF[h_1(x)] = 1 ∧ BF[h_2(x)] = 1 ∧ ... ∧ BF[h_k(x)] = 1.
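A minimal sketch of the insert and lookup rules described on this slide. The class name `BloomFilter` and the way the k hash functions are derived (SHA-256 salted with the index i) are assumptions made for illustration; the slides do not prescribe a hash family. One byte per bit is used for readability rather than space efficiency.

```python
import hashlib


class BloomFilter:
    """Bit array BF of size n probed by k hash functions h_1..h_k."""

    def __init__(self, n: int, k: int) -> None:
        self.n = n
        self.k = k
        self.bits = bytearray(n)  # one byte per bit, for readability only

    def _positions(self, x: str):
        # Assumption: h_i(x) is obtained by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def insert(self, x: str) -> None:
        # Set BF[h_i(x)] for every i in 1..k.
        for p in self._positions(x):
            self.bits[p] = 1

    def lookup(self, x: str) -> bool:
        # True only if all k probed bits are set; may be a false positive.
        return all(self.bits[p] for p in self._positions(x))


bf = BloomFilter(n=1024, k=4)
bf.insert("10.0.0.0/8")
print(bf.lookup("10.0.0.0/8"))  # inserted items are always reported present
```

Because bits are only ever set, an inserted item can never be reported absent; the price is that unrelated items may happen to hit k bits that are already set.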
  • 17-24. Bloom Filter limitations. Classical Bloom Filters do not allow deleting elements from the set without introducing false negatives; Counting Bloom Filters allow deletions by using counters of c bits. Every lookup query can cause up to k cache misses, impairing lookup time; Blocked Bloom Filters provide spatial locality by forcing all k hashes of an element x to fall on the same cache line, which reduces the number of cache misses per lookup to 1 but increases the false positive rate. It is not possible to list the elements in the set using only the Bloom Filter data structure, so another structure is needed to store the elements, either to check them in case of a false positive or to retrieve the set; Invertible Bloom Lookup Tables [7] allow deletion and listing (successful w.h.p.) of elements.
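The counting variant mentioned above can be sketched by replacing each bit with a small counter. The class name, the salted SHA-256 hashing and the choice to leave saturated counters untouched are assumptions of this sketch, not details taken from the slides.

```python
import hashlib


class CountingBloomFilter:
    """Bloom Filter whose cells are c-bit counters, so items can be deleted."""

    def __init__(self, n: int, k: int, c: int = 4) -> None:
        self.n, self.k = n, k
        self.max_count = (1 << c) - 1  # largest value a c-bit counter can hold
        self.counters = [0] * n

    def _positions(self, x: str):
        # Assumption: h_i(x) is obtained by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def insert(self, x: str) -> None:
        for p in self._positions(x):
            if self.counters[p] < self.max_count:  # saturate instead of overflowing
                self.counters[p] += 1

    def delete(self, x: str) -> None:
        # Only meaningful for items that were inserted before.
        for p in self._positions(x):
            if 0 < self.counters[p] < self.max_count:  # saturated counters stay put
                self.counters[p] -= 1

    def lookup(self, x: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(x))


cbf = CountingBloomFilter(n=1024, k=4)
cbf.insert("a")
cbf.insert("b")
cbf.delete("a")
print(cbf.lookup("b"))  # "b" is still present after "a" is removed
```

The space cost is the visible trade-off: every cell now takes c bits instead of one.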
  • 25. Section 2: Cuckoo Filter
  • 26-32. Cuckoo Hashing [3] [4]. Exploits the power of two choices to provide lookup of an element in a hash table in O(1) worst case, rather than in expectation. An element x can be stored either in T[h_1(x)] or in T[h_2(x)]. If both positions are occupied: select a victim v between T[h_1(x)] and T[h_2(x)] and put x in its place; try to store v in its other position among T[h_1(v)] and T[h_2(v)]; if that one is occupied too, repeat the procedure as done for x. If a threshold n on the maximum number of movements is reached, rebuild the hash table from scratch. Further improvements come from using buckets in place of single-element storage.
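The eviction procedure above can be sketched as follows, with one element per slot. The names `CuckooHashSet` and `max_moves`, and the salted SHA-256 hash functions, are assumptions for illustration. Instead of rebuilding the table when the threshold is hit, this sketch simply returns `False`; at that point the last evicted element is no longer stored, which is exactly why a real implementation would rebuild.

```python
import hashlib
import random


def _h(seed: int, x: str, m: int) -> int:
    """Hypothetical hash family: SHA-256 salted with a seed."""
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


class CuckooHashSet:
    def __init__(self, m: int, max_moves: int = 32) -> None:
        self.m = m
        self.max_moves = max_moves  # threshold on the number of movements
        self.table = [None] * m

    def _slots(self, x: str):
        return _h(1, x, self.m), _h(2, x, self.m)

    def lookup(self, x: str) -> bool:
        # At most two probes, regardless of how full the table is.
        i, j = self._slots(x)
        return self.table[i] == x or self.table[j] == x

    def insert(self, x: str) -> bool:
        if self.lookup(x):
            return True
        i, j = self._slots(x)
        for slot in (i, j):
            if self.table[slot] is None:
                self.table[slot] = x
                return True
        slot = random.choice((i, j))  # both occupied: pick a victim
        for _ in range(self.max_moves):
            x, self.table[slot] = self.table[slot], x  # x now names the victim
            i, j = self._slots(x)
            slot = j if slot == i else i  # the victim's other position
            if self.table[slot] is None:
                self.table[slot] = x
                return True
        return False  # threshold reached: the caller should rebuild the table


table = CuckooHashSet(m=64)
inserted = [table.insert(f"key{n}") for n in range(10)]
print(all(inserted))
```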
  • 33-37. Cuckoo Filter - The idea. Intuition: Cuckoo Hashing can be used to check for set membership in O(1), but it is not as space-efficient as approximate set membership data structures. For set membership queries, store a fingerprint rather than (key, value) pairs! Partial-key Cuckoo Hashing: storing only the fingerprint makes it impossible to recompute an element's alternative location from the original key when it has to be moved. Partial-key cuckoo hashing instead derives the alternative location from x’s fingerprint f and the current position i (either h_1(x) or h_2(x)): h_2(x) = h_1(x) ⊕ hash(f).
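A short sketch of why the XOR rule works: applying it twice with the same fingerprint returns to the starting bucket, so an entry can be moved without knowing the original key. The constants `M` and `F_BITS`, the helper names and the use of SHA-256 are assumptions; the sketch also assumes the number of buckets is a power of two so that the XOR result stays within range.

```python
import hashlib

M = 1 << 10   # number of buckets (assumed to be a power of two)
F_BITS = 8    # fingerprint size in bits (hypothetical value)


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def fingerprint(x: str) -> int:
    # Short fingerprint derived from the key.
    return _hash(b"fp:" + x.encode()) % (1 << F_BITS)


def index1(x: str) -> int:
    # h_1(x): the bucket computed from the full key.
    return _hash(x.encode()) % M


def alt_index(i: int, f: int) -> int:
    # The other bucket: i XOR hash(f), using only the stored fingerprint.
    return (i ^ _hash(f.to_bytes(2, "big"))) % M


x = "example-key"
f = fingerprint(x)
i = index1(x)
j = alt_index(i, f)
print(alt_index(j, f) == i)  # the mapping is its own inverse
```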
  • 38. Cuckoo Filter - the data structure. m is the number of buckets; b is the number of entries (fingerprints) in each bucket; f is the number of bits used to represent a fingerprint.
  • 39-46. Cuckoo Filter - Insert. The fingerprint of x, f_x = fingerprint(x), can be stored either in bucket i = hash(x) or in bucket j = i ⊕ hash(f_x), whichever holds fewer than b elements. The fingerprint is hashed before the XOR to spread colliding elements across the table. If both buckets are full, a victim f_v is picked at random among the 2b fingerprints stored in T[i] and T[j]. Replace f_v in its bucket h ∈ {i, j} with f_x. Try to insert f_v in its alternative bucket h ⊕ hash(f_v); if that bucket is full, kick out another element and repeat, for at most maxKicks evictions. Copies of an element, or distinct elements with the same fingerprint and candidate buckets, can be inserted at most 2b times. To raise this limit, counters can be added.
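The insertion steps above might look like this in code. It is a sketch under assumptions: the class name, the default parameters, the SHA-256 based `hash` and `fingerprint` functions and the power-of-two restriction on m are illustrative choices, not the authors' implementation. When `max_kicks` is exhausted the sketch returns `False` and the last evicted fingerprint is dropped.

```python
import hashlib
import random


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, m: int = 1024, b: int = 4, f: int = 8,
                 max_kicks: int = 500) -> None:
        assert m & (m - 1) == 0, "this sketch assumes m is a power of two"
        self.m, self.b, self.f, self.max_kicks = m, b, f, max_kicks
        self.buckets = [[] for _ in range(m)]  # m buckets, up to b entries each

    def _fingerprint(self, x: str) -> int:
        return _hash(b"fp:" + x.encode()) % (1 << self.f)

    def _alt(self, i: int, fp: int) -> int:
        # Partial-key rule: other bucket = i XOR hash(fingerprint).
        return (i ^ _hash(fp.to_bytes(8, "big"))) % self.m

    def insert(self, x: str) -> bool:
        fp = self._fingerprint(x)
        i = _hash(x.encode()) % self.m
        j = self._alt(i, fp)
        for h in (i, j):
            if len(self.buckets[h]) < self.b:
                self.buckets[h].append(fp)
                return True
        # Both buckets are full: evict a random entry among the 2b stored ones.
        h = random.choice((i, j))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.b)
            fp, self.buckets[h][slot] = self.buckets[h][slot], fp  # fp is now the victim
            h = self._alt(h, fp)  # the victim's alternative bucket
            if len(self.buckets[h]) < self.b:
                self.buckets[h].append(fp)
                return True
        return False  # the filter is considered full


cf = CuckooFilter(m=64, b=4, f=8)
results = [cf.insert(f"item{n}") for n in range(100)]
print(sum(results), "of", len(results), "insertions succeeded")
```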
  • 47-52. Lookup and Delete Operations. Lookup: given f_x = fingerprint(x), check for it in buckets i = hash(x) and j = i ⊕ hash(f_x). The cost of the operation is O(2b), with b usually small. Delete: deleting x requires removing one instance of f_x from T. The delete procedure avoids the false deletion problem, but it requires that x was previously inserted. The cost of the operation is O(2b).
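Lookup and delete can be sketched as two short functions over a table of buckets. The constants `M`, `B`, `F`, the helper `candidates` and the SHA-256 hashing are assumptions for illustration, and the single insertion at the end skips eviction to keep the example small.

```python
import hashlib

M, B, F = 1024, 4, 8  # hypothetical values for m, b and f


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def candidates(x: str):
    """Return (fingerprint, i, j) for x; M is assumed to be a power of two."""
    fp = _hash(b"fp:" + x.encode()) % (1 << F)
    i = _hash(x.encode()) % M
    j = (i ^ _hash(fp.to_bytes(8, "big"))) % M
    return fp, i, j


def lookup(buckets, x: str) -> bool:
    # Scans at most 2b entries: the two candidate buckets.
    fp, i, j = candidates(x)
    return fp in buckets[i] or fp in buckets[j]


def delete(buckets, x: str) -> bool:
    # Removes exactly one instance of the fingerprint, if any is found.
    # Only safe when x was really inserted before.
    fp, i, j = candidates(x)
    for h in (i, j):
        if fp in buckets[h]:
            buckets[h].remove(fp)
            return True
    return False


buckets = [[] for _ in range(M)]
fp, i, _ = candidates("alice")
buckets[i].append(fp)  # simplified insertion without eviction
print(lookup(buckets, "alice"))
```

Removing a single instance is what keeps other elements with the same fingerprint in the same buckets visible to later lookups.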
  • 53. Section 3: Analysis
  • 54. Subsection 1: Insertion failure probability
  • 55-59. Insertion failure. Insertion failure in cuckoo hashing [5]: in CH, insert(x) can fail only if an element repeats in the sequence x_1 = x, x_2 = v, ..., x_p of elements removed and then reinserted in the hash table, i.e. ∃ i ≠ j ∈ [1, p] : x_i = x_j. Insertion failure in cuckoo filter: obviously, a similar condition holds for CF. However, insert(x) always fails if there are 2b + 1 elements with the same candidate buckets i and j. The probability of this event dominates the insertion failure probability.
  • 64. Insertion Failure Probability (1)
    CH's insertion failure analysis cannot be used for CF, because the candidate positions for an element are not independent: given a certain $i = hash(x)$, $j$ may assume at most $2^f$ different values, so it is often chosen only among a subset of positions.
    For large $f$ and $m$ not too big, partial-key hashing behaves similarly to standard cuckoo hashing. Otherwise the insertion failure probability increases.
    It is possible to provide a lower bound on the insertion failure probability by calculating $P\{|S_{i,j}| = 2b + 1\}$, where $S_{i,j} = \{x : x\text{'s candidate buckets are } i \text{ and } j\}$.
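The dependence between the two candidate buckets can be made concrete with a short sketch. It assumes the alternate bucket is computed as the first bucket XOR a hash of the fingerprint, reduced modulo m; the function names and the use of SHA-256 are hypothetical choices for illustration.

```python
import hashlib


def alt_bucket(i: int, fp: int, m: int) -> int:
    """Alternate bucket derived only from the first bucket and the fingerprint."""
    digest = hashlib.sha256(fp.to_bytes(4, "big")).digest()
    return (i ^ int.from_bytes(digest[:8], "big")) % m


def reachable_alternates(i: int, f: int, m: int) -> set:
    """All buckets j that can be paired with bucket i, over every f-bit fingerprint."""
    return {alt_bucket(i, fp, m) for fp in range(2 ** f)}


f, m = 4, 1024
alts = reachable_alternates(i=37, f=f, m=m)
print(len(alts), "reachable alternates out of", m, "buckets")
```

With f = 4 there are only 16 fingerprints, so at most 16 of the 1024 buckets can ever be the partner of bucket 37, whereas standard cuckoo hashing could pick any of them.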
  • 71. Insertion failure probability (2)
    Since the fingerprint $f_x$ is much shorter than its hash, we assume no collisions when hashing fingerprints.
    For two elements $x$ and $y$ to collide they must:
    1. have the same fingerprint ($f_x = f_y$), which occurs with probability $\frac{1}{2^f}$, and
    2. have $i$ or $j$ in common, which happens with probability $\frac{2}{m}$.
    Thus $P\{|S_{i,j}| = q\} = \binom{n}{q} \left(\frac{2}{2^f m}\right)^{q-1}$.
    The insertion failure probability is at least $\binom{n}{2b+1} \left(\frac{2}{2^f m}\right)^{2b} = \Omega\left(\frac{n}{4^{bf}}\right)$ for $m = cn$.
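To get a feeling for how fast this bound shrinks with the fingerprint size, the expression can be evaluated numerically. The sketch below works in log space to avoid overflow in the binomial coefficient; the function names and the sample values of n, m and b are assumptions made for illustration.

```python
from math import exp, lgamma, log


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def failure_lower_bound(n: int, m: int, b: int, f: int) -> float:
    """C(n, 2b+1) * (2 / (2^f * m))^(2b), evaluated in log space."""
    q = 2 * b + 1
    log_pair = log(2) - f * log(2) - log(m)  # log(2 / (2^f * m))
    return exp(log_binom(n, q) + (q - 1) * log_pair)


n, b = 1_000_000, 4
m = n // b  # assumption: m = c*n with c = 1/b
for f in (2, 4, 8, 12):
    print(f, failure_lower_bound(n, m, b, f))
```

The expression is not clamped to 1, so for very small f it can exceed 1; each additional fingerprint bit divides it by $4^b$.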
  • 76. Bits per element
    Minimum fingerprint size: to avoid a trivial lower bound on the insertion failure probability we need $4^{bf} = \Omega(n)$ or, in other words, $f = \Omega\left(\frac{\log n}{b}\right)$.
    Comparison with Bloom Filter: in CF, $\Omega\left(\frac{\log n}{b}\right)$ bits are used to represent an element, against the $\ln(1/\varepsilon) = O(1)$ required by Bloom Filter.
    CF is asymptotically worse, but in practice the denominator $b$ saves the day!
    With $\varepsilon < 3\%$, the number of bits required to represent an element with CF is less than the number required by BF!
    In practice $f$, for $b$ large enough, can be treated as a reasonably sized constant.
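A worked instance of the minimum fingerprint size, ignoring the constants hidden by the $\Omega$ notation. The values $n = 2^{30}$ and $b = 4$ are chosen here only for illustration.

```latex
\begin{align*}
4^{bf} \ge n
  &\iff 2^{2bf} \ge n \\
  &\iff f \ge \frac{\log_2 n}{2b}, \\
n = 2^{30},\ b = 4
  &\implies f \ge \frac{30}{8} = 3.75 \text{ bits}.
\end{align*}
```

Even for a billion elements the requirement stays at a handful of bits, which is why $f$ can be treated as a constant in practice.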
  • 77. Subsection 2: Space analysis
  • 85. Amortized space cost
    False positive probability: a false positive occurs when searching for $x$ if either bucket $i$ or bucket $j$ contains a fingerprint $f_y = f_x$.
    $P\{\text{false positive for } x\} = 1 - P\{\text{all different from } f_x\} = 1 - P\{\text{an element is different from } f_x\}^{2b} = 1 - \left(1 - \frac{1}{2^f}\right)^{2b} \approx \frac{2b}{2^f}$
    If we want a certain error probability $\varepsilon$, given $b$: $f \geq \log_2(2b/\varepsilon) = \log_2(2b) + \log_2(1/\varepsilon)$
    Finally, the cost for representing a single element, or amortized space cost $C$, is defined as:
    $C = \frac{\text{table size}}{\text{items stored}} = \frac{f \times b \times m}{n} = \frac{f}{\alpha} \leq \frac{\log_2(2b) + \log_2(1/\varepsilon)}{\alpha}$
    with $\alpha = \frac{n}{b \times m}$ the load factor.
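The formulas on this slide can be checked numerically with a short sketch. The function names are hypothetical, the fingerprint size is rounded up to a whole number of bits, and the load factor of 0.95 is an assumed value used only for the example.

```python
from math import ceil, log2


def false_positive_exact(f: int, b: int) -> float:
    """1 - (1 - 1/2^f)^(2b): some of the 2b probed slots matches by chance."""
    return 1 - (1 - 2 ** -f) ** (2 * b)


def false_positive_approx(f: int, b: int) -> float:
    """Approximation 2b / 2^f."""
    return 2 * b / 2 ** f


def min_fingerprint_bits(eps: float, b: int) -> int:
    """Smallest integer f with f >= log2(2b / eps)."""
    return ceil(log2(2 * b / eps))


def amortized_cost(f: int, alpha: float) -> float:
    """Bits per stored item, C = f / alpha."""
    return f / alpha


b, eps, alpha = 4, 0.01, 0.95  # alpha is an assumed load factor
f = min_fingerprint_bits(eps, b)
print("f =", f)
print("exact  :", false_positive_exact(f, b))
print("approx :", false_positive_approx(f, b))
print("C      :", amortized_cost(f, alpha))
```

For b = 4 and a 1% target the sketch selects 10-bit fingerprints, and the approximation $2b/2^f$ slightly overestimates the exact probability.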
  • 90. Space Optimization: Bucket semi-sorting
    All operations are insensitive to the relative ordering of elements within a bucket.
    Since the number of sorted sequences is $\binom{2^f + b - 1}{b}$, a table containing all of them can be pre-computed.
    Buckets can be replaced with an offset into the table, achieving some compression.
    If the table is kept in fast memory, the penalty of the extra indirection is negligible.
    Example with $b = 4$ and $f = 4$: an uncompressed bucket occupies 16 bits. However, since there are at most 3876 sorted buckets, each can be represented with only 12 bits, saving one bit per element.
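The table-based encoding for the b = 4, f = 4 example can be sketched as follows. The names `build_tables` and `compress` are hypothetical, and the sketch enumerates whole sorted buckets as described on the slide rather than reproducing any particular implementation.

```python
from itertools import combinations_with_replacement
from math import ceil, log2


def build_tables(f: int, b: int):
    """Enumerate every sorted bucket of b fingerprints of f bits each."""
    sorted_buckets = list(combinations_with_replacement(range(2 ** f), b))
    encode = {bucket: index for index, bucket in enumerate(sorted_buckets)}
    return sorted_buckets, encode


decode, encode = build_tables(f=4, b=4)
index_bits = ceil(log2(len(decode)))


def compress(bucket):
    """Sort the bucket, then replace it with its offset in the table."""
    return encode[tuple(sorted(bucket))]


offset = compress([9, 3, 3, 15])
print(len(decode), "sorted buckets,", index_bits, "bits per bucket")
print(offset, "->", decode[offset])
```

Decoding returns the fingerprints in sorted order, which is harmless because lookups, inserts and deletes never depend on the position of a fingerprint inside its bucket.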
  • 91. Subsection 3: Comparison with Bloom Filters
  • 95. Space Occupancy
    Load factor: crucial for the space efficiency of the cuckoo filter.
    $\alpha$ increases with $b$, and $b = 4$ is already enough to reach close-to-optimal load factors.
    Since $C \leq \frac{\log_2(1/\varepsilon) + \log_2(2b)}{\alpha}$, we can see that for typical scenarios $C < 1.44 \log_2(1/\varepsilon)$, making CF more space efficient than BF.
    The bits-per-item cost decreases further with the semi-sorting optimization.
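The comparison can be explored with a small sketch that evaluates both per-item costs. It assumes b = 4 and a load factor of 0.95 (an illustrative value), ignores rounding of the fingerprint to whole bits, and models semi-sorting as saving one bit per fingerprint, as in the b = 4, f = 4 example.

```python
from math import log2


def cuckoo_bits(eps: float, b: int = 4, alpha: float = 0.95,
                semi_sort: bool = False) -> float:
    """Bits per item for a cuckoo filter: (log2(1/eps) + log2(2b)) / alpha."""
    bits = log2(1 / eps) + log2(2 * b)
    if semi_sort:
        bits -= 1  # assumption: semi-sorting saves one bit per fingerprint
    return bits / alpha


def bloom_bits(eps: float) -> float:
    """Bits per item for a space-optimized Bloom filter: 1.44 * log2(1/eps)."""
    return 1.44 * log2(1 / eps)


for eps in (0.1, 0.02, 0.001):
    print(eps, cuckoo_bits(eps), cuckoo_bits(eps, semi_sort=True), bloom_bits(eps))
```

Under these assumptions the plain cuckoo filter loses to the Bloom filter at a 10% target and wins at 0.1%, while the semi-sorted variant is already ahead at 2%.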
  • 100. ε, space efficiency and lookup cost
    With empirical (close to optimal) load factors, the cuckoo filter is more space efficient than Bloom filters for target error rates smaller than 3%.
    In terms of cache misses, CF outperforms BF for meaningful values of ε:
    in BF, positive queries incur k > 2 misses for ε < 25%;
    negative queries have on average the same cost as in CF.
    In both cases maintaining a fixed ε might require expansion as the set grows.
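The 25% threshold follows from the standard relation between the optimal number of hash functions and the target error rate of a space-optimized Bloom filter. In the derivation below $M$ denotes the number of bits in the Bloom filter (a symbol introduced here to avoid a clash with the number of buckets $m$), and each of the $k$ probes is assumed to touch a different cache line.

```latex
\begin{align*}
k_{\text{opt}} &= \frac{M}{n}\ln 2, \qquad
\frac{M}{n} = \frac{\log_2(1/\varepsilon)}{\ln 2} \approx 1.44\,\log_2(1/\varepsilon) \\
\Rightarrow\ k_{\text{opt}} &= \log_2(1/\varepsilon) > 2
\iff \varepsilon < 2^{-2} = 25\%.
\end{align*}
```

A cuckoo filter lookup, by contrast, reads at most two buckets whatever the value of $\varepsilon$.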
  • 101. Subsubsection 1: Benchmark
  • 105. Experiment
    Random 64-bit elements, with a 64-bit hash.
    Filters configured to have the same space occupancy of 192 MB.
    Metrics:
    space efficiency, achieved false positive rate, construction rate;
    insert, delete and lookup throughput.
  • 111. Cuckoo Filter: bits per item 12.60; ε 0.19%.
    The ss-CF optimization allows for a lower ε (0.09%), as one more bit can be encoded.
    CF construction rate: $5 \times 10^6$ keys/sec; ss-CF construction rate: $3.13 \times 10^6$ keys/sec.
    Bloom Filter: bits per item 13; ε 0.19%.
    BF construction rate: $3.91 \times 10^6$ keys/sec.
    The Blocked Bloom Filter optimization has a construction rate of $7.64 \times 10^6$ keys/sec.
    The semi-sorted implementation of the Cuckoo Filter outperforms all competitors in all metrics except construction rate.
  • 116. Lookup performance
    The experiment is characterized by a fraction p of items belonging to the set.
    CF performance drops for p in the range 50%-75% due to branch misprediction.
    ss-CF is slower due to decoding, but still outperforms BF.
    Both are stable independently of the load factor.
  • 118. Insert and Delete Performance
    Insert performance is strongly affected by the load factor: throughput for CF and its variant ss-CF is inversely proportional to α.
    CF delete performance significantly outperforms all Bloom Filter based alternatives; ss-CF is slower. Both maintain constant performance as α changes.
  • 119. Section 4: Conclusion
  • 122. Conclusion
    Cuckoo Filter and its semi-sorted optimization are viable alternatives to Bloom Filters for small false positive rates.
    Space efficiency and the presence of a delete operation make them viable for dynamic and highly changing scenarios (e.g. sensor networks).
    Their high lookup performance makes them viable for network applications requiring high-speed approximate set membership.
  • 123. [1] Dharmapurikar, Krishnamurthy and Taylor. Longest Prefix Match using Bloom Filters. SIGCOMM 2003.
    [2] Yu, Fabrikant and Rexford. BUFFALO: Bloom Filter Forwarding Architecture for Large Organizations. CoNEXT '09, December 1-4, 2009, Rome, Italy.
    [3] Pagh. Cuckoo Hashing for Undergraduates.
    [4] Pagh and Rodler. Cuckoo Hashing. Proceedings of European Symposium on Algorithms, 2001.
    [5] Chen. An Overview of Cuckoo Hashing.
    [6] Fan, Andersen, Kaminsky and Mitzenmacher. Cuckoo Filter: Practically Better Than Bloom. CoNEXT '14, December 2-5, 2014, Sydney, Australia.
  • 124. [7] Goodrich and Mitzenmacher. Invertible Bloom Lookup Table. arXiv, 3 May 2011.
    [8] Broder and Mitzenmacher. Network Applications of Bloom Filters: A Survey.
    [9] Deri. High-Speed Dynamic Packet Filtering.