Paper introduction: Cuckoo Filter: Practically Better Than Bloom
1. Cuckoo Filter: Practically Better Than Bloom
Presented by: nikezono
Bin Fan, Dave G. Andersen, Michael Kaminsky, and Michael D. Mitzenmacher. 2014. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the
10th ACM International on Conference on emerging Networking Experiments and Technologies (CoNEXT '14). Association for Computing Machinery, New
York, NY, USA, 75–88. DOI: https://doi.org/10.1145/2674005.2674994
6. Cuckoo Filter
An improvement on the Bloom filter that uses techniques from cuckoo hashing.
The contributions are as follows:
1. It supports adding and removing items dynamically;
2. It provides higher lookup performance than traditional Bloom filters, even when close to full (e.g., 95% space utilized);
3. It is easier to implement than alternatives such as the quotient filter; and
4. It uses less space than Bloom filters in many practical applications, if the target false positive rate is less than 3%.
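The space claim in item 4 can be explored with a rough calculation. The sketch below assumes the commonly cited per-item costs: about 1.44 * log2(1/eps) bits for a space-optimal Bloom filter, and (log2(1/eps) + log2(2b)) / load bits for a cuckoo filter with bucket size b. Neither formula appears in these slides, and the function names, the default b = 4, the 95% load factor, and the "semi-sorting saves one bit per fingerprint" adjustment are all assumptions made for illustration. The exact crossover point depends on them and need not match the quoted 3% figure.

```python
import math


def bloom_bits_per_item(eps: float) -> float:
    """Space-optimal Bloom filter: log2(1/eps) / ln 2, about 1.44 * log2(1/eps)."""
    return math.log2(1 / eps) / math.log(2)


def cuckoo_bits_per_item(eps: float, bucket_size: int = 4,
                         load: float = 0.95, semisort: bool = False) -> float:
    """Cuckoo filter estimate: (log2(1/eps) + log2(2b)) / load bits per item.

    semisort=True models a semi-sorting optimization as saving one bit per
    fingerprint (an assumption of this sketch, intended for b = 4).
    """
    bits = math.log2(1 / eps) + math.log2(2 * bucket_size)
    if semisort:
        bits -= 1
    return bits / load


# Compare the estimates at a few target false positive rates.
for eps in (0.03, 0.01, 0.001):
    print(f"eps={eps}: "
          f"bloom={bloom_bits_per_item(eps):.2f} "
          f"cuckoo={cuckoo_bits_per_item(eps):.2f} "
          f"cuckoo+semisort={cuckoo_bits_per_item(eps, semisort=True):.2f}")
```

Under these assumptions the cuckoo filter gets relatively cheaper as the target false positive rate shrinks, because its overhead is an additive constant per item while the Bloom filter pays a multiplicative factor.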
7. Cuckoo Filter: Introduction
A cuckoo filter is a compact variant of a cuckoo hash table [21] that stores only fingerprints—a bit string derived from the
item using a hash function—for each item inserted, instead of key-value pairs. … A set membership query for item x
simply searches the hash table for the fingerprint of x, and returns true if an identical fingerprint is found.
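The cuckoo filter uses cuckoo hashing almost as-is.
Cuckoo hashing is a hash table that stores key/value pairs, whereas the filter stores only a fingerprint in each entry.
A lookup is therefore a search for that fingerprint in the hash table.
How the fingerprint is constructed determines the quality of the filter (its false positive rate).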
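To make the link between fingerprint design and false positives concrete, here is a minimal sketch that derives fingerprints of different widths and counts how many distinct values 1000 items map to. The helper names `fingerprint` and `distinct_fingerprints`, the use of SHA-256, and the convention of reserving 0 for an empty slot are assumptions for illustration, not the paper's construction.

```python
import hashlib


def fingerprint(item: str, bits: int) -> int:
    """Derive a nonzero fingerprint of at most `bits` bits (hypothetical helper)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") % (1 << bits)
    return value or 1  # remap 0 to 1 so that 0 can mean "empty slot"


def distinct_fingerprints(items, bits: int) -> int:
    """Number of distinct fingerprints produced for `items`."""
    return len({fingerprint(x, bits) for x in items})


items = [f"item-{i}" for i in range(1000)]
for bits in (4, 8, 16):
    # Fewer distinct values means more items share a fingerprint, and a
    # shared fingerprint is what makes a membership query answer "maybe"
    # for an item that was never inserted.
    print(bits, distinct_fingerprints(items, bits))
```

Shorter fingerprints save space but make unrelated items indistinguishable more often, which is the trade-off the filter's design has to balance.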
9. Cuckoo Filter: Introduction
Interestingly, while we show that cuckoo filters are practically better than Bloom filters for many real workloads, they are
asymptotically worse: the minimum fingerprint size used in the cuckoo filter grows logarithmically with the number of
entries in the table.
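In practice cuckoo filters outperform Bloom filters, but they become asymptotically worse as the table grows: the minimum fingerprint size increases (logarithmically) with the number of entries in the table
(so both space efficiency and cache efficiency degrade).
Here, "practical" means fewer than a billion items.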
11. 2.2 Using Cuckoo Hashing for Set-membership
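Cuckoo hash tables have been used for set-membership testing several times before. The authors of this paper had themselves already been using cuckoo hash tables in their earlier work.
They had been improving hash tables and network switches with a technique called partial-key cuckoo hashing;
the point of this paper is to apply that technique to a cuckoo hash table used as a Bloom filter replacement.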
Recently, standard cuckoo hash tables have been used to provide set membership information in a few applications. …
Our previous study in building high-speed and memory efficient key-value stores [17, 11] and software-based Ethernet
switches [26] all applied cuckoo hash tables as internal data structures. That work was motivated by and also focused on
improving hash table performance by an optimization called partial-key cuckoo hashing. However, as we show in this
paper, this technique also enabled a new approach to build a Bloom filter replacement which has not been studied
before.
13. Partial-key Cuckoo Hashing
The xor operation … ensures an important property: h1(x) can also be calculated from h2(x) and the fingerprint using the
same formula. In other words, to displace a key originally in bucket i (no matter if i is h1(x) or h2(x)), we directly calculate
its alternate bucket j from the current bucket index i and the fingerprint stored in this bucket by j = i ⊕ hash(fingerprint).
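If h1(x) and h2(x) are constructed this way, knowing one bucket (h1(x) or h2(x)) and the fingerprint is enough to compute the other bucket. Unlike with two independent hash functions, the two candidates also do not collide. Furthermore, the alternate index is guaranteed to be `sizeof(fingerprint)` bits away, so space efficiency also improves. (?)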
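The displacement property can be sketched as a small filter. This is a minimal illustration, not the authors' implementation: it assumes the alternate bucket is computed as i XOR hash(fingerprint) with a power-of-two number of buckets, and the class name `CuckooFilter`, the SHA-256-based helper `_h`, and the default parameters are all hypothetical choices.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """64-bit hash taken from SHA-256 (an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=8, max_kicks=500):
        # A power-of-two bucket count keeps the XOR result inside the table.
        assert num_buckets & (num_buckets - 1) == 0, "power of two required"
        self.m, self.b = num_buckets, bucket_size
        self.fp_bits, self.max_kicks = fp_bits, max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: str) -> int:
        # Values 1 .. 2**fp_bits - 1; 0 is never produced.
        return _h(b"fp:" + item.encode()) % ((1 << self.fp_bits) - 1) + 1

    def _index(self, item: str) -> int:
        return _h(b"ix:" + item.encode()) % self.m

    def _alt(self, i: int, fp: int) -> int:
        # XOR with the hashed fingerprint; applying this twice returns i,
        # so the original key is not needed to relocate an entry.
        return i ^ (_h(fp.to_bytes(8, "big")) % self.m)

    def insert(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        # Both candidates are full: evict a resident fingerprint and move it
        # to its own alternate bucket, repeating up to max_kicks times.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.b)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        return False  # treated as full; the last evicted fingerprint is dropped

    def contains(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt(i1, fp)]

    def delete(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


cf = CuckooFilter()
cf.insert("alice")
print(cf.contains("alice"))
cf.delete("alice")
print(cf.contains("alice"))
```

Because `_alt` needs only the current bucket index and the stored fingerprint, eviction works even though the table never stores the original keys. Hashing the fingerprint before the XOR is a design choice of this sketch: it lets a displaced entry land anywhere in the table rather than only near its current bucket.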