NEIGHBOURHOOD PRESERVING
QUANTISATION FOR LSH
SEAN MORAN†, VICTOR LAVRENKO, MILES OSBORNE
† SEAN.MORAN@ED.AC.UK
RESEARCH QUESTION
• LSH uses 1 bit per hyperplane. Can we do better with multiple bits?
INTRODUCTION
• Problem: Fast Nearest Neighbour (NN) search in large datasets.
• Hashing-based approach:
– Generate a similarity preserving binary code (fingerprint).
– Use fingerprint as index into the buckets of a hash table.
– If collision occurs only compare to items in the same bucket.
[Figure: queries and database items are hashed (H) to binary fingerprints (e.g. 110101) that index buckets of a hash table; similarity is computed only against items in the matching bucket to return nearest neighbours. Applications: content-based IR, location recognition, near-duplicate detection. Images: Imense Ltd; Doersch et al.; Xu et al.]
• Advantages:
– Constant query time (with respect to the database size).
– Compact binary codes are extremely storage efficient.
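The bucketed lookup described above can be sketched in a few lines; a minimal illustration with made-up fingerprints and hypothetical helper names, not the poster's implementation:

```python
from collections import defaultdict

def build_hash_table(codes):
    """Map each binary fingerprint to the list of item ids sharing it."""
    table = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[code].append(item_id)
    return table

# Toy database of pre-computed 6-bit fingerprints (illustrative values).
database_codes = ["110101", "010111", "111101", "110101"]
table = build_hash_table(database_codes)

# A query hashing to "110101" collides with bucket "110101": only the
# items in that bucket are compared exhaustively, not the whole database.
candidates = table.get("110101", [])
print(candidates)  # [0, 3]
```

Query time depends only on the bucket size, not the database size, which is the constant-time property claimed above.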
LOCALITY SENSITIVE HASHING (LSH) (INDYK AND MOTWANI, ’98)
• Randomized algorithm for approximate nearest neighbour (ANN)
search using binary codes.
• Probabilistic guarantee on retrieval accuracy versus search time.
• LSH for inner product similarity:
– Divide input space with L random hyperplanes (e.g. L=2):
[Figure: two random hyperplanes h1, h2 with normals n1, n2 partition the (x, y) plane into regions labelled 00, 01, 10, 11.]
– Projection: take the dot product of each data-point x with the hyperplane normal n, i.e. n·x.
[Figure: data-points projected onto the normals n1, n2.]
– Quantisation: threshold the projected value at t and apply the sign function, giving bit 0 or 1.
[Figure: projections along n2 split by a single threshold t into regions 0 and 1.]
• Vanilla LSH: each hyperplane gives 1 bit of the binary code.
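The projection-then-sign pipeline can be sketched in plain Python; a minimal illustration with toy 2-D data, not the authors' implementation:

```python
import random

def lsh_code(x, normals, t=0.0):
    """One bit per hyperplane: sign of the projection n·x against threshold t."""
    bits = []
    for n in normals:
        projection = sum(ni * xi for ni, xi in zip(n, x))  # dot product n·x
        bits.append("1" if projection > t else "0")
    return "".join(bits)

random.seed(0)
# L = 2 random hyperplanes in 2-D, each defined by its Gaussian normal vector.
normals = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(2)]

# Nearby points tend to fall on the same side of every hyperplane, so with
# high probability they receive the same 2-bit code.
print(lsh_code([1.0, 2.0], normals), lsh_code([1.1, 2.1], normals))
```

With fixed normals such as [1, 0] and [0, -1], the point [1.0, 2.0] projects to +1.0 and -2.0, giving the code "10".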
NEIGHBOURHOOD PRESERVING QUANTISATION (NPQ)
• Assigns multiple bits per hyperplane using multiple thresholds.
• F1 optimisation using pairwise constraints matrix S: if Sij = 1 then
points xi, xj with projections yi, yj are true nearest neighbours.
• TP: # Sij = 1 pairs in same region. FP: # Sij = 0 pairs in same
region. FN: # Sij = 1 pairs in different regions. Combine TP, FP, FN
using F1:
[Figure: thresholds t1, t2, t3 on projected dimension n2 create regions 00, 01, 10, 11. A placement that keeps true neighbours in the same region scores F1 = 1.00; a placement that splits them scores F1 = 0.44.]
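The F1 objective for a candidate threshold placement can be sketched as follows; a simplified illustration with toy projections and pairwise labels, not the authors' code (the helper names are hypothetical):

```python
from bisect import bisect_right

def region(y, thresholds):
    """Index of the quantisation region the projected value y falls into."""
    return bisect_right(thresholds, y)

def f1_for_thresholds(projections, pairs, thresholds):
    """F1 over pairwise constraints: Sij = 1 means points i, j are true neighbours."""
    tp = fp = fn = 0
    for (i, j), s in pairs.items():
        same = region(projections[i], thresholds) == region(projections[j], thresholds)
        if s == 1 and same:
            tp += 1        # neighbour pair kept in the same region
        elif s == 0 and same:
            fp += 1        # non-neighbour pair wrongly kept together
        elif s == 1 and not same:
            fn += 1        # neighbour pair split across regions
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: four projected values and their neighbour constraints.
y = [0.1, 0.2, 0.9, 1.1]
S = {(0, 1): 1, (2, 3): 1, (0, 2): 0, (1, 3): 0}
print(f1_for_thresholds(y, S, [0.5]))  # 1.0: both neighbour pairs stay together
```

A badly placed threshold (e.g. at 0.15) splits the first neighbour pair and merges a non-neighbour pair, dropping F1 to 0.5 on this toy data.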
• Interpolate F1 with a regularisation term Ω(T1:u):
Znpq = α F1 + (1 − α)(1 − Ω(T1:u))

with: Ω(T1:u) = (1/σ) Σ_{a=0..u} Σ_{i: yi ∈ ra} (yi − µra)² and σ = Σ_{i=1..n} (yi − µd)²

where µd is the dimension mean and µra is the mean of region ra.
• Random restarts are used to optimise Znpq. Time complexity is ∼O(N²), where N is the number of data points in the training dataset.
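The regularised objective can be sketched directly from the formula above; a minimal illustration on toy values, with the F1 term taken as given rather than recomputed (function names are hypothetical):

```python
from bisect import bisect_right

def omega(projections, thresholds):
    """Ω(T1:u): within-region variance of projected values, normalised by σ."""
    mu_d = sum(projections) / len(projections)            # dimension mean µd
    sigma = sum((y - mu_d) ** 2 for y in projections)     # σ = Σ (yi − µd)²
    regions = {}
    for y in projections:
        regions.setdefault(bisect_right(thresholds, y), []).append(y)
    within = 0.0
    for ys in regions.values():
        mu_r = sum(ys) / len(ys)                          # region mean µra
        within += sum((y - mu_r) ** 2 for y in ys)
    return within / sigma

def z_npq(f1, projections, thresholds, alpha=0.5):
    """Znpq = α·F1 + (1 − α)·(1 − Ω(T1:u))."""
    return alpha * f1 + (1 - alpha) * (1 - omega(projections, thresholds))

y = [0.1, 0.2, 0.9, 1.1]
# A threshold at 0.5 separates the two tight clusters, so Ω is small and
# Znpq is close to its maximum when F1 is also high.
print(round(z_npq(1.0, y, [0.5]), 3))  # 0.983
```

In the full method, random restarts would re-sample candidate threshold placements and keep the one maximising Znpq.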
EVALUATION PROTOCOL
• Task: Image retrieval on three image datasets: 22k LabelMe, CIFAR-10 and 100k TinyImages. Images encoded with GIST descriptors.
• Projections: LSH, Shift-Invariant Kernel Hashing (SIKH), Iterative Quantisation (ITQ), Spectral Hashing (SH) and PCA-Hashing (PCAH).
• Baselines: Single Bit Quantisation (SBQ), Manhattan Hashing (MQ)
(Kong et al., ’12), Double-Bit quantisation (DBQ) (Kong and Li, ’12).
• Hamming Ranking: how well do we retrieve the true nearest neighbours of each query? Quantified using area under the precision-recall curve (AUPRC).
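Hamming ranking can be sketched as sorting the database by Hamming distance to the query; here AUPRC is approximated by average precision over the ranking (a toy illustration, not the evaluation code used for the poster):

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def average_precision(query_code, db_codes, relevant):
    """Average precision of the Hamming ranking; approximates AUPRC."""
    ranking = sorted(range(len(db_codes)),
                     key=lambda i: hamming(query_code, db_codes[i]))
    hits, ap = 0, 0.0
    for rank, i in enumerate(ranking, start=1):
        if i in relevant:
            hits += 1
            ap += hits / rank   # precision at each recall point
    return ap / len(relevant)

db = ["0000", "0001", "1110", "1111"]
# Items 0 and 1 are the query's true neighbours; both rank at the top.
print(average_precision("0000", db, relevant={0, 1}))  # 1.0
```

A quantiser that pushes true neighbours to small Hamming distances raises this score, which is what the AUPRC tables below measure.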
RESULTS
• AUPRC across different projection methods at 32 bits:
Proj.        LabelMe                     CIFAR                       TinyImages
        SBQ    MQ    DBQ    NPQ    SBQ    MQ    DBQ    NPQ    SBQ    MQ    DBQ    NPQ
ITQ   0.277  0.354  0.308  0.408  0.272  0.235  0.222  0.407  0.494  0.428  0.410  0.660
SIKH  0.049  0.072  0.077  0.107  0.042  0.063  0.047  0.090  0.135  0.221  0.182  0.365
LSH   0.156  0.138  0.123  0.184  0.119  0.093  0.066  0.153  0.361  0.340  0.285  0.464
SH    0.080  0.221  0.182  0.250  0.051  0.135  0.111  0.167  0.117  0.237  0.136  0.356
PCAH  0.050  0.191  0.156  0.220  0.036  0.137  0.107  0.153  0.046  0.257  0.295  0.312
• NPQ can quantise a wide range of projection functions.
• NPQ + cheap projection (e.g. LSH) can outperform SBQ + expensive
projection (e.g. PCA). NPQ is faster for N < data dimensionality.
• AUPRC vs. Number of bits for LabelMe, CIFAR and TinyImages:
[Figure: AUPRC vs. number of bits, panels (a) LabelMe, (b) CIFAR-10, (c) TinyImages.]
• NPQ is an effective quantisation strategy across a wide bit range.
FUTURE WORK
• Variable bits per hyperplane: refer to our recent ACL’13 paper.
• Evaluation of NPQ in a hash lookup based retrieval scenario.

Neighbourhood Preserving Quantisation for LSH SIGIR Poster
