LEARNING TO PROJECT AND BINARISE FOR HASHING-BASED
APPROXIMATE NEAREST NEIGHBOUR SEARCH
SEAN MORAN†
† SEAN.MORAN@ED.AC.UK
LEARNING THE HASHING THRESHOLDS AND HYPERPLANES
• Research Question: Can learning both the hashing quantisation thresholds and the hyperplanes lead to greater retrieval effectiveness than learning either in isolation?
• Approach: Most hashing methods consist of two steps: data-point projection (hyperplane learning) followed by quantisation of those projections into binary. We show the benefit of explicitly learning both the hyperplanes and the thresholds from the data.
HASHING-BASED APPROXIMATE NEAREST NEIGHBOUR SEARCH
• Problem: Nearest Neighbour (NN) search in image datasets.
• Hashing-based approach:
– Generate a similarity-preserving binary hashcode for the query.
– Use this fingerprint as an index into the buckets of a hash table.
– If a collision occurs, compare the query only to items in the same bucket.
[Figure: a query image is hashed (H) into a binary fingerprint (e.g. 010101) that indexes a bucket of the hashtable; similarity is computed only against the database items in the colliding bucket to return the nearest neighbours. Example applications: content-based IR (image: Imense Ltd), location recognition (image: Doersch et al.), near-duplicate detection (image: Xu et al.).]
• Hashtable buckets are the polytopes formed by intersecting hyperplanes in the image descriptor space. Thresholds partition each bucket into sub-regions, each with a unique hashcode. We want related images to fall within the same sub-region of a bucket.
• This work: learn hyperplanes and thresholds to encourage collisions between semantically related images.
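The lookup pipeline above can be sketched in a few lines. This is an illustrative stand-in: the dataset, descriptor size and random hyperplanes below are assumptions (the method on this poster learns the hyperplanes and thresholds instead of drawing them at random):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

# Illustrative setup: 1000 database "images" with 32-d descriptors and
# 8 random hyperplanes through the origin.
X = rng.normal(size=(1000, 32))
W = rng.normal(size=(32, 8))

def hashcode(x, W):
    """Similarity-preserving binary fingerprint: the sign of the
    projection of descriptor x onto each hyperplane."""
    return tuple((x @ W > 0).astype(int))

# Index every database item into the bucket addressed by its hashcode.
table = defaultdict(list)
for i, x in enumerate(X):
    table[hashcode(x, W)].append(i)

# Query: compute similarity only against items colliding in the same
# bucket, rather than scanning the whole database.
query = X[0]  # query with a known descriptor for illustration
candidates = table[hashcode(query, W)]
```

Nearby descriptors tend to fall on the same side of each hyperplane, so related images collide in the same bucket with high probability.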
PART 1: SUPERVISED DATA-SPACE PARTITIONING
• Step A: Use LSH [1] to initialise the hashcode bits B ∈ {−1, 1}^{Ntrd×K} (Ntrd: # training data-points, K: # bits)
• Repeat for M iterations:
– Step B: Graph regularisation, update the bits of each data-point to be the average of its nearest neighbours:

B ← sgn(α S D^{−1} B + (1−α) B)

∗ S ∈ {0, 1}^{Ntrd×Ntrd}: adjacency matrix, D ∈ Z+^{Ntrd×Ntrd}: diagonal degree matrix, B ∈ {−1, 1}^{Ntrd×K}: bits, α ∈ [0, 1]: interpolation parameter, sgn: sign function
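One Step B sweep can be sketched as follows; the ring-graph adjacency, sizes and α below are illustrative assumptions, with random bits standing in for the LSH codes of Step A:

```python
import numpy as np

rng = np.random.default_rng(1)
Ntrd, K, alpha = 6, 4, 0.5

# Toy adjacency matrix S: a ring graph where each data-point has two
# nearest neighbours; D is the diagonal degree matrix.
S = np.zeros((Ntrd, Ntrd))
for i in range(Ntrd):
    S[i, (i + 1) % Ntrd] = S[(i + 1) % Ntrd, i] = 1
D_inv = np.diag(1.0 / S.sum(axis=1))

# Initial bits in {-1, 1}, standing in for the LSH codes of Step A.
B = rng.choice([-1, 1], size=(Ntrd, K)).astype(float)

# One graph-regularisation update: interpolate each point's bits with
# the degree-normalised bits of its neighbours, then re-binarise.
B = np.sign(alpha * S @ D_inv @ B + (1 - alpha) * B)
B[B == 0] = 1  # sgn(0) is 0; break ties to +1 to stay in {-1, 1}
```

Repeating this sweep for M iterations smooths the bits over the neighbourhood graph, so that adjacent data-points end up with agreeing hashcodes.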
– Step C: Data-space partitioning, learn hyperplanes that predict the K bits with maximum margin:

for k = 1…K: min ||w_k||^2 + C Σ_{i=1}^{Ntrd} ξ_ik s.t. B_ik (w_k · x_i) ≥ 1 − ξ_ik for i = 1…Ntrd

∗ w_k ∈ R^D: hyperplane, x_i: image descriptor, B_ik: bit k for data-point x_i, ξ_ik: slack variable
• Use the learnt hyperplanes {w_k ∈ R^D}_{k=1}^{K} to generate K projected dimensions {y^k ∈ R^{Ntrd}}_{k=1}^{K} for quantisation.
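Step C and the projection step can be sketched with a plain subgradient solver for the per-bit max-margin problem (a crude stand-in for a proper SVM solver; the sizes, learning rate and synthetic bits below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
Ntrd, D, K, C, lr, epochs = 200, 16, 4, 1.0, 0.01, 200

X = rng.normal(size=(Ntrd, D))            # image descriptors x_i
B = np.sign(X @ rng.normal(size=(D, K)))  # stand-in for the regularised bits

def fit_hyperplane(X, b, C, lr, epochs):
    """Subgradient descent on min ||w||^2 + C * sum_i max(0, 1 - b_i w.x_i),
    the unconstrained form of the slack formulation."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = b * (X @ w)
        viol = margins < 1  # points violating the margin constraint
        grad = 2 * w - C * (X[viol] * b[viol, None]).sum(axis=0)
        w -= lr * grad
    return w

# One max-margin problem per bit k, using column k of B as the labels,
# then project the data to get the K dimensions y^k that Part 2 quantises.
W = np.column_stack([fit_hyperplane(X, B[:, k], C, lr, epochs)
                     for k in range(K)])
Y = X @ W  # column k is the projected dimension y^k
```

Solving one independent problem per bit keeps the learning embarrassingly parallel across the K hashcode dimensions.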
PART 2: SUPERVISED QUANTISATION THRESHOLD LEARNING
• Thresholds t_k = [t_k1, t_k2, …, t_kT] are learnt to quantise projected dimension y^k, where T ∈ {1, 3, 7, 15} is the number of thresholds.
• We formulate an F1-measure objective function that seeks a quantisation respecting the constraints in S. Define P^k ∈ {0, 1}^{Ntrd×Ntrd}:

P^k_ij = 1 if ∃γ s.t. t_kγ ≤ y^k_i, y^k_j < t_k(γ+1); 0 otherwise.
• γ ∈ Z with 0 ≤ γ ≤ T. P^k indicates whether or not the projections (y^k_i, y^k_j) fall within the same thresholded region. The algorithm counts true positives (TP), false negatives (FN) and false positives (FP):

TP = ½ ||P ∘ S||_1    FN = ½ ||S||_1 − TP    FP = ½ ||P||_1 − TP
• ∘ is the Hadamard product and ||·||_1 is the L1 matrix norm. TP is the number of +ve pairs in the same thresholded region, FP the number of −ve pairs in the same region, and FN the number of +ve pairs in different regions. The counts are combined using the F1-measure, which is optimised by an evolutionary algorithm [3]:

F1(t_k) = 2 ||P ∘ S||_1 / (||S||_1 + ||P||_1)
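The objective can be sketched as follows; the toy projections, the adjacency construction, and the random search (a crude stand-in for the evolutionary optimiser of NPQ [3]) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
Ntrd, T = 40, 3

# Toy data: one projected dimension y^k, and an adjacency S marking
# pairs with nearby projections as related (+ve) pairs.
y = rng.normal(size=Ntrd)
S = (np.abs(y[:, None] - y[None, :]) < 0.5).astype(float)
np.fill_diagonal(S, 0)

def f1_score(t, y, S):
    """F1(t_k) = 2||P o S||_1 / (||S||_1 + ||P||_1), where P_ij = 1 iff
    y_i and y_j fall in the same thresholded region."""
    regions = np.digitize(y, np.sort(t))  # region index per data-point
    P = (regions[:, None] == regions[None, :]).astype(float)
    np.fill_diagonal(P, 0)
    return 2 * (P * S).sum() / (S.sum() + P.sum() + 1e-12)

# Crude random search over threshold vectors in place of the
# evolutionary algorithm: keep the fittest of 500 random candidates.
best_t = max((rng.normal(size=T) for _ in range(500)),
             key=lambda t: f1_score(t, y, S))
```

A threshold vector scores highly when related pairs share a region (raising TP) while unrelated pairs are split across regions (lowering FP), exactly the trade-off the F1-measure balances.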
ILLUSTRATING THE KEY ALGORITHMIC STEPS
[Figure: a 2D worked example of the key algorithmic steps in the (x, y) descriptor space. (a) Initialisation: LSH assigns noisy two-bit codes to the data-points. (b) Regularisation: graph regularisation pushes neighbouring points towards agreeing codes. (c) Partitioning: max-margin hyperplanes w1, w2 (hashing hyperplanes h1, h2) separate the opposing bits. (d) Quantisation: learning the threshold t1 on projected dimension #1 raises the F-measure from 0.67 to 1.00.]
EXPERIMENTAL RESULTS (HAMMING RANKING AUPRC)
• Retrieval evaluation on CIFAR-10. Baselines: single static threshold
(SBQ) [1], multiple threshold optimisation (NPQ) [3], supervised
projection (GRH) [2], and variable threshold learning (VBQ) [4].
• LSH [1], PCA, SKLSH [5] and SH [6] are used to initialise the bits in B.
[Figure: test AUPRC on CIFAR-10 as a function of the number of bits (16–128), comparing LSH+GRH+NPQ, LSH+GRH+SBQ and LSH+NPQ.]
CIFAR-10 (AUPRC):

          SBQ      NPQ      GRH      VBQ      GRH+NPQ
LSH       0.0954   0.1621   0.2023   0.2035   0.2593
SH        0.0626   0.1834   0.2147   0.2380   0.2958
PCA       0.0387   0.1660   0.2186   0.2579   0.2791
SKLSH     0.0513   0.1063   0.1652   0.2122   0.2566
• Learning both the hyperplanes and the thresholds (GRH+NPQ) is the most effective configuration.
CONCLUSIONS AND REFERENCES
• A hashing model that learns both the hyperplanes and the thresholds was found to have the highest retrieval effectiveness versus the competing models.
• Future work: closer integration of both steps in a unified objective.
References: [1] Indyk, P., Motwani, R. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proc. STOC, 1998. [2] Moran, S., Lavrenko, V. Graph Regularised Hashing. In Proc. ECIR, 2015. [3] Moran, S., Lavrenko, V., Osborne, M. Neighbourhood Preserving Quantisation for LSH. In Proc. SIGIR, 2013. [4] Moran, S., Lavrenko, V., Osborne, M. Variable Bit Quantisation for LSH. In Proc. ACL, 2013. [5] Raginsky, M., Lazebnik, S. Locality-Sensitive Binary Codes from Shift-Invariant Kernels. In Proc. NIPS, 2009. [6] Weiss, Y., Torralba, A., Fergus, R. Spectral Hashing. In Proc. NIPS, 2008.