[Summary of] Robust and Effective Metric Learning Using Capped Trace Norm [1] (KDD 2016)
shiba44
Outline
Introduction
Background
Proposed Method
Experiment
Conclusion
Introduction
Metric learning aims to learn a distance metric automatically from training data.
[Figure: pairs of images labeled similar and dissimilar]
Example: learning a distance between images.
The authors introduce a low-rank regularization for Mahalanobis distance metric learning:

d_M(x_i, x_j) = \sqrt{(x_i - x_j)^\top M (x_i - x_j)}
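As a concrete illustration, here is a minimal NumPy sketch of this distance (the function name and example values are mine, not from the paper):

```python
import numpy as np

def mahalanobis(x_i, x_j, M):
    """d_M(x_i, x_j) = sqrt((x_i - x_j)^T M (x_i - x_j)).

    M must be symmetric positive semi-definite for d_M to be a valid (pseudo-)metric.
    """
    diff = x_i - x_j
    return np.sqrt(diff @ M @ diff)

# With M = I this reduces to the Euclidean distance.
x_i, x_j = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(mahalanobis(x_i, x_j, np.eye(2)))  # 5.0
```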
Background: Weak Constraints
[Figure: pairwise, tripletwise, and quadrupletwise constraints, each marking similar and dissimilar pairs]
Pairwise: d_M(x_i, x_j) should be small for similar pairs, while d_M(x_{i'}, x_{j'}) should be large for dissimilar pairs.
Tripletwise: d_M(x_i, x_j) should be smaller than d_M(x_i, x_k).
Quadrupletwise: d_M(x_i, x_j) should be smaller than d_M(x_k, x_l).
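These three constraint types can be written as simple predicates on d_M; a sketch reusing the mahalanobis helper above (the pairwise threshold argument is my own illustration, not from the paper):

```python
def pairwise_ok(M, x_i, x_j, threshold, similar=True):
    # Similar pairs should fall below the threshold, dissimilar pairs above it.
    d = mahalanobis(x_i, x_j, M)
    return d <= threshold if similar else d > threshold

def tripletwise_ok(M, x_i, x_j, x_k):
    # x_j is similar to x_i, x_k is dissimilar: d_M(x_i, x_j) < d_M(x_i, x_k).
    return mahalanobis(x_i, x_j, M) < mahalanobis(x_i, x_k, M)

def quadrupletwise_ok(M, x_i, x_j, x_k, x_l):
    # (x_i, x_j) similar, (x_k, x_l) dissimilar: d_M(x_i, x_j) < d_M(x_k, x_l).
    return mahalanobis(x_i, x_j, M) < mahalanobis(x_k, x_l, M)
```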
Background: Low-Rank Regularization and Existing Methods
If M is low-rank, the Mahalanobis distance is defined in a low-dimensional space.
Rank minimization is NP-hard [2] ⇒ use an approximation of the rank instead.
Trace norm regularization
▶ Minimizes the sum of all singular values of M:

Reg(M) = \sum_s \sigma_s(M)

▶ A change in one large singular value affects the whole regularizer.
Fantope regularization
▶ Minimizes the sum of the k smallest singular values (taking \sigma_1(M) \le \dots \le \sigma_d(M)):

Reg(M) = \sum_{s=1}^{k} \sigma_s(M)

▶ Sensitive to the hyper-parameter k.
Proposed Method
The proposed method uses Capped Trace Norm regularization.
Capped Trace Norm regularization
▶ Penalizes only the singular values that are smaller than ϵ:

Reg(M) = \sum_s \min(\sigma_s(M), \epsilon)

▶ Reduces the effect of changes in large singular values.
▶ More stable with respect to its hyper-parameter than Fantope regularization.
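To make the difference between the three regularizers concrete, here is a small NumPy sketch that evaluates all of them on the eigenvalues of a PSD matrix (for PSD M the singular values coincide with the eigenvalues; the example matrix is mine):

```python
import numpy as np

def regularizers(M, k, eps):
    sigma = np.sort(np.linalg.eigvalsh(M))      # ascending; equal to singular values for PSD M
    trace_norm = sigma.sum()                    # trace norm: all singular values
    fantope    = sigma[:k].sum()                # Fantope: the k smallest singular values
    capped     = np.minimum(sigma, eps).sum()   # capped trace norm: values capped at eps
    return trace_norm, fantope, capped

M = np.diag([5.0, 4.0, 0.3, 0.1])               # two large, two small singular values
print(regularizers(M, k=2, eps=1.0))            # (9.4, 0.4, 2.4)
```

Inflating the largest singular value from 5 to 50 changes the trace norm drastically but leaves the Fantope and capped values untouched, which is exactly the robustness argument above.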
Proposed Method
Optimization problem:

\min_{M \in S_+^d} \sum_{(i,j,k,l) \in A} \Big[ \delta_{ijkl} + \big\langle M,\, x_{ij} x_{ij}^\top - x_{kl} x_{kl}^\top \big\rangle \Big]_+ + \frac{\gamma}{2} \sum_s \min(\sigma_s(M), \epsilon)

where x_{ij} = x_i - x_j, the first (hinge) term measures the degree of violation of the quadrupletwise constraints, the second term is the regularizer, and

A = \{ (i, j, k, l) : d_M(x_k, x_l) \ge d_M(x_i, x_j) + \delta_{ijkl} \}.

⇒ This objective is non-convex (because of the capped trace norm).
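Using ⟨M, x_ij x_ij^⊤ − x_kl x_kl^⊤⟩ = d_M²(x_i, x_j) − d_M²(x_k, x_l), the objective can be evaluated as below (a sketch under my own data layout, not the authors' code):

```python
import numpy as np

def objective(M, quads, gamma, eps):
    """quads: iterable of (x_i, x_j, x_k, x_l, delta) with 1-D arrays and a margin delta."""
    loss = 0.0
    for x_i, x_j, x_k, x_l, delta in quads:
        d_ij, d_kl = x_i - x_j, x_k - x_l
        # <M, x_ij x_ij^T - x_kl x_kl^T> = d_M^2(x_i, x_j) - d_M^2(x_k, x_l)
        violation = delta + d_ij @ M @ d_ij - d_kl @ M @ d_kl
        loss += max(violation, 0.0)                           # hinge [.]_+
    sigma = np.linalg.eigvalsh(M)                             # eigenvalues of PSD M
    return loss + 0.5 * gamma * np.minimum(sigma, eps).sum()  # capped trace norm term
```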
Proposed Method: Algorithm
Singular value decomposition of M:

M = U \Sigma U^\top = \sum_s \sigma_s u_s u_s^\top

Define D:

D = \frac{1}{2} \sum_{s=1}^{k} \sigma_s^{-1} u_s u_s^\top

where k is the number of singular values smaller than ϵ (the \sigma_s sorted in ascending order).
Using D, the problem is transformed into the following convex optimization:

\min_{M \in S_+^d} \sum_{(i,j,k,l) \in A} \Big[ \xi_{ijkl} + \big\langle M,\, x_{ij} x_{ij}^\top - x_{kl} x_{kl}^\top \big\rangle \Big]_+ + \frac{\gamma}{2} \mathrm{Tr}(M^\top D M)

where D is held fixed.
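A sketch of how D can be formed from the eigendecomposition (the small floor on σ_s guarding against division by zero is my addition):

```python
import numpy as np

def build_D(M, eps):
    sigma, U = np.linalg.eigh(M)            # ascending eigenvalues, columns of U are the u_s
    small = sigma < eps                     # the k singular values below eps
    safe = np.maximum(sigma[small], 1e-12)  # avoid dividing by (numerically) zero values
    U_k = U[:, small]
    # D = (1/2) * sum_{s: sigma_s < eps} sigma_s^{-1} u_s u_s^T
    return 0.5 * (U_k / safe) @ U_k.T
```

With D fixed, (γ/2) Tr(M⊤DM) is a convex quadratic in M, which is what makes the transformed problem tractable.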
Proposed Method: Algorithm
The convex subproblem is solved by proximal gradient descent.
Key points:
▶ The authors prove the convergence of their optimization algorithm.
▶ k is the hyper-parameter; ϵ is determined adaptively from it.
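Putting the pieces together, here is a simplified skeleton of the alternating scheme: fix D, take one projected gradient step on the convex surrogate, then recompute D. The paper uses proximal gradient descent with a convergence proof; this plain projected-gradient sketch, with step size and iteration count of my choosing, only illustrates the structure:

```python
import numpy as np

def project_psd(M):
    # Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues.
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def learn_metric(quads, dim, gamma, k, lr=1e-3, iters=200):
    M = np.eye(dim)
    for _ in range(iters):
        # Re-weighting step: eps is chosen adaptively so that exactly the k
        # smallest singular values are capped, hence D uses the k smallest u_s.
        sigma, U = np.linalg.eigh(M)                 # ascending order
        U_k = U[:, :k]
        D = 0.5 * (U_k / np.maximum(sigma[:k], 1e-12)) @ U_k.T
        # Symmetrized gradient of (gamma/2) Tr(M^T D M), plus hinge subgradients.
        G = 0.5 * gamma * (D @ M + M @ D)
        for x_i, x_j, x_k, x_l, delta in quads:
            d_ij, d_kl = x_i - x_j, x_k - x_l
            if delta + d_ij @ M @ d_ij - d_kl @ M @ d_kl > 0:
                G += np.outer(d_ij, d_ij) - np.outer(d_kl, d_kl)
        M = project_psd(M - lr * G)
    return M
```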
Experiment: Synthetic Data
Data:
1. Generate T ∈ S_+^d with rank(T) = e.
2. Generate quadrupletwise constraints that are satisfied under the Mahalanobis distance of T, and split them into training data A, validation data V, and test data T.
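One plausible way to generate such data (my own construction; the paper does not spell out the sampling details here, and a margin δ_ijkl could be attached to each tuple to match the objective above):

```python
import numpy as np

rng = np.random.default_rng(0)
d, e = 100, 10

# A random PSD matrix T with rank(T) = e.
B = rng.standard_normal((d, e))
T = B @ B.T

def quadruplet():
    # Sample four points and orient the tuple so that the constraint
    # d_T(x_i, x_j) <= d_T(x_k, x_l) holds under T's Mahalanobis distance.
    x = rng.standard_normal((4, d))
    d2 = lambda a, b: (a - b) @ T @ (a - b)
    if d2(x[0], x[1]) <= d2(x[2], x[3]):
        return x[0], x[1], x[2], x[3]
    return x[2], x[3], x[0], x[1]

quads = [quadruplet() for _ in range(10_000)]   # to be split into A, V, and T
```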
Setting:
▶ d = 100
▶ e = 10
▶ |A| = |V| = |T| = 10^4
▶ γ is tuned over {10^{-2}, 10^{-1}, 1, 10, 10^{2}}
▶ k is tuned from 5 to 20
Compared methods:
▶ ML: no regularization
▶ ML+Trace: trace norm regularization
▶ ML+Fantope: Fantope regularization
▶ ML+capped: the proposed method
Experiment: Synthetic Data
Accuracy and rank(M):

Method        Accuracy   rank(M)
ML            85.62%     53
ML+Trace      88.44%     41
ML+Fantope    95.50%     10
ML+capped     95.43%     10

Table 1: Synthetic experiment results.
⇒ Fantope regularization and capped trace norm regularization both recover the true rank and clearly outperform the other methods.
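Accuracy here presumably means the fraction of test quadruplets whose ordering the learned M reproduces; a sketch of that evaluation under this assumption:

```python
def quadruplet_accuracy(M, test_quads):
    # Fraction of quadruplets for which d_M(x_i, x_j) <= d_M(x_k, x_l) holds as required.
    d2 = lambda a, b: (a - b) @ M @ (a - b)
    hits = sum(d2(xi, xj) <= d2(xk, xl) for xi, xj, xk, xl in test_quads)
    return hits / len(test_quads)
```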
Experiment: Synthetic Data
Accuracy while varying the hyper-parameter k:
[Figure: accuracy (88–96%) vs. rank k (6–20) for ML+Trace, ML+Fantope, and the proposed method]
⇒ The proposed method mostly outperforms Fantope regularization.
⇒ The proposed method is more stable across k than Fantope regularization.
Experiment: Labeled Faces in the Wild
Task: decide whether two face images show the same person.
Data:
▶ 13,233 images of 5,749 people.
▶ SIFT features are used.
Setting:
▶ Pairwise constraints are used.
▶ γ is tuned over {10^{-2}, 10^{-1}, 1, 10, 10^{2}}
▶ k is tuned over {30, 35, 40, 45, 50, 55, 60, 65, 70}
Compared methods:
▶ IDENTITY: Euclidean distance
▶ MAHALANOBIS: traditional Mahalanobis distance
▶ KISSME [3]
▶ ITML [4]
▶ LDML [5]
Experiment: Labeled Faces in the Wild
ROC curves, with AUC in parentheses:
[Figure: ROC curves, true positive rate vs. false positive rate]
▶ CAP (proposed): 0.817
▶ FANTOPE: 0.814
▶ KISSME: 0.806
▶ ITML: 0.794
▶ LDML: 0.797
▶ IDENTITY: 0.675
▶ MAHAL: 0.748
⇒ The proposed method achieves the highest AUC.
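AUC values like those above could be computed by scoring each pair with its negated squared Mahalanobis distance; a sketch using scikit-learn (not the authors' evaluation code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def verification_auc(M, pairs, labels):
    """pairs: list of (x_a, x_b) feature vectors; labels: 1 = same person, 0 = different."""
    scores = [-((a - b) @ M @ (a - b)) for a, b in pairs]   # higher score = more similar
    return roc_auc_score(labels, scores)
```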
Experiment: Labeled Faces in the Wild
Accuracy while varying the hyper-parameter k:
[Figure: accuracy (79.5–82.5%) vs. rank k (30–70) for ML+Trace, ML+Fantope, and the proposed method]
⇒ The proposed method obtains better results than metric learning with Fantope regularization.
Conclusion
The authors proposed a novel low-rank regularization: Capped Trace Norm regularization.
They proposed an algorithm for the resulting optimization problem and proved its convergence.
Experimental results show that the method outperforms state-of-the-art metric learning methods.
References
[1] Zhouyuan Huo, Feiping Nie, and Heng Huang. Robust and effective metric learning using capped trace norm: Metric learning via capped trace norm. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1605–1614. ACM, 2016.
[2] Aurélien Bellet, Amaury Habrard, and Marc Sebban. A survey on metric learning for feature vectors and structured data. arXiv preprint arXiv:1306.6709, 2013.
[3] Martin Koestinger, Martin Hirzer, Paul Wohlhart, Peter M. Roth, and Horst Bischof. Large scale metric learning from equivalence constraints. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2288–2295. IEEE, 2012.
[4] Jason V. Davis, Brian Kulis, Prateek Jain, Suvrit Sra, and Inderjit S. Dhillon. Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning, pp. 209–216. ACM, 2007.
[5] Matthieu Guillaumin, Jakob Verbeek, and Cordelia Schmid. Is that you? Metric learning approaches for face identification. In 2009 IEEE 12th International Conference on Computer Vision, pp. 498–505. IEEE, 2009.