[Summary of] Robust and Effective Metric Learning Using Capped Trace Norm [1] (KDD 2016)
shiba44
Outline
Introduction
Background
Proposed Method
Experiment
Conclusion
Introduction
Metric learning aims to learn a distance metric automatically from training data.
[Figure: pairs of images labeled similar and dissimilar]
Example: learning a distance between images.
The authors introduce a low-rank regularization for Mahalanobis distance metric learning:

d_M(x_i, x_j) = \sqrt{(x_i - x_j)^\top M (x_i - x_j)}
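As a concrete illustration, here is a minimal NumPy sketch of this distance (the function name and example values are mine, not from the paper):

```python
import numpy as np

def mahalanobis(x_i, x_j, M):
    """d_M(x_i, x_j) = sqrt((x_i - x_j)^T M (x_i - x_j)).

    M must be symmetric positive semi-definite for d_M to be a valid (pseudo-)metric.
    """
    diff = x_i - x_j
    return np.sqrt(diff @ M @ diff)

# With M = I this reduces to the Euclidean distance.
x_i, x_j = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(mahalanobis(x_i, x_j, np.eye(2)))  # 5.0
```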
Background: Weak Constraints
[Figure: pairwise, tripletwise, and quadrupletwise constraints, each marking similar and dissimilar pairs]
Pairwise: d_M(x_i, x_j) should be small for similar pairs, while d_M(x_{i'}, x_{j'}) should be large for dissimilar pairs.
Tripletwise: d_M(x_i, x_j) should be smaller than d_M(x_i, x_k).
Quadrupletwise: d_M(x_i, x_j) should be smaller than d_M(x_k, x_l).
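These three constraint types can be written as simple predicates on d_M; a sketch reusing the mahalanobis helper above (the pairwise threshold argument is my own illustration, not from the paper):

```python
def pairwise_ok(M, x_i, x_j, threshold, similar=True):
    # Similar pairs should fall below the threshold, dissimilar pairs above it.
    d = mahalanobis(x_i, x_j, M)
    return d <= threshold if similar else d > threshold

def tripletwise_ok(M, x_i, x_j, x_k):
    # x_j is similar to x_i, x_k is dissimilar: d_M(x_i, x_j) < d_M(x_i, x_k).
    return mahalanobis(x_i, x_j, M) < mahalanobis(x_i, x_k, M)

def quadrupletwise_ok(M, x_i, x_j, x_k, x_l):
    # (x_i, x_j) similar, (x_k, x_l) dissimilar: d_M(x_i, x_j) < d_M(x_k, x_l).
    return mahalanobis(x_i, x_j, M) < mahalanobis(x_k, x_l, M)
```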
Background: Low-Rank Regularization and Existing Methods
If M is low-rank, the Mahalanobis distance is defined in a low-dimensional space.
Rank minimization is NP-hard [2] ⇒ use an approximation of the rank instead.
Trace norm regularization
▶ Minimizes the sum of all singular values of M:

Reg(M) = \sum_s \sigma_s(M)

▶ A change in one large singular value affects the whole regularizer.
Fantope regularization
▶ Minimizes the sum of the k smallest singular values (taking \sigma_1(M) \le \dots \le \sigma_d(M)):

Reg(M) = \sum_{s=1}^{k} \sigma_s(M)

▶ Sensitive to the hyper-parameter k.
Proposed Method
The proposed method uses Capped Trace Norm regularization.
Capped Trace Norm regularization
▶ Penalizes only the singular values that are smaller than ϵ:

Reg(M) = \sum_s \min(\sigma_s(M), \epsilon)

▶ Reduces the effect of changes in large singular values.
▶ More stable with respect to its hyper-parameter than Fantope regularization.
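To make the difference between the three regularizers concrete, here is a small NumPy sketch that evaluates all of them on the eigenvalues of a PSD matrix (for PSD M the singular values coincide with the eigenvalues; the example matrix is mine):

```python
import numpy as np

def regularizers(M, k, eps):
    sigma = np.sort(np.linalg.eigvalsh(M))      # ascending; equal to singular values for PSD M
    trace_norm = sigma.sum()                    # trace norm: all singular values
    fantope    = sigma[:k].sum()                # Fantope: the k smallest singular values
    capped     = np.minimum(sigma, eps).sum()   # capped trace norm: values capped at eps
    return trace_norm, fantope, capped

M = np.diag([5.0, 4.0, 0.3, 0.1])               # two large, two small singular values
print(regularizers(M, k=2, eps=1.0))            # (9.4, 0.4, 2.4)
```

Inflating the largest singular value from 5 to 50 changes the trace norm drastically but leaves the Fantope and capped values untouched, which is exactly the robustness argument above.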
Proposed Method
Optimization problem:

\min_{M \in S_+^d} \sum_{(i,j,k,l) \in A} \Big[ \delta_{ijkl} + \big\langle M,\, x_{ij} x_{ij}^\top - x_{kl} x_{kl}^\top \big\rangle \Big]_+ + \frac{\gamma}{2} \sum_s \min(\sigma_s(M), \epsilon)

where x_{ij} = x_i - x_j, the first (hinge) term measures the degree of violation of the quadrupletwise constraints, the second term is the regularizer, and

A = \{ (i, j, k, l) : d_M(x_k, x_l) \ge d_M(x_i, x_j) + \delta_{ijkl} \}.

⇒ This objective is non-convex (because of the capped trace norm).
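Using ⟨M, x_ij x_ij^⊤ − x_kl x_kl^⊤⟩ = d_M²(x_i, x_j) − d_M²(x_k, x_l), the objective can be evaluated as below (a sketch under my own data layout, not the authors' code):

```python
import numpy as np

def objective(M, quads, gamma, eps):
    """quads: iterable of (x_i, x_j, x_k, x_l, delta) with 1-D arrays and a margin delta."""
    loss = 0.0
    for x_i, x_j, x_k, x_l, delta in quads:
        d_ij, d_kl = x_i - x_j, x_k - x_l
        # <M, x_ij x_ij^T - x_kl x_kl^T> = d_M^2(x_i, x_j) - d_M^2(x_k, x_l)
        violation = delta + d_ij @ M @ d_ij - d_kl @ M @ d_kl
        loss += max(violation, 0.0)                           # hinge [.]_+
    sigma = np.linalg.eigvalsh(M)                             # eigenvalues of PSD M
    return loss + 0.5 * gamma * np.minimum(sigma, eps).sum()  # capped trace norm term
```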
Proposed Method: Algorithm
Singular value decomposition of M:

M = U \Sigma U^\top = \sum_s \sigma_s u_s u_s^\top

Define D:

D = \frac{1}{2} \sum_{s=1}^{k} \sigma_s^{-1} u_s u_s^\top

where k is the number of singular values smaller than ϵ (the \sigma_s sorted in ascending order).
Using D, the problem is transformed into the following convex optimization:

\min_{M \in S_+^d} \sum_{(i,j,k,l) \in A} \Big[ \xi_{ijkl} + \big\langle M,\, x_{ij} x_{ij}^\top - x_{kl} x_{kl}^\top \big\rangle \Big]_+ + \frac{\gamma}{2} \mathrm{Tr}(M^\top D M)

where D is held fixed.
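A sketch of how D can be formed from the eigendecomposition (the small floor on σ_s guarding against division by zero is my addition):

```python
import numpy as np

def build_D(M, eps):
    sigma, U = np.linalg.eigh(M)            # ascending eigenvalues, columns of U are the u_s
    small = sigma < eps                     # the k singular values below eps
    safe = np.maximum(sigma[small], 1e-12)  # avoid dividing by (numerically) zero values
    U_k = U[:, small]
    # D = (1/2) * sum_{s: sigma_s < eps} sigma_s^{-1} u_s u_s^T
    return 0.5 * (U_k / safe) @ U_k.T
```

With D fixed, (γ/2) Tr(M⊤DM) is a convex quadratic in M, which is what makes the transformed problem tractable.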
Proposed Method: Algorithm
The convex subproblem is solved by proximal gradient descent.
Key points:
▶ The authors prove the convergence of their optimization algorithm.
▶ k is the hyper-parameter; ϵ is determined adaptively from it.
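Putting the pieces together, here is a simplified skeleton of the alternating scheme: fix D, take one projected gradient step on the convex surrogate, then recompute D. The paper uses proximal gradient descent with a convergence proof; this plain projected-gradient sketch, with step size and iteration count of my choosing, only illustrates the structure:

```python
import numpy as np

def project_psd(M):
    # Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues.
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def learn_metric(quads, dim, gamma, k, lr=1e-3, iters=200):
    M = np.eye(dim)
    for _ in range(iters):
        # Re-weighting step: eps is chosen adaptively so that exactly the k
        # smallest singular values are capped, hence D uses the k smallest u_s.
        sigma, U = np.linalg.eigh(M)                 # ascending order
        U_k = U[:, :k]
        D = 0.5 * (U_k / np.maximum(sigma[:k], 1e-12)) @ U_k.T
        # Symmetrized gradient of (gamma/2) Tr(M^T D M), plus hinge subgradients.
        G = 0.5 * gamma * (D @ M + M @ D)
        for x_i, x_j, x_k, x_l, delta in quads:
            d_ij, d_kl = x_i - x_j, x_k - x_l
            if delta + d_ij @ M @ d_ij - d_kl @ M @ d_kl > 0:
                G += np.outer(d_ij, d_ij) - np.outer(d_kl, d_kl)
        M = project_psd(M - lr * G)
    return M
```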
Experiment: Synthetic Data
Data:
1. Generate T ∈ S_+^d with rank(T) = e.
2. Generate quadrupletwise constraints that are satisfied under the Mahalanobis distance of T, and split them into training data A, validation data V, and test data T.
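One plausible way to generate such data (my own construction; the paper does not spell out the sampling details here, and a margin δ_ijkl could be attached to each tuple to match the objective above):

```python
import numpy as np

rng = np.random.default_rng(0)
d, e = 100, 10

# A random PSD matrix T with rank(T) = e.
B = rng.standard_normal((d, e))
T = B @ B.T

def quadruplet():
    # Sample four points and orient the tuple so that the constraint
    # d_T(x_i, x_j) <= d_T(x_k, x_l) holds under T's Mahalanobis distance.
    x = rng.standard_normal((4, d))
    d2 = lambda a, b: (a - b) @ T @ (a - b)
    if d2(x[0], x[1]) <= d2(x[2], x[3]):
        return x[0], x[1], x[2], x[3]
    return x[2], x[3], x[0], x[1]

quads = [quadruplet() for _ in range(10_000)]   # to be split into A, V, and T
```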
Setting:
▶ d = 100
▶ e = 10
▶ |A| = |V| = |T| = 10^4
▶ γ is tuned over {10^{-2}, 10^{-1}, 1, 10, 10^{2}}
▶ k is tuned from 5 to 20
Compared methods:
▶ ML: no regularization
▶ ML+Trace: trace norm regularization
▶ ML+Fantope: Fantope regularization
▶ ML+capped: the proposed method
Experiment: Synthetic Data
Accuracy and rank(M):

Method        Accuracy   rank(M)
ML            85.62%     53
ML+Trace      88.44%     41
ML+Fantope    95.50%     10
ML+capped     95.43%     10

Table 1: Synthetic experiment results.
⇒ Fantope regularization and capped trace norm regularization both recover the true rank and clearly outperform the other methods.
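Accuracy here presumably means the fraction of test quadruplets whose ordering the learned M reproduces; a sketch of that evaluation under this assumption:

```python
def quadruplet_accuracy(M, test_quads):
    # Fraction of quadruplets for which d_M(x_i, x_j) <= d_M(x_k, x_l) holds as required.
    d2 = lambda a, b: (a - b) @ M @ (a - b)
    hits = sum(d2(xi, xj) <= d2(xk, xl) for xi, xj, xk, xl in test_quads)
    return hits / len(test_quads)
```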
Experiment: Synthetic Data
Accuracy while varying the hyper-parameter k:
[Figure: accuracy (88–96%) vs. rank k (6–20) for ML+Trace, ML+Fantope, and the proposed method]
⇒ The proposed method mostly outperforms Fantope regularization.
⇒ The proposed method is more stable across k than Fantope regularization.
Experiment: Labeled Faces in the Wild
Task: decide whether two face images show the same person.
Data:
▶ 13,233 images of 5,749 people.
▶ SIFT features are used.
Setting:
▶ Pairwise constraints are used.
▶ γ is tuned over {10^{-2}, 10^{-1}, 1, 10, 10^{2}}
▶ k is tuned over {30, 35, 40, 45, 50, 55, 60, 65, 70}
Compared methods:
▶ IDENTITY: Euclidean distance
▶ MAHALANOBIS: traditional Mahalanobis distance
▶ KISSME [3]
▶ ITML [4]
▶ LDML [5]
Experiment: Labeled Faces in the Wild
ROC curves, with AUC in parentheses:
[Figure: ROC curves, true positive rate vs. false positive rate]
▶ CAP (proposed): 0.817
▶ FANTOPE: 0.814
▶ KISSME: 0.806
▶ ITML: 0.794
▶ LDML: 0.797
▶ IDENTITY: 0.675
▶ MAHAL: 0.748
⇒ The proposed method achieves the highest AUC.
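AUC values like those above could be computed by scoring each pair with its negated squared Mahalanobis distance; a sketch using scikit-learn (not the authors' evaluation code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def verification_auc(M, pairs, labels):
    """pairs: list of (x_a, x_b) feature vectors; labels: 1 = same person, 0 = different."""
    scores = [-((a - b) @ M @ (a - b)) for a, b in pairs]   # higher score = more similar
    return roc_auc_score(labels, scores)
```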
Experiment: Labeled Faces in the Wild
Accuracy while varying the hyper-parameter k:
[Figure: accuracy (79.5–82.5%) vs. rank k (30–70) for ML+Trace, ML+Fantope, and the proposed method]
⇒ The proposed method obtains better results than metric learning with Fantope regularization.
Conclusion
The authors proposed a novel low-rank regularization: Capped Trace Norm regularization.
They proposed an algorithm for the resulting optimization problem and proved its convergence.
Experimental results show that the method outperforms state-of-the-art metric learning methods.
References
[1] Zhouyuan Huo, Feiping Nie, and Heng Huang. Robust and effective metric learning using capped trace norm: Metric learning via capped trace norm. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1605–1614. ACM, 2016.
[2] Aurélien Bellet, Amaury Habrard, and Marc Sebban. A survey on metric learning for feature vectors and structured data. arXiv preprint arXiv:1306.6709, 2013.
[3] Martin Koestinger, Martin Hirzer, Paul Wohlhart, Peter M. Roth, and Horst Bischof. Large scale metric learning from equivalence constraints. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2288–2295. IEEE, 2012.
[4] Jason V. Davis, Brian Kulis, Prateek Jain, Suvrit Sra, and Inderjit S. Dhillon. Information-theoretic metric learning. In Proceedings of the 24th International Conference on Machine Learning, pp. 209–216. ACM, 2007.
[5] Matthieu Guillaumin, Jakob Verbeek, and Cordelia Schmid. Is that you? Metric learning approaches for face identification. In 2009 IEEE 12th International Conference on Computer Vision, pp. 498–505. IEEE, 2009.