ICCV2013 reading: Learning to rank using privileged information

ICCV2013 reading
2014.3.28
Akisato Kimura (@_akisato)

Paper to read
(Presented at ICCV2013)

Problem addressed in this paper
• Learning using privileged information (LUPI)
– Training
• Feature vectors : $X = \{x_1, \dots, x_N\}$, $x_i \in \mathbb{R}^d$
• Label annotation : $Y = \{y_1, \dots, y_N\}$, $y_i \in \mathbb{N}$
• Additional information : $X^* = \{x_1^*, \dots, x_N^*\}$, $x_i^* \in \mathbb{R}^{d^*}$
– Testing
• Prediction function : $f: \mathbb{R}^d \to \mathbb{N}$
• No additional information required

Privileged information??
• Applicable to several scenarios in CV

Formulation
• Generic supervised binary classification
– Training
• Feature vectors : $X = \{x_1, \dots, x_N\}$, $x_i \in \mathbb{R}^d$
• Label annotation : $Y = \{y_1, \dots, y_N\}$, $y_i \in \{+1, -1\}$
• Additional information : $X^* = \{x_1^*, \dots, x_N^*\}$, $x_i^* \in \mathbb{R}^{d^*}$
– Testing
• Prediction function : $f: \mathbb{R}^d \to \mathbb{R}$
• No additional information required

Key idea
• Privileged information allows us to distinguish
between easy and hard examples
– If the privileged data is easy to classify, then the
original data would also be easy to classify.
– … under the assumption that the privileged data is
similarly informative about the problem at hand.

Linear SVM
• Ordinary convergence rate = $O(N^{-1/2})$
• It improves to $O(N^{-1})$
– if we knew the optimal slack values $\xi_i$ in advance
(OracleSVM [Vapnik+ 2009])
$\min_{w \in \mathbb{R}^d,\ b \in \mathbb{R},\ \xi_i \in \mathbb{R}} \dots$

Slack variables in SVM
• Slack variables tell us which training examples
are easy / hard to classify
– $\xi_i = 0$ → easy
– $\xi_i \gg 0$ → hard
$\min_{w \in \mathbb{R}^d,\ b \in \mathbb{R},\ \xi_i \in \mathbb{R}} \dots$
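
To make the easy / hard reading of slack variables concrete, here is a minimal sketch in plain Python. It assumes the standard soft-margin SVM, where the slack of example i at a given (w, b) is max(0, 1 - y_i(<w, x_i> + b)). The function name `hinge_slacks` and the toy data are illustrative and not taken from the slides.

```python
def hinge_slacks(w, b, X, y):
    """Slack of each example for a fixed linear classifier (w, b).

    xi_i = max(0, 1 - y_i * (<w, x_i> + b)); zero means the example
    is on the correct side of the margin ("easy"), large means "hard".
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        score = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        slacks.append(max(0.0, 1.0 - y_i * score))
    return slacks


# Toy data (hypothetical): three positive examples and a fixed classifier.
w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [0.5, 1.0], [-1.0, 0.0]]
y = [+1, +1, +1]
slacks = hinge_slacks(w, b, X, y)
```

In this toy setting the first example is easy (zero slack), the second falls inside the margin, and the third is misclassified and therefore has the largest slack.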

SVM+
• A 1st model for LUPI
– Use privileged data as a proxy to the oracle
– Parameterize $\xi_i = \langle w^*, x_i^* \rangle + b^*$
[Vapnik+ NN2009, NIPS2010]
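
For reference, substituting this parameterization into the soft-margin SVM gives an optimization problem of roughly the following shape. This is a sketch of the form in which SVM+ is usually written, not a transcription of the slide; the constants $C$ and $\gamma$ are regularization parameters that the slides do not name.

```latex
\min_{w,\, b,\, w^*,\, b^*} \ \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\|w^*\|^2
  + C \sum_{i=1}^{N} \bigl( \langle w^*, x_i^* \rangle + b^* \bigr)
\quad \text{s.t.} \quad
y_i \bigl( \langle w, x_i \rangle + b \bigr) \ge 1 - \bigl( \langle w^*, x_i^* \rangle + b^* \bigr),
\qquad \langle w^*, x_i^* \rangle + b^* \ge 0 .
```

The slack is no longer a free variable per example: it is forced to be a linear function of the privileged features, which is what "proxy to the oracle" means here.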

Why should SVM+ be improved?
• Cannot be solved by popular SVM packages
– Although good optimization algorithms were
derived [Pechyony+ 2011], they work only with the dual.

Learning to rank setup instead
• Underlying idea is the same
• Using the privileged data to identify easy /
hard-to-separate sample pairs
– Instead of using it to identify easy / hard-to-
classify samples

SVMrank
• Slack variables tell us which training example
pairs are easy / hard / impossible to separate
[Joachims KDD2002]
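
The pairwise view behind SVMrank can be sketched as follows. The code assumes the usual formulation in which every pair with y_i > y_j yields a constraint <w, x_i - x_j> >= 1 - xi_ij; the helper names `preference_pairs` and `pair_slacks` and the toy data are hypothetical.

```python
def preference_pairs(X, y):
    """Return (i, j, x_i - x_j) for every pair with y_i > y_j."""
    pairs = []
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        for j, (x_j, y_j) in enumerate(zip(X, y)):
            if y_i > y_j:
                diff = [a - b for a, b in zip(x_i, x_j)]
                pairs.append((i, j, diff))
    return pairs


def pair_slacks(w, pairs):
    """Slack of each pair for a fixed w: max(0, 1 - <w, x_i - x_j>)."""
    return [
        max(0.0, 1.0 - sum(w_k * d_k for w_k, d_k in zip(w, diff)))
        for _, _, diff in pairs
    ]


# Toy data (hypothetical): three 1-D examples with ranks 2 > 1 > 0.
X = [[3.0], [2.5], [0.0]]
y = [2, 1, 0]
pairs = preference_pairs(X, y)
slacks = pair_slacks([1.0], pairs)
```

Here the pair (0, 1) is separated by less than the unit margin and gets a positive slack, while the two pairs involving the last example are easy.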

Proposed method: Rank transfer
• The strategy is similar to SVM+, but indirect.
1. SVMrank on $X^*$ (the ranking function $f^*$)
2. Margins $\rho_{ij} = f^*(x_i^*) - f^*(x_j^*)$ $\forall i, j$ with $y_i > y_j$
• $\rho_{ij} \gg 0$ : easy, $\rho_{ij} \approx 0$ : hard, $\rho_{ij} < 0$ : impossible
3. SVMrank on $X$ with data-dependent margins
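
Step 2 of the procedure above can be sketched in a few lines of Python. The sketch assumes the privileged ranking function has already been trained and evaluated, so that `f_star_scores[i]` holds its score on the i-th privileged example; that name, `transfer_margins`, and the toy numbers are illustrative only.

```python
def transfer_margins(f_star_scores, y):
    """Margins rho_ij = f*(x_i*) - f*(x_j*) for all pairs with y_i > y_j."""
    margins = {}
    n = len(y)
    for i in range(n):
        for j in range(n):
            if y[i] > y[j]:
                margins[(i, j)] = f_star_scores[i] - f_star_scores[j]
    return margins


# Toy scores (hypothetical) of a privileged ranker on three examples.
f_star_scores = [2.0, 0.25, 0.5]
y = [2, 1, 0]
margins = transfer_margins(f_star_scores, y)
# Pairs the privileged ranker orders the wrong way round.
impossible = [pair for pair, rho in margins.items() if rho < 0]
```

In this toy case the privileged ranker separates example 0 from the others by a wide margin but ranks example 2 above example 1, so the pair (1, 2) ends up with a negative margin.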

Intuition
• If it was difficult to correctly rank a pair on $X^*$,
it will also be difficult on $X$
1. Pairs $(i, j)$ with small margins $\rho_{ij}$ have more
limited influence on $w$
2. Incorrectly ranked pairs are ignored.
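
Both points are consistent with an SVMrank problem in which the unit margin is replaced by the transferred margin $\rho_{ij}$. The following is a sketch under that assumption rather than a transcription of the paper's equation; $C$ is the usual regularization constant and is not named on the slides.

```latex
\min_{w \in \mathbb{R}^d,\ \xi_{ij} \in \mathbb{R}} \ \frac{1}{2}\|w\|^2 + C \sum_{(i,j):\, y_i > y_j} \xi_{ij}
\quad \text{s.t.} \quad
\langle w, x_i \rangle - \langle w, x_j \rangle \ge \rho_{ij} - \xi_{ij},
\qquad \xi_{ij} \ge 0 .
```

A pair with small $\rho_{ij}$ asks for only a small separation, so it pulls on $w$ less than a pair with a large margin. A pair with $\rho_{ij} \le 0$ no longer demands that $i$ be ranked above $j$ at all; dropping such pairs entirely, as the slide states, would be an additional step.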

Why not Rank transfer?
• We can use standard SVM packages!
– For the SVMrank on $X^*$ this is clear.
– For the SVMrank on $X$ we need variable
transformations
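
One plausible form of such a transformation, sketched under the assumption that the constraint is <w, x_i - x_j> >= rho_ij - xi_ij with rho_ij > 0: dividing by rho_ij gives <w, (x_i - x_j) / rho_ij> >= 1 - xi_ij / rho_ij, which is a unit-margin constraint on rescaled difference vectors, and the penalty C * xi_ij becomes a per-sample cost C * rho_ij on the rescaled slack. The function name `to_weighted_svm_data` is hypothetical, and skipping pairs with rho_ij <= 0 is an assumption that follows the statement that incorrectly ranked pairs are ignored.

```python
def to_weighted_svm_data(X, margins, C=1.0):
    """Turn margin-dependent ranking constraints into unit-margin ones.

    Returns rescaled difference vectors z_ij = (x_i - x_j) / rho_ij and
    per-sample costs C * rho_ij. Pairs with rho_ij <= 0 are skipped
    (assumption: incorrectly ranked pairs are ignored).
    """
    Z, weights = [], []
    for (i, j), rho in sorted(margins.items()):
        if rho <= 0:
            continue
        Z.append([(a - b) / rho for a, b in zip(X[i], X[j])])
        weights.append(C * rho)
    return Z, weights


# Toy data (hypothetical): three 2-D examples and transferred margins.
X = [[4.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
margins = {(0, 1): 2.0, (0, 2): 0.5, (1, 2): -1.0}
Z, weights = to_weighted_svm_data(X, margins, C=1.0)
```

The resulting `Z` and `weights` could then be handed to any linear SVM solver that accepts per-sample weights, treating every row of `Z` as a positive example.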

Experiments
• 4 different types of privileged information
– All of those can be handled in a unified framework.
• 4 different methods to be compared
– SVM, SVMrank, SVM+, Rank transfer
• Evaluation metric = Average Precision
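
For readers unfamiliar with the metric, a minimal sketch of average precision follows. It uses the common non-interpolated definition (mean of precision at the rank of each positive); the slides do not say which variant the paper uses, so treat this choice as an assumption.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision for binary labels (1 = positive)."""
    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    hits = 0
    precisions = []
    for rank, k in enumerate(order, start=1):
        if labels[k] == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy ranking (hypothetical): positives end up at ranks 1 and 3.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In the toy ranking the precisions at the two positives are 1/1 and 2/3, so the average precision is 5/6.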

(1) Attributes as privileged info
• Animals with Attributes Dataset
– 10 species ( = classes), 85 properties ( = attributes)
• Features: 2000-dim SURF
• Privileged: 85-dim predicted attributes
[Lampert+ PAMI2014]
• Learn 1-vs-1 classifiers with 100 training
samples

(1) Results
• Rank transfer is the best.


(2) Bounding box as privileged info
• Fine-grained setup on ILSVRC2012
– 17 classes covering a variety of snakes
• Features: 4096-dim Fisher vector from the
whole images
• Privileged: 4096-dim Fisher vector from the
bounding box regions
• Learn 1-vs-rest classifiers

(2) Results
• SVM+ is the best, ranking strategies do not
seem suitable for this setup.

(3) Texts as privileged info
• IsraelImages dataset [Bekkerman+ CVPR2007]
– 11 classes, 1800 images with a textual description
up to 18 words
• Features: 4096-dim Fisher vectors
• Privileged: BoWs from the texts
• Learn 1-vs-1 classifiers
[Figure: example images labeled "Desert" and "Trees"]

(3) Results
• Reference (privileged only) is the best
• All the other methods produce almost the same results.
– Note that high accuracy in the privileged space
does not necessarily mean that the privileged
information is helpful for the target task.

(4) Rationales as privileged info
• Hot or Not dataset [Donahue+ ICCV2011]
• Features: 500-dim densely sampled SIFT from
the whole image
• Privileged: 500-dim densely sampled SIFT
from the rationales

(4) Results
• Reference is the best.
• Rank transfer performs better for the male class.
• Hard to draw a conclusion.

Appendix: Margin transfer
• One possible alternative to Rank transfer

Last words
• The idea is nice and easy to use.
• More privileged information, better
performance? --- needs discussion
• Which types of privileged information are
suitable? --- unknown
