Hierarchical Novelty Detection
for Visual Object Recognition
Kibok Lee*, Kimin Lee†, Kyle Min*,
Yuting Zhang*, Jinwoo Shin†, Honglak Lee*‡
University of Michigan*, KAIST†, Google Brain‡
Why Hierarchical Novelty Detection?
• Conventional novelty detection frameworks provide no information about
an object beyond its "novelty".
• Suppose we have training data like …
(Training images: Persian cat, Siamese cat, Pomeranian, Welsh corgi)
Why Hierarchical Novelty Detection?
• Then, suppose we have test images like …

Test image:  (four example images)
True label:  Siamese cat | Angora cat | Dachshund  | Pika
Prior works: Siamese cat | novel      | novel      | novel
Ours:        Siamese cat | novel cat  | novel dog  | novel animal

(Taxonomy: animal → cat → {Persian cat, Siamese cat};
           animal → dog → {Pomeranian, Welsh corgi})
Why Hierarchical Novelty Detection?
• Our hierarchical novelty detection framework aims to find the most
specific class label for any input in the hierarchical taxonomy built
from known labels.
Why Hierarchical Novelty Detection?
• This framework can be useful for automatically or interactively
organizing a customized taxonomy …
  • a company's product catalog
  • wildlife monitoring
  • a personal photo library
• … by suggesting the closest known categories for images from novel
categories:
  • new consumer products
  • unregistered animal species
  • untagged scenes or places
Hierarchical Taxonomy
• Our taxonomy has three types of classes:
  • Known leaf classes are seen during training.
  • Super classes are ancestors of the leaf classes; they are also known.
  • Novel classes are unseen during training.
    • In our task, their expected prediction is the closest super class
in the taxonomy.
Hierarchical Taxonomy
• Notation used throughout:
  • the taxonomy of known classes, and the set of all known leaf classes
  • the set of parents and the set of children of a class
  • the set of ancestors of a class, including the class itself
  • the set of novel classes whose closest known class is a given class
  • the set of known classes except a given class and its descendants
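To make these sets concrete, the following is a minimal Python sketch of a tree-shaped taxonomy with the leaf and ancestor queries listed above; the `Taxonomy` class and its method names are illustrative, not from the paper's code.

```python
from collections import defaultdict

class Taxonomy:
    """Toy tree-shaped taxonomy; class names and API are illustrative."""

    def __init__(self, edges):
        self.parent = {}                     # child -> parent
        self.children = defaultdict(list)    # parent -> children
        for parent, child in edges:
            self.parent[child] = parent
            self.children[parent].append(child)

    def leaves(self):
        """Known leaf classes: nodes with no children."""
        nodes = set(self.parent) | set(self.children)
        return sorted(n for n in nodes if not self.children[n])

    def ancestors(self, y):
        """Ancestors of y, including y itself."""
        out = [y]
        while y in self.parent:
            y = self.parent[y]
            out.append(y)
        return out

# The running example from the earlier slides:
tax = Taxonomy([("animal", "cat"), ("animal", "dog"),
                ("cat", "Persian cat"), ("cat", "Siamese cat"),
                ("dog", "Pomeranian"), ("dog", "Welsh corgi")])
print(tax.leaves())                   # known leaf classes
print(tax.ancestors("Siamese cat"))   # ['Siamese cat', 'cat', 'animal']
```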
Approach - Top-down (TD) Method
• Multi-stage classification
  • Until arriving at a known leaf class
    (Figure: confident splits, e.g., edge probabilities .2/.8 at the root,
then .7/.1/.2 among the children)
  • Or until classification is unconfident, in which case the input is
declared novel
    (Figure: a near-uniform split, e.g., edge probabilities .5/.5)
Approach - Top-down (TD) Method
• Classification rule: at each super class, predict the child with the
highest softmax probability, and recurse.
• Classification is confident if the predicted distribution over the
children is far from uniform, i.e., its KL divergence from the uniform
distribution exceeds a per-class threshold.
Approach - Top-down (TD) Method
• Training objective: softmax cross-entropy over the children of each
super class, with a regularization term that pushes predictions toward
the uniform distribution for samples from outside that super class, so
that novel inputs yield unconfident predictions.
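A minimal sketch of the top-down inference loop, assuming one softmax classifier per super class and the KL-to-uniform confidence test above; `classifiers`, `thresholds`, and `children` are illustrative placeholders.

```python
import numpy as np

def kl_from_uniform(p):
    """D_KL(U || p): 0 for a uniform prediction, large for a confident one."""
    u = np.full_like(p, 1.0 / len(p))
    return float(np.sum(u * np.log(u / np.clip(p, 1e-12, None))))

def top_down_predict(x, node, children, classifiers, thresholds):
    """Walk down the taxonomy until a known leaf or an unconfident split.

    children[s]: child classes of super class s;
    classifiers[s](x): softmax over those children;
    thresholds[s]: per-class confidence threshold.
    """
    while children.get(node):                      # stop at a known leaf
        p = classifiers[node](x)
        if kl_from_uniform(p) < thresholds[node]:  # too close to uniform
            return "novel " + node                 # e.g., "novel cat"
        node = children[node][int(np.argmax(p))]
    return node
```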
Approach - Flatten Method
• Represent all probabilities of known leaf and novel classes in a
single vector
  • Add a virtual novel class under each super class
  • And then flatten the structure
Approach - Flatten Method
• Classification rule: predict the entry with the highest probability in
the flattened vector, whether it is a known leaf class or a virtual
novel class.
• We propose two strategies to train this model.
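A sketch of the flattened classification rule under the running example; the ordering of the flattened vector and the label strings are illustrative.

```python
import numpy as np

# Flattened output space: known leaf classes plus one virtual novel class
# per super class (labels follow the running example).
FLAT_LABELS = ["Persian cat", "Siamese cat", "Pomeranian", "Welsh corgi",
               "novel cat", "novel dog", "novel animal"]

def flatten_predict(probs):
    """Pick the most probable entry, known leaf or virtual novel class."""
    return FLAT_LABELS[int(np.argmax(probs))]

p = np.array([0.05, 0.10, 0.05, 0.05, 0.60, 0.05, 0.10])
print(flatten_predict(p))  # -> "novel cat"
```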
Approach - Flatten Method
• Data relabeling
  • Fill the virtual novel classes by hierarchical relabeling: a fraction
of the training images of each leaf class is relabeled to the virtual
novel classes of its ancestors
  • The relabeling rate can be chosen by validation
(Figure: taxonomy animal → cat → {Persian cat, Siamese cat};
                  animal → dog → {Pomeranian, Welsh corgi})
Approach - Flatten Method
• Data relabeling
  • Training objective: softmax cross-entropy over the flattened vector,
computed on the relabeled data (a sketch follows below)
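A minimal sketch of hierarchical relabeling with a fixed relabeling rate; uniformly sampling one ancestor's virtual novel class is a simplification, and `ancestors` is the query from the taxonomy sketch above.

```python
import random

def relabel(dataset, ancestors, rate=0.1, seed=0):
    """Relabel a fraction `rate` of examples to virtual novel classes.

    dataset: list of (image, leaf_label) pairs;
    ancestors(y): proper ancestors of leaf y, e.g.
    ancestors("Siamese cat") == ["cat", "animal"].
    """
    rng = random.Random(seed)
    out = []
    for image, label in dataset:
        if rng.random() < rate:
            # Move the example to a random ancestor's virtual novel class.
            label = "novel " + rng.choice(ancestors(label))
        out.append((image, label))
    return out
```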
Approach - Flatten Method
• Leave-one-out (LOO) strategy
  • Generate deficient taxonomies by removing one known class (and its
descendants) at a time
  • And then train the model with them
  • e.g., when the class "cat" is left out, the taxonomy becomes
animal → {novel animal, dog → {Pomeranian, Welsh corgi}}, and the
held-out cat images become training data for "novel animal"
Approach - Flatten Method
• Leave-one-out (LOO) strategy
  • Training objective: softmax cross-entropy over the flattened vector
of each deficient taxonomy, with held-out class data labeled as the
corresponding virtual novel class (a sketch follows below)
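A minimal sketch of building one deficient taxonomy for LOO training; the dict-based tree and the label-rewriting rule are illustrative.

```python
def leave_one_out(children, parent, removed):
    """Remove `removed` and its descendants; replace it with a virtual
    novel class under its parent. children/parent describe the tree."""
    new_children = {k: list(v) for k, v in children.items()}
    p = parent[removed]
    new_children[p] = [c for c in new_children[p] if c != removed]
    new_children[p].append("novel " + p)      # virtual novel class
    # Drop the removed subtree.
    stack, dropped = [removed], set()
    while stack:
        n = stack.pop()
        dropped.add(n)
        stack.extend(children.get(n, []))
    for n in dropped:
        new_children.pop(n, None)
    return new_children, dropped  # dropped classes become "novel p" data

children = {"animal": ["cat", "dog"],
            "cat": ["Persian cat", "Siamese cat"],
            "dog": ["Pomeranian", "Welsh corgi"]}
parent = {"cat": "animal", "dog": "animal"}
print(leave_one_out(children, parent, "cat")[0])
# {'animal': ['dog', 'novel animal'], 'dog': ['Pomeranian', 'Welsh corgi']}
```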
Approach - Combined Method (TD+LOO)
• Compute and enumerate the outputs of the TD model
• And then feed them to the LOO model as features
• The combined method exploits their complementary benefits:
  • The top-down method leverages hierarchical structure information,
but it suffers from error aggregation over the hierarchy.
  • The flatten method avoids error aggregation over the hierarchy, but
it does not leverage hierarchical structure information.
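A sketch of how the two models can be chained, assuming the `classifiers` from the TD sketch above and a flatten model trained with the LOO strategy; the feature layout is illustrative.

```python
import numpy as np

def td_features(x, super_classes, classifiers):
    """Enumerate the TD model's per-node softmax outputs as one vector."""
    return np.concatenate([classifiers[s](x) for s in super_classes])

def td_plus_loo_predict(x, super_classes, classifiers, loo_model, flat_labels):
    """Feed the enumerated TD outputs to the LOO-trained flatten model."""
    feats = td_features(x, super_classes, classifiers)
    return flat_labels[int(np.argmax(loo_model(feats)))]
```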
Experiments - Hierarchical Novelty Detection
• Compared algorithms
  • Baseline: DARTS (Deng et al., 2012)
  • Ours: Relabel, LOO, TD+LOO
• Datasets
  • ImageNet: 1k known, 16k novel classes
  • AwA2: 40 known, 10 novel classes
  • CUB: 150 known, 50 novel classes
• Metrics
  • Novel class accuracy @ known class accuracy = 50%
    • By adding an appropriate score bias to all novel classes
  • Area under the known-novel class accuracy curve (AUC)
    • By varying the novel class score bias (see the sketch below)
• Quantitative results

Method  | ImageNet      | AwA2          | CUB
        | Novel   AUC   | Novel   AUC   | Novel   AUC
DARTS   | 10.89   8.83  | 36.75  35.14  | 40.42  30.07
Relabel | 15.29  11.51  | 45.71  40.28  | 38.23  28.75
LOO     | 15.72  12.00  | 50.00  43.63  | 40.78  31.92
TD+LOO  | 18.78  13.98  | 53.57  46.77  | 43.29  33.16

(Figure: known-novel class accuracy curves: (a) ImageNet (b) AwA2 (c) CUB)

J. Deng, J. Krause, A. C. Berg, and L. Fei-Fei. "Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition." In CVPR, 2012.
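A sketch of the known-novel accuracy curve and its AUC, computed by sweeping a bias added to the novel-class scores; the per-sample score/correctness interface is an illustrative simplification.

```python
import numpy as np

def known_novel_curve(s_known, s_novel, ok_known, ok_novel, is_known, biases):
    """Trace (known accuracy, novel accuracy) while sweeping the novel bias.

    s_known / s_novel: each sample's best known-class and novel-class scores;
    ok_known / ok_novel: whether the corresponding prediction would be correct;
    is_known: whether the sample's true class is known (boolean arrays).
    """
    curve = []
    for b in biases:
        pick_novel = s_novel + b > s_known
        acc_known = np.mean((~pick_novel & ok_known)[is_known])
        acc_novel = np.mean((pick_novel & ok_novel)[~is_known])
        curve.append((acc_known, acc_novel))
    return sorted(curve)

def auc(curve):
    """Trapezoidal area under the known-novel accuracy curve."""
    xs, ys = zip(*curve)
    return float(np.trapz(ys, xs))
```

The "novel accuracy @ 50% known accuracy" metric then reads off the novel accuracy at the bias whose known accuracy is closest to 50%; the seen-unseen AUC in the GZSL experiments below is computed analogously.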
Experiments - Hierarchical Novelty Detection
• Qualitative results
  • ε: edge distance in the taxonomy between the prediction and the
ground truth (GT); A: whether the prediction is an ancestor of (i.e.,
consistent with) the GT

Novel class: American foxhound
Method  | ε | A | Word
GT      | - | - | foxhound
DARTS   | 2 | N | beagle
Relabel | 1 | Y | hound dog
LOO     | 0 | Y | foxhound
TD+LOO  | 0 | Y | foxhound
Experiments - Hierarchical Novelty Detection
• Qualitative results

Novel class: serval
Method  | ε | A | Word
GT      | - | - | wildcat
DARTS   | 3 | N | Egyptian cat
Relabel | 2 | N | domestic cat
LOO     | 2 | Y | feline
TD+LOO  | 1 | Y | cat
Experiments - Hierarchical Novelty Detection
• Qualitative results

Novel class: song thrush
Method  | ε | A | Word
GT      | - | - | thrush
DARTS   | 3 | N | hummingbird
Relabel | 2 | Y | bird
LOO     | 1 | Y | oscine bird
TD+LOO  | 0 | Y | thrush
Experiments - Hierarchical Novelty Detection
• Qualitative results

Novel class: ice-cream sundae
Method  | ε | A | Word
GT      | - | - | frozen dessert
DARTS   | 4 | Y | food, nutrient
Relabel | 1 | N | ice cream
LOO     | 1 | Y | dessert
TD+LOO  | 0 | Y | frozen dessert
Experiments - Generalized Zero-Shot Learning
• Semantic embeddings
  • Attributes (numeric attribute values)
  • Word vectors (similarity among words in a real coordinate space)
  • Hierarchical embedding
• Compared hierarchical embeddings
  • Baseline: Path (Akata et al., 2015)
    • Distance between classes on the hierarchy
  • Ours: Top-down (TD)
    • Expected output of our top-down model
• Datasets
  • AwA1, AwA2: 40 known, 10 novel classes
  • CUB: 150 known, 50 novel classes
• Metrics
  • Unseen class accuracy (ZSL)
  • Area under the seen-unseen class accuracy curve (GZSL)
    • By varying the unseen class score bias
• Quantitative results

Att  Word  Hier | AwA1          | AwA2          | CUB
                | Unseen  AUC   | Unseen  AUC   | Unseen  AUC
 √              | 65.29  50.02  | 63.87  51.27  | 50.05  23.60
      √         | 51.87  39.67  | 54.77  42.21  | 27.28  11.47
 √    √         | 67.80  52.84  | 65.76  53.18  | 49.83  24.13
           Path | 42.57  30.58  | 44.34  33.44  | 24.22   8.38
 √         Path | 67.09  51.45  | 66.58  53.50  | 50.25  23.70
      √    Path | 52.89  40.66  | 55.28  42.86  | 27.72  11.65
 √    √    Path | 68.04  53.21  | 67.28  54.31  | 50.87  24.20
           TD   | 33.86  25.56  | 31.84  24.97  | 13.09   7.20
 √         TD   | 66.13  54.66  | 66.86  57.49  | 50.17  30.31
      √    TD   | 56.14  46.28  | 59.67  49.39  | 29.05  16.73
 √    √    TD   | 69.23  57.67  | 68.80  59.24  | 50.17  30.31

(Figure: seen-unseen class accuracy curves: (a) AwA1 (b) AwA2 (c) CUB)

Z. Akata, S. Reed, D. Walter, H. Lee, and B. Schiele. "Evaluation of output embeddings for fine-grained image classification." In CVPR, 2015.
Conclusion
• We propose a hierarchical novelty detection framework, which aims to
find the most specific class label for any input in the hierarchical
taxonomy built from known labels.
• We propose two approaches:
  • The top-down method performs classification and novelty detection
hierarchically.
  • The flatten method computes a single probability vector over all
candidate classes.
  • Their combination takes advantage of their complementary benefits.
• Our model can be combined with other semantic embeddings to improve
generalized zero-shot learning performance.