CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

Changjin Lee

Intro
● Many data augmentation and regularization methods have been proposed for vision tasks
● Regularizers based on random feature removal work reasonably well:
○ dropout
○ regional dropout: removes random spatial regions of the input
● However, regional dropout discards information from the training image, which is a severe conceptual limitation
❖ How can the deleted regions be put to use while keeping the generalization and localization benefits of regional dropout?
❖ The paper's answer is CutMix: replace the deleted region with a patch from another image.

CutMix Intro
● Cut out a region of a training image and replace it with a patch from another image
● The ground-truth labels are mixed in proportion to the number of pixels each image contributes
Advantages
● No information loss: the removed region is filled with real image content instead of being blanked out
● Better localization ability: the model must learn to identify an object from a partial view
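The label mixing rule can be illustrated with a small worked example. This is a minimal sketch, not the authors' code: the function name `mix_labels_by_pixel_ratio`, the 8x8 image size, and the three-class one-hot labels are assumptions made only for illustration.

```python
import numpy as np

def mix_labels_by_pixel_ratio(mask, label_a, label_b):
    """Mix two one-hot labels by the share of pixels each image contributes.

    mask is a binary (H, W) array: 1 where the pixel comes from image A,
    0 where it comes from the pasted patch of image B.
    """
    lam = float(mask.mean())  # fraction of pixels kept from image A
    return lam * label_a + (1.0 - lam) * label_b, lam

# Hypothetical 8x8 image in which a 4x4 patch (25% of the pixels) is pasted.
mask = np.ones((8, 8))
mask[2:6, 2:6] = 0
label_a = np.array([1.0, 0.0, 0.0])  # image A belongs to class 0
label_b = np.array([0.0, 1.0, 0.0])  # image B belongs to class 1
mixed_label, lam = mix_labels_by_pixel_ratio(mask, label_a, label_b)
print(lam, mixed_label)
```

Here image A keeps 48 of 64 pixels, so the mixed label puts weight 0.75 on class 0 and 0.25 on class 1.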

Comparison: Regional Dropout
● CutMix is similar to regional dropout in that both remove a region of the image
● Regional dropout: randomly removes a region of the image
● CutMix: randomly removes a region and fills it with a patch from another image
Comparison: Synthesizing training data
● Synthesizing techniques such as Stylized-ImageNet guide the model to focus more on shape than on texture
● CutMix also generates new samples, but at only trivial additional training cost
Comparison: Mixup
● Mixup samples are locally ambiguous and unnatural
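The three augmentations compared above differ only in how they combine pixels, which a short side-by-side sketch makes concrete. The function names, the fixed box, and the constant-valued toy images are assumptions for illustration; real pipelines sample the box and the mixing weight at random.

```python
import numpy as np

def cutout(x_a, box):
    """Regional dropout: zero out the box; the label of x_a is unchanged."""
    y0, y1, x0, x1 = box
    out = x_a.copy()
    out[y0:y1, x0:x1] = 0.0
    return out

def mixup(x_a, x_b, lam):
    """Mixup: blend every pixel of both images with weight lam."""
    return lam * x_a + (1.0 - lam) * x_b

def cutmix(x_a, x_b, box):
    """CutMix: fill the box with the same region taken from x_b."""
    y0, y1, x0, x1 = box
    out = x_a.copy()
    out[y0:y1, x0:x1] = x_b[y0:y1, x0:x1]
    return out

# Toy (H, W, C) images with constant values so the effect is easy to read.
x_a = np.full((32, 32, 3), 1.0)
x_b = np.full((32, 32, 3), 3.0)
box = (8, 24, 8, 24)  # 16x16 box covering 25% of the image

cutout_img = cutout(x_a, box)        # box pixels are 0, the rest come from x_a
mixup_img = mixup(x_a, x_b, 0.75)    # every pixel is a blend of x_a and x_b
cutmix_img = cutmix(x_a, x_b, box)   # box pixels come from x_b, the rest from x_a
```

Only Mixup changes every pixel; Cutout and CutMix leave the area outside the box untouched, and only CutMix keeps real image content inside it.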

Complementary to other methods
● CutMix is complementary to weight decay, batch normalization, and noise injection
● This is because CutMix operates only at the data level

CutMix Algorithm
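● Training images: $x_A, x_B \in \mathbb{R}^{W \times H \times C}$ with labels $y_A, y_B$
● Binary mask: $M \in \{0, 1\}^{W \times H}$, equal to 0 inside the bounding box $B$ and 1 elsewhere
● New training sample: $\tilde{x} = M \odot x_A + (1 - M) \odot x_B$
● Labels: $\tilde{y} = \lambda y_A + (1 - \lambda) y_B$
● Combination ratio: $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$ with $\alpha = 1$, so $\lambda$ is uniform on $(0, 1)$
● The box $B$ covers a fraction $1 - \lambda$ of the image area; the remaining fraction $\lambda$ comes from $x_A$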
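The algorithm on this slide can be sketched in NumPy as follows. This is a hedged illustration, not the reference implementation: the names `rand_bbox` and `cutmix_batch` are hypothetical, the arrays are assumed to be laid out as (N, H, W, C) with one-hot labels, and the box sampling (uniform center, side lengths scaled by sqrt(1 - λ), clipped at the border) is an assumption based on the paper's description.

```python
import numpy as np

def rand_bbox(height, width, lam, rng):
    """Sample a box whose area is about (1 - lam) of the image."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    # Uniformly sampled box center; the box is clipped at the image border.
    cy, cx = int(rng.integers(height)), int(rng.integers(width))
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return y0, y1, x0, x1

def cutmix_batch(images, labels, alpha=1.0, rng=None):
    """Apply one CutMix box to a batch of images and one-hot labels."""
    rng = np.random.default_rng() if rng is None else rng
    n, height, width, _ = images.shape
    lam = rng.beta(alpha, alpha)   # combination ratio
    perm = rng.permutation(n)      # partner image B for every image A
    y0, y1, x0, x1 = rand_bbox(height, width, lam, rng)
    mixed = images.copy()
    mixed[:, y0:y1, x0:x1, :] = images[perm, y0:y1, x0:x1, :]
    # Recompute lam from the actual box, since clipping may have shrunk it.
    lam = 1.0 - (y1 - y0) * (x1 - x0) / (height * width)
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed, mixed_labels, lam
```

Recomputing λ after clipping keeps the label weights equal to the true pixel ratio, which matches the rule that labels are mixed in proportion to the pixels each image contributes.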

CutMix on Class Activation Map (CAM)
● CAMs are computed with a vanilla ResNet-50
● Cutout makes the model focus on less discriminative parts of the object, such as the belly
● Mixup uses every pixel, but the blended image is unnatural and confuses the model about which object to attend to
● This confusion results in suboptimal performance
● CutMix successfully localizes the two objects
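For readers unfamiliar with CAMs, the sketch below shows how one is typically computed. It assumes the standard CAM formulation for a network that ends in global average pooling followed by a linear classifier (as ResNet-50 does); the function name and the toy feature maps are hypothetical, and the slide itself does not specify these details.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Weight the final conv feature maps by one class's classifier weights.

    features:   (C, H, W) feature maps before global average pooling
    fc_weights: (num_classes, C) weights of the final linear layer
    Returns an (H, W) map scaled to [0, 1].
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: channel 0 fires at (1, 1), channel 1 fires at (0, 0).
features = np.zeros((2, 3, 3))
features[0, 1, 1] = 1.0
features[1, 0, 0] = 1.0
fc_weights = np.array([[1.0, 0.0],   # class 0 relies on channel 0
                       [0.0, 1.0]])  # class 1 relies on channel 1
cam_class0 = class_activation_map(features, fc_weights, 0)
cam_class1 = class_activation_map(features, fc_weights, 1)
```

In this toy case each class's map peaks at a different location, which is the behavior the slide attributes to CutMix on images containing two objects.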

Performance: CIFAR-10
● CutMix applied at different depths of the network:
○ Layer 0: input level (best)
○ Layer 1: after conv-bn
○ Layer 2: after layer 1
○ …
● Variations of CutMix lead to performance degradation: the original input-level CutMix performs best

Dive Deeper…
❖ Random cropping sometimes pastes in an uninformative patch, and this reduces performance
➢ Possible improvements: 1) use object detection to choose the crop (inefficient?), 2) limit the cropped size, e.g. λ ~ U(0.5, 0.8)?
❖ Weakly Supervised Object Localization
❖ Image Captioning
[Figure: example crops labeled Bad and Good]
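The second suggested improvement, restricting λ to a range, could be tried by swapping the sampling step. This is only a sketch of the idea raised on this slide: the helper names and the default arguments are hypothetical, and no claim is made that this variant improves accuracy.

```python
import numpy as np

def sample_lambda(rng, low=None, high=None, alpha=1.0):
    """Sample the combination ratio lambda.

    Without bounds this draws from Beta(alpha, alpha), the setting on the
    algorithm slide. With bounds it draws from U(low, high) instead.
    """
    if low is None or high is None:
        return float(rng.beta(alpha, alpha))
    return float(rng.uniform(low, high))

def patch_area_fraction(lam):
    """Fraction of the image covered by the pasted patch."""
    return 1.0 - lam

rng = np.random.default_rng(0)
restricted = [sample_lambda(rng, low=0.5, high=0.8) for _ in range(1000)]
areas = [patch_area_fraction(lam) for lam in restricted]
print(min(areas), max(areas))
```

With λ drawn from U(0.5, 0.8), the pasted patch always covers between 20% and 50% of the image, so neither image is reduced to a sliver.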

References
● https://arxiv.org/pdf/1905.04899.pdf
● https://github.com/clovaai/CutMix-PyTorch
Blog Post
● https://jasonlee-cp.github.io/paper/CutMix/
