Learning without Forgetting
Introduction
Neural network
Benefits:
▫ Stores information
▫ Retrieves information even when part of it is lost
▫ Acts when faced with a new situation
Introduction
Neural network problem:
The network forgets the old task when a new task is learned (catastrophic forgetting).
Introduction
Problem in applications: image recognition
• New capabilities need to be added to an existing network
• Existing approaches assume the old dataset is still available
• This is often infeasible
• Goal: a vision system using a CNN that
• Uses no old-task data
• Trains using only new data
• Preserves the old task
• A new method for neural networks: to learn without forgetting
Introduction
CNN: convolutional neural network
• Operates on volumes (width × height × channels)
• Convolution (filters): extracts specific local features into smaller feature maps
• ReLU: non-linear activation
• Pooling: reduces the spatial size to cut computation and parameters
• Max pooling keeps the maximum value in each pooling region (a minimal sketch of these building blocks follows)
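As a concrete illustration of these building blocks (a minimal sketch, not taken from the paper; layer sizes are arbitrary), one convolution / ReLU / max-pooling stage in PyTorch could look like this:

```python
import torch
import torch.nn as nn

# One convolution -> ReLU -> max-pooling stage; all sizes are illustrative only.
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # filters extract local features
    nn.ReLU(),                                                            # non-linearity
    nn.MaxPool2d(kernel_size=2),                                          # keeps the max of each 2x2 region,
)                                                                         # halving the spatial size

x = torch.randn(1, 3, 224, 224)   # a dummy batch with one RGB image
print(block(x).shape)             # torch.Size([1, 16, 112, 112])
```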
Present methods
• Common approaches
• Currently developed methods
Present methods
Model
• Structure
• Parameters:
• θs: shared parameters
• θo: old-task parameters
• θn: new-task parameters
(a toy sketch of this parameter split follows)
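To make the θs / θo / θn split concrete, here is a minimal, hypothetical PyTorch model with a shared trunk and one output head per task; the architecture and names are assumptions for illustration, not the networks used in the paper.

```python
import torch.nn as nn

class MultiHeadNet(nn.Module):
    """Shared trunk (theta_s) plus one output head per task (theta_o, theta_n)."""
    def __init__(self, feat_dim=256, n_old=1000, n_new=20):
        super().__init__()
        # theta_s: parameters shared by all tasks (e.g. the convolutional trunk)
        self.shared = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # theta_o: old-task-specific output layer
        self.old_head = nn.Linear(feat_dim, n_old)
        # theta_n: new-task-specific output layer, added when the new task arrives
        self.new_head = nn.Linear(feat_dim, n_new)

    def forward(self, x):
        features = self.shared(x)
        return self.old_head(features), self.new_head(features)
```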
Present methods
Common approaches
Feature extraction
• θs and θo are left unchanged
• The outputs of one or more layers are used as features for the new task
Present methods
Common approaches
Fine-tuning
• θs and θn are optimized while θo is fixed
• Uses a low learning rate
• The network can be duplicated, with one fine-tuned copy per task
(a sketch of feature extraction vs. fine-tuning follows)
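Reusing the hypothetical MultiHeadNet sketch from above, feature extraction and fine-tuning differ only in which parameter groups are allowed to change; roughly:

```python
import torch

model = MultiHeadNet()

# Feature extraction: theta_s and theta_o stay frozen; only the new head theta_n is trained.
for p in model.shared.parameters():
    p.requires_grad = False
for p in model.old_head.parameters():
    p.requires_grad = False
fe_optimizer = torch.optim.SGD(model.new_head.parameters(), lr=0.01)

# Fine-tuning: theta_o stays fixed; theta_s and theta_n are trained at a low learning rate.
for p in model.shared.parameters():
    p.requires_grad = True
ft_optimizer = torch.optim.SGD(
    list(model.shared.parameters()) + list(model.new_head.parameters()), lr=0.001
)
```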
Present methods
Common approaches
Joint training
• θs, θn and θo are jointly optimized
• Training data for the tasks is interleaved
• Corresponds to multi-task learning
Present methods
Currently developed methods
• A-LTM (Active Long-Term Memory)
• Less Forgetting Learning
• Cross-stitch Networks
• WA-CNN
Present methods
Currently developed methods
A-LTM (Active Long-Term Memory):
• Nearly identical to LwF
• Differs in:
• Weight decay regularization during training
• A warm-up step used after fine-tuning
• Dataset size
• Reports a large loss in old-task performance
• Needs the old-task dataset
Present methods
Currently developed methods
Less Forgetting Learning:
• Similar to LwF
• Hinders change in:
• The task-specific decision boundaries
• The shared representation
• Adds an L2 loss:
• Discourages change in the shared parameters' (θs) responses for the new task
• θo remains the same
Present methods
Currently developed methods
Cross-stitch Networks:
• Designed for multi-task learning (MTL)
• Introduce a cross-stitch module that
• Jointly learns two network blocks with the same structure
• Uses two pairs of weights to mix the blocks' outputs (see the sketch below)
• Outperform joint training
• Need the old-task dataset
• Increase the network size
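The cross-stitch module can be thought of as a small set of learned weights that linearly mixes the activations of the two parallel task networks at a given layer. A rough PyTorch sketch of one such unit (the class and its initialization are assumptions for illustration):

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    """Learned 2x2 mixing of activations coming from two parallel task networks."""
    def __init__(self):
        super().__init__()
        # Initialized near the identity so each network starts by mostly using its own activations.
        self.alpha = nn.Parameter(torch.eye(2))

    def forward(self, x_a, x_b):
        # Each output is a learned linear combination of both inputs.
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b
```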
Present methods
Currently developed methods
WA-CNN:
• Expands the network (θs)
• Improves new-task performance
• Freezes θo
• Maintains old-task performance
• Outperforms traditional fine-tuning
• But increases the network size faster than LwF
Learning without Forgetting
The proposed method
• Trains using only new-task data
• Preserves the network's original capabilities
• Performs favourably compared to:
• Feature extraction
• Fine-tuning adaptation
• Performs similarly to multi-task learning that uses the old dataset
The proposed method
• A unified vision system
• The CNN has parameters:
• θs: shared parameters
• θo: old-task (task-specific) parameters
• The goal is to add θn: new-task parameters
• Learn parameters that work well on both old and new tasks
• Using only the new-task dataset, not the old one
The proposed method
• Advantages over the common approaches:
• Classification performance
• Outperforms feature extraction and fine-tuning
• Computational efficiency
• Faster training and test time
• But training is slower than plain fine-tuning
• Simplicity in deployment
• The network can be adapted without retraining on the old tasks
Procedure
Phase I: Initialization
• The old network's outputs Yo on the old task are recorded for the new-task data
• Each response is a set of label probabilities
• Output nodes are added for the new classes
• Their weights are initialized randomly
(a rough sketch of this phase follows)
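A rough sketch of this phase, reusing the hypothetical MultiHeadNet from earlier; new_task_loader and num_new_classes are assumed, illustrative names.

```python
import torch
import torch.nn as nn

model.eval()
recorded_yo = []                              # Yo: the old task's responses to the new-task images
with torch.no_grad():
    for images, _ in new_task_loader:         # only new-task images are needed here
        old_logits, _ = model(images)
        recorded_yo.append(old_logits.softmax(dim=1))   # a set of label probabilities per image

# Attach output nodes for the new classes; nn.Linear initializes its weights randomly.
model.new_head = nn.Linear(256, num_new_classes)   # 256 matches the trunk's feature size in the sketch above
```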
Procedure
Phase II: Training
• Train to minimize the loss over all tasks
• Optimized with stochastic gradient descent, with regularization
• Two steps (sketched below):
• Warm-up step: freeze θo and θs, train only θn
• Joint optimization step: train all weights
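A sketch of the two training steps, again with the hypothetical model from the earlier sketches; total_loss stands for the combined objective written out on the next slides.

```python
import torch

# Warm-up step: freeze theta_s and theta_o, train only the new head theta_n.
for p in model.shared.parameters():
    p.requires_grad = False
for p in model.old_head.parameters():
    p.requires_grad = False
warmup_optimizer = torch.optim.SGD(model.new_head.parameters(), lr=0.01, momentum=0.9)
# ... run a few epochs minimizing only the new-task loss ...

# Joint optimization step: unfreeze everything and train all weights together.
for p in model.parameters():
    p.requires_grad = True
joint_optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# ... run training minimizing total_loss = lambda_o * L_old + L_new + regularization ...
```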
Procedure
Phase II: Training
• New-task loss: multinomial logistic loss
• Old-task loss: knowledge distillation loss
• Both are written out below
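Written out (following the notation of the LwF paper), the two losses and the overall objective are:

```latex
% New-task loss: multinomial logistic (cross-entropy) loss
\mathcal{L}_{new}(y_n, \hat{y}_n) = - \, y_n \cdot \log \hat{y}_n

% Old-task loss: knowledge distillation loss on the recorded responses,
% with probabilities re-softened by the temperature T
\mathcal{L}_{old}(y_o, \hat{y}_o) = - \sum_{i} y_o'^{(i)} \log \hat{y}_o'^{(i)},
\qquad
y_o'^{(i)} = \frac{\bigl(y_o^{(i)}\bigr)^{1/T}}{\sum_j \bigl(y_o^{(j)}\bigr)^{1/T}}

% Overall objective over shared, old-task and new-task parameters
\theta_s^{*}, \theta_o^{*}, \theta_n^{*}
= \operatorname*{arg\,min}_{\theta_s, \theta_o, \theta_n}
\Bigl( \lambda_o \, \mathcal{L}_{old}(Y_o, \hat{Y}_o)
     + \mathcal{L}_{new}(Y_n, \hat{Y}_n)
     + \mathcal{R}(\theta_s, \theta_o, \theta_n) \Bigr)
```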
Procedure
Phase II: Training
• Computing the knowledge distillation loss uses:
• The recorded and the current old-task probabilities
• A temperature T > 1; usually T = 2
• λo, a loss-balance weight, set to 1
• Larger λo gives greater old-task performance
• Smaller λo gives greater new-task performance
(a sketch of the distillation term follows)
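A minimal sketch of the distillation term in Python (names are illustrative): both the recorded and the current old-task probabilities are re-softened with the temperature T before the cross-entropy is taken.

```python
import torch

def distillation_loss(recorded_probs, current_probs, T=2.0, eps=1e-8):
    """Knowledge distillation loss between recorded (Yo) and current old-task probabilities."""
    # Raise each probability to 1/T and renormalize: with T > 1 this softens the
    # distributions, giving more weight to the smaller (non-target) probabilities.
    soft_recorded = recorded_probs ** (1.0 / T)
    soft_recorded = soft_recorded / soft_recorded.sum(dim=1, keepdim=True)
    soft_current = current_probs ** (1.0 / T)
    soft_current = soft_current / soft_current.sum(dim=1, keepdim=True)
    # Cross-entropy of the current softened probabilities against the recorded ones.
    return -(soft_recorded * torch.log(soft_current + eps)).sum(dim=1).mean()
```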
Experiment
• Setup:
• A large dataset to train the initial network
• A smaller dataset to add each new task
• Old/original tasks:
• ImageNet
• Contains 1000 object categories
• More than 1,000K training images
• Places365-standard
• Contains 365 scene classes
• ~1,600K training images
• New tasks:
• PASCAL VOC ("VOC"), ~6K images
• Caltech-UCSD Birds ("CUB"), ~6K images
• MIT indoor scenes ("Scenes"), ~6K images
Experiment
• Two scenarios:
• Single new task:
• On the new task, LwF outperforms LFL, fine-tuning only the FC layers, feature extraction and full fine-tuning in most task pairs
• On the old task, LwF performs better than fine-tuning, but underperforms feature extraction, fine-tuning FC and LFL
• Multiple new tasks:
• LwF outperforms all compared methods except joint training
Extension of LwF
• Network expansion
• Adds nodes to some layers
• Allows new-task-specific information to be stored
• Used together with LwF
• Performs better than feature extraction
Limitations
• Cannot deal with a change in domain
• All new-task data must be available before its old-task responses can be computed
• Learning a new task still degrades old-task performance to some degree
The End
Learned something without forgetting


Editor's Notes

  • #2: We learn new things when new neurons are created for them in our brain. The better the plasticity, the better we remember. As time goes by, without refreshing, these things tend to fade away. The same happens with artificial neural networks.
  • #3: To do: explain the basics of neural networks.
  • #4: Note: if I add lemur identification to the net, it forgets the dog functionality.
  • #5: Forgets the old task. Explain with an example of forgetting in a basic neural network. To add: a forgetting example.
  • #6: One well-known victim of forgetting is image recognition. Explain the paper: slide 1, Introduction.
  • #7: Explain (from the internet): convolutional neural networks.
  • #8: To tackle the problem of catastrophic forgetting.
  • #9: Explain the paper: model.
  • #10: Explain the paper: feature extraction; explain its disadvantages.
  • #11: Explain the paper: fine-tuning; explain its disadvantages.
  • #12: Explain the paper: joint training; explain its disadvantages.
  • #13: Explain the paper: concurrently developed methods; explain their disadvantages.
  • #14: Explain the paper: A-LTM; explain its disadvantages.
  • #15: Explain the paper: Less Forgetting Learning; compare it with LwF.
  • #16: Explain the paper: cross-stitch networks. Web: cross-stitch network, cross-stitch module.
  • #17: Explain the paper: cross-stitch networks. Web: cross-stitch network, cross-stitch module.