Learning without Forgetting
Introduction
Neural network
Benefits:
▫ Stores information
▫ Retrieves information even when part of it is lost
▫ Acts when faced with a new situation
Introduction
Neural network problem:
The network forgets the old task when a new task is learned (catastrophic forgetting).
Introduction
Problem in applications: image recognition
• New capabilities need to be added to an existing network
• Existing approaches assume the old dataset is still available
• This is often infeasible
• Goal: a vision system using a CNN that
• Uses no old-task data
• Trains using only new data
• Preserves the old task
• A new method for neural networks: to learn without forgetting
Introduction
CNN: convolutional neural network
• Operates on volumes (width × height × channels)
• Convolution (filters): extracts specific local features into smaller feature maps
• ReLU: non-linear activation
• Pooling: reduces the spatial size to cut computation and parameters
• Max pooling keeps the maximum value in each pooling region (a minimal sketch of these building blocks follows)
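As a concrete illustration of these building blocks (a minimal sketch, not taken from the paper; layer sizes are arbitrary), one convolution / ReLU / max-pooling stage in PyTorch could look like this:

```python
import torch
import torch.nn as nn

# One convolution -> ReLU -> max-pooling stage; all sizes are illustrative only.
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # filters extract local features
    nn.ReLU(),                                                            # non-linearity
    nn.MaxPool2d(kernel_size=2),                                          # keeps the max of each 2x2 region,
)                                                                         # halving the spatial size

x = torch.randn(1, 3, 224, 224)   # a dummy batch with one RGB image
print(block(x).shape)             # torch.Size([1, 16, 112, 112])
```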
Present methods
• Common approaches
• Currently developed methods
Present methods
Model
• Structure
• Parameters:
• θs: shared parameters
• θo: old-task parameters
• θn: new-task parameters
(a toy sketch of this parameter split follows)
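To make the θs / θo / θn split concrete, here is a minimal, hypothetical PyTorch model with a shared trunk and one output head per task; the architecture and names are assumptions for illustration, not the networks used in the paper.

```python
import torch.nn as nn

class MultiHeadNet(nn.Module):
    """Shared trunk (theta_s) plus one output head per task (theta_o, theta_n)."""
    def __init__(self, feat_dim=256, n_old=1000, n_new=20):
        super().__init__()
        # theta_s: parameters shared by all tasks (e.g. the convolutional trunk)
        self.shared = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # theta_o: old-task-specific output layer
        self.old_head = nn.Linear(feat_dim, n_old)
        # theta_n: new-task-specific output layer, added when the new task arrives
        self.new_head = nn.Linear(feat_dim, n_new)

    def forward(self, x):
        features = self.shared(x)
        return self.old_head(features), self.new_head(features)
```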
Present methods
Common approaches
Feature extraction
• θs and θo are left unchanged
• The outputs of one or more layers are used as features for the new task
Present methods
Common approaches
Fine-tuning
• θs and θn are optimized while θo is fixed
• Uses a low learning rate
• The network can be duplicated, with one fine-tuned copy per task
(a sketch of feature extraction vs. fine-tuning follows)
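Reusing the hypothetical MultiHeadNet sketch from above, feature extraction and fine-tuning differ only in which parameter groups are allowed to change; roughly:

```python
import torch

model = MultiHeadNet()

# Feature extraction: theta_s and theta_o stay frozen; only the new head theta_n is trained.
for p in model.shared.parameters():
    p.requires_grad = False
for p in model.old_head.parameters():
    p.requires_grad = False
fe_optimizer = torch.optim.SGD(model.new_head.parameters(), lr=0.01)

# Fine-tuning: theta_o stays fixed; theta_s and theta_n are trained at a low learning rate.
for p in model.shared.parameters():
    p.requires_grad = True
ft_optimizer = torch.optim.SGD(
    list(model.shared.parameters()) + list(model.new_head.parameters()), lr=0.001
)
```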
Present methods
Common approaches
Joint training
• θs, θn and θo are jointly optimized
• Training data for the tasks is interleaved
• Corresponds to multi-task learning
Present methods
Currently developed methods
• A-LTM (Active Long-Term Memory)
• Less Forgetting Learning
• Cross-stitch Networks
• WA-CNN
Present methods
Currently developed methods
A-LTM (Active Long-Term Memory):
• Nearly identical to LwF
• Differs in:
• Weight decay regularization during training
• A warm-up step used after fine-tuning
• Dataset size
• Reports a large loss in old-task performance
• Needs the old-task dataset
Present methods
Currently developed methods
Less Forgetting Learning:
• Similar to LwF
• Hinders change in:
• The task-specific decision boundaries
• The shared representation
• Adds an L2 loss:
• Discourages change in the shared parameters' (θs) responses for the new task
• θo remains the same
Present methods
Currently developed methods
Cross-stitch Networks:
• Designed for multi-task learning (MTL)
• Introduce a cross-stitch module that
• Jointly learns two network blocks with the same structure
• Uses two pairs of weights to mix the blocks' outputs (see the sketch below)
• Outperform joint training
• Need the old-task dataset
• Increase the network size
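The cross-stitch module can be thought of as a small set of learned weights that linearly mixes the activations of the two parallel task networks at a given layer. A rough PyTorch sketch of one such unit (the class and its initialization are assumptions for illustration):

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    """Learned 2x2 mixing of activations coming from two parallel task networks."""
    def __init__(self):
        super().__init__()
        # Initialized near the identity so each network starts by mostly using its own activations.
        self.alpha = nn.Parameter(torch.eye(2))

    def forward(self, x_a, x_b):
        # Each output is a learned linear combination of both inputs.
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b
```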
Present methods
Currently developed methods
WA-CNN:
• Expands the network (θs)
• Improves new-task performance
• Freezes θo
• Maintains old-task performance
• Outperforms traditional fine-tuning
• But increases the network size faster than LwF
Learning without Forgetting
The proposed method
• Trains using only new-task data
• Preserves the network's original capabilities
• Performs favourably compared to:
• Feature extraction
• Fine-tuning adaptation
• Performs similarly to multi-task learning that uses the old dataset
The proposed method
• A unified vision system
• The CNN has parameters:
• θs: shared parameters
• θo: old-task (task-specific) parameters
• The goal is to add θn: new-task parameters
• Learn parameters that work well on both old and new tasks
• Using only the new-task dataset, not the old one
The proposed method
• Advantages over the common approaches:
• Classification performance
• Outperforms feature extraction and fine-tuning
• Computational efficiency
• Faster training and test time
• But training is slower than plain fine-tuning
• Simplicity in deployment
• The network can be adapted without retraining on the old tasks
Procedure
Phase I: Initialization
• The old network's outputs Yo on the old task are recorded for the new-task data
• Each response is a set of label probabilities
• Output nodes are added for the new classes
• Their weights are initialized randomly
(a rough sketch of this phase follows)
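A rough sketch of this phase, reusing the hypothetical MultiHeadNet from earlier; new_task_loader and num_new_classes are assumed, illustrative names.

```python
import torch
import torch.nn as nn

model.eval()
recorded_yo = []                              # Yo: the old task's responses to the new-task images
with torch.no_grad():
    for images, _ in new_task_loader:         # only new-task images are needed here
        old_logits, _ = model(images)
        recorded_yo.append(old_logits.softmax(dim=1))   # a set of label probabilities per image

# Attach output nodes for the new classes; nn.Linear initializes its weights randomly.
model.new_head = nn.Linear(256, num_new_classes)   # 256 matches the trunk's feature size in the sketch above
```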
Procedure
Phase II: Training
• Train to minimize the loss over all tasks
• Optimized with stochastic gradient descent, with regularization
• Two steps (sketched below):
• Warm-up step: freeze θo and θs, train only θn
• Joint optimization step: train all weights
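A sketch of the two training steps, again with the hypothetical model from the earlier sketches; total_loss stands for the combined objective written out on the next slides.

```python
import torch

# Warm-up step: freeze theta_s and theta_o, train only the new head theta_n.
for p in model.shared.parameters():
    p.requires_grad = False
for p in model.old_head.parameters():
    p.requires_grad = False
warmup_optimizer = torch.optim.SGD(model.new_head.parameters(), lr=0.01, momentum=0.9)
# ... run a few epochs minimizing only the new-task loss ...

# Joint optimization step: unfreeze everything and train all weights together.
for p in model.parameters():
    p.requires_grad = True
joint_optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# ... run training minimizing total_loss = lambda_o * L_old + L_new + regularization ...
```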
Procedure
Phase II: Training
• New-task loss: multinomial logistic loss
• Old-task loss: knowledge distillation loss
• Both are written out below
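Written out (following the notation of the LwF paper), the two losses and the overall objective are:

```latex
% New-task loss: multinomial logistic (cross-entropy) loss
\mathcal{L}_{new}(y_n, \hat{y}_n) = - \, y_n \cdot \log \hat{y}_n

% Old-task loss: knowledge distillation loss on the recorded responses,
% with probabilities re-softened by the temperature T
\mathcal{L}_{old}(y_o, \hat{y}_o) = - \sum_{i} y_o'^{(i)} \log \hat{y}_o'^{(i)},
\qquad
y_o'^{(i)} = \frac{\bigl(y_o^{(i)}\bigr)^{1/T}}{\sum_j \bigl(y_o^{(j)}\bigr)^{1/T}}

% Overall objective over shared, old-task and new-task parameters
\theta_s^{*}, \theta_o^{*}, \theta_n^{*}
= \operatorname*{arg\,min}_{\theta_s, \theta_o, \theta_n}
\Bigl( \lambda_o \, \mathcal{L}_{old}(Y_o, \hat{Y}_o)
     + \mathcal{L}_{new}(Y_n, \hat{Y}_n)
     + \mathcal{R}(\theta_s, \theta_o, \theta_n) \Bigr)
```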
Procedure
Phase II: Training
• Computing the knowledge distillation loss uses:
• The recorded and the current old-task probabilities
• A temperature T > 1; usually T = 2
• λo, a loss-balance weight, set to 1
• Larger λo gives greater old-task performance
• Smaller λo gives greater new-task performance
(a sketch of the distillation term follows)
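A minimal sketch of the distillation term in Python (names are illustrative): both the recorded and the current old-task probabilities are re-softened with the temperature T before the cross-entropy is taken.

```python
import torch

def distillation_loss(recorded_probs, current_probs, T=2.0, eps=1e-8):
    """Knowledge distillation loss between recorded (Yo) and current old-task probabilities."""
    # Raise each probability to 1/T and renormalize: with T > 1 this softens the
    # distributions, giving more weight to the smaller (non-target) probabilities.
    soft_recorded = recorded_probs ** (1.0 / T)
    soft_recorded = soft_recorded / soft_recorded.sum(dim=1, keepdim=True)
    soft_current = current_probs ** (1.0 / T)
    soft_current = soft_current / soft_current.sum(dim=1, keepdim=True)
    # Cross-entropy of the current softened probabilities against the recorded ones.
    return -(soft_recorded * torch.log(soft_current + eps)).sum(dim=1).mean()
```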
Experiment
• Setup:
• A large dataset to train the initial network
• A smaller dataset to add each new task
• Old/original tasks:
• ImageNet
• Contains 1000 object categories
• More than 1,000K training images
• Places365-standard
• Contains 365 scene classes
• ~1,600K training images
• New tasks:
• PASCAL VOC ("VOC"), ~6K images
• Caltech-UCSD Birds ("CUB"), ~6K images
• MIT indoor scenes ("Scenes"), ~6K images
Experiment
• Two scenarios:
• Single new task:
• On the new task, LwF outperforms LFL, fine-tuning only the FC layers, feature extraction and full fine-tuning in most task pairs
• On the old task, LwF performs better than fine-tuning, but underperforms feature extraction, fine-tuning FC and LFL
• Multiple new tasks:
• LwF outperforms all compared methods except joint training
Extension of LwF
• Network expansion
• Adds nodes to some layers
• Allows new-task-specific information to be stored
• Used together with LwF
• Performs better than feature extraction
Limitations
• Cannot deal with a change in domain
• All new-task data must be available before its old-task responses can be computed
• Learning a new task still degrades old-task performance to some degree
The End
Learned something without forgetting


Editor's Notes

  • #2: We learn new things when new neurons are created for them in our brain. The better the plasticity, the better we remember. As time goes by, without refreshing, these things tend to fade away. The same happens with artificial neural networks.
  • #3: To do: explain the basics of neural networks.
  • #4: Note: if I add lemur identification to the net, it forgets the dog functionality.
  • #5: Forgets the old task. Explain with an example of forgetting in a basic neural network. To add: a forgetting example.
  • #6: One well-known victim of forgetting is image recognition. Explain the paper: slide 1, Introduction.
  • #7: Explain (from the internet): convolutional neural networks.
  • #8: To tackle the problem of catastrophic forgetting.
  • #9: Explain the paper: model.
  • #10: Explain the paper: feature extraction; explain its disadvantages.
  • #11: Explain the paper: fine-tuning; explain its disadvantages.
  • #12: Explain the paper: joint training; explain its disadvantages.
  • #13: Explain the paper: concurrently developed methods; explain their disadvantages.
  • #14: Explain the paper: A-LTM; explain its disadvantages.
  • #15: Explain the paper: Less Forgetting Learning; compare it with LwF.
  • #16: Explain the paper: cross-stitch networks. Web: cross-stitch network, cross-stitch module.
  • #17: Explain the paper: cross-stitch networks. Web: cross-stitch network, cross-stitch module.