2018-11-03
Introduction to Cyclical Learning Rates for Training Neural Nets
Sayak Paul
Project Instructor @ DataCamp
(GDG DevFest, Kolkata, India)
Overview of the talk
• Why are learning rates used?
• Some existing approaches for choosing the right learning rate
• What are the shortcomings of these approaches?
• The need for a systematic approach to setting the learning rate –
Cyclical Learning Rates (CLR)
• What is CLR?
• Some amazing results shown by CLR
• Conclusion
Why are learning rates used?
The learning rate is an important hyperparameter: it controls how much the weights of a network
are adjusted with respect to the loss gradient, as sketched below.
Source: Andrew Ng’s lecture notes from Coursera
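To make this concrete, the learning rate is the step-size factor in the plain gradient descent update; a generic illustration, not code from the talk:

import numpy as np

def sgd_step(w, grad, lr=0.01):
    # The learning rate lr scales how far the weights move against the gradient.
    return w - lr * grad

w = np.array([0.5, -0.3])
dL_dw = np.array([0.2, 0.1])
w = sgd_step(w, dL_dw)  # too small an lr learns slowly; too large can overshoot and diverge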
Some of the existing approaches for choosing the right LR
• Trying out different learning rates for a problem.
• Grid-searching/Random-searching.
• Adaptive Learning Rates / Learning Rate Schedules (a step decay sketch follows below).
[Figure: a step decay schedule]
[Figure: grid vs. random layouts of hyperparameter search]
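As one concrete example of a schedule, step decay drops the learning rate by a fixed factor every few epochs. A minimal sketch; the constants here are illustrative, not from the talk:

def step_decay(epoch, initial_lr=0.1, drop=0.5, epochs_per_drop=10):
    # Halve the learning rate every `epochs_per_drop` epochs.
    return initial_lr * (drop ** (epoch // epochs_per_drop))

# With tf.keras this plugs in as a callback, e.g.:
# from tensorflow.keras.callbacks import LearningRateScheduler
# model.fit(x, y, epochs=50,
#           callbacks=[LearningRateScheduler(lambda epoch: step_decay(epoch))])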
Problems with these approaches
• Computationally costly.
• They give no early signal as to whether the result will improve at all.
Cyclical Learning Rates*
• Proposed by Leslie N. Smith in his paper entitled “Cyclical Learning Rates for Training Neural
Networks” in 2015.
• The idea is simply to keep increasing the learning rate from a very small value until the loss
stops decreasing.
* Cyclical Learning Rates for Training Neural Networks – Leslie N. Smith
How is Cyclical Learning Rate (CLR) systematic?
• The main idea behind CLR is to vary the learning rate cyclically between minimum and maximum bounds.
• An LR range test (LR_Range_Test()) is run to fix those min and max values of the learning rate.
LR_Range_Test()
• One step of linearly increasing the learning rate per iteration.
[Figure: the learning rate cycling between min_lr and max_lr – also called the triangular learning rate policy; see the sketch below]
Source: Cyclical Learning Rates for Training Neural Networks – Leslie N. Smith
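A minimal sketch of the triangular policy: the learning rate ramps linearly from min_lr to max_lr over step_size iterations, then back down, repeating every cycle. The bound and step values below are placeholders; in practice they come from the range test:

import numpy as np

def triangular_lr(iteration, min_lr=1e-4, max_lr=1e-2, step_size=2000):
    # Each cycle spans 2 * step_size iterations: one ramp up, one ramp down.
    cycle = np.floor(1 + iteration / (2 * step_size))
    # x runs 1 -> 0 -> 1 within a cycle, so (1 - x) is the fraction of the way to max_lr.
    x = np.abs(iteration / step_size - 2 * cycle + 1)
    return min_lr + (max_lr - min_lr) * np.maximum(0.0, 1 - x)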
Choosing max_lr and min_lr
• Run the model for several epochs while letting the learning rate increase linearly (i.e., use the
triangular learning rate policy) between low and high values.
• Next, plot accuracy versus learning rate.
• Note the learning rate at which the accuracy starts to increase, and the one at which it slows,
becomes ragged, or starts to fall. These two values are good choices for the bounds of the
learning rate range (a range-test sketch follows below).
Source: Cyclical Learning Rates for Training Neural Networks – Leslie N. Smith
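A minimal sketch of the range test as a hand-rolled tf.keras callback, assuming a model compiled with metrics=['accuracy'] (API details may vary across TF versions):

import numpy as np
import tensorflow as tf

class LRRangeTest(tf.keras.callbacks.Callback):
    """Linearly increase the learning rate each batch and record (lr, accuracy)."""
    def __init__(self, min_lr=1e-6, max_lr=1.0, num_steps=1000):
        super().__init__()
        self.lrs = np.linspace(min_lr, max_lr, num_steps)
        self.history = []  # (lr, accuracy) pairs to plot afterwards

    def on_train_batch_begin(self, batch, logs=None):
        step = min(len(self.history), len(self.lrs) - 1)
        tf.keras.backend.set_value(self.model.optimizer.learning_rate, self.lrs[step])

    def on_train_batch_end(self, batch, logs=None):
        lr = float(tf.keras.backend.get_value(self.model.optimizer.learning_rate))
        self.history.append((lr, (logs or {}).get("accuracy")))

# Hypothetical usage: test = LRRangeTest()
# model.fit(x, y, epochs=1, callbacks=[test]), then plot test.history.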
Popular CLR implementations in Python
• As a Keras callback
• As the lr_find() method (fastai)
Both are wired up as sketched below.
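For reference, using them typically looks like the sketch below. CyclicLR is assumed to come from the widely used bckenstler/CLR repository, and lr_find() is the fastai Learner method; exact names and signatures may differ by version:

# Keras: CLR as a callback (assumed import path from the bckenstler/CLR repo).
# from clr_callback import CyclicLR
# clr = CyclicLR(base_lr=1e-4, max_lr=6e-3, step_size=2000., mode='triangular')
# model.fit(x_train, y_train, callbacks=[clr])

# fastai: the range test ships as lr_find() on a Learner.
# learn.lr_find()  # plots loss vs. learning rate; pick the bounds from the plot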
Some amazing results shown by CLR
Kaggle iMaterialist Challenge (Fashion) Leaderboard
Some amazing results shown by CLR (contd.)
DAWNBench Challenge Leaderboard and Leader’s specs
Limitations of CLR
• Limited applicability so far.
• Seems to work well mainly for CIFAR-10 and ResNets.
• But it definitely provides a more systematic way of choosing the
learning rate than the earlier approaches.
Notable enhancements inspired by CLR
• Stochastic Gradient Descent with Restarts (SGDR; a sketch follows below).
• Differential Learning Rates.
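For context, SGDR swaps the linear triangle for cosine annealing with periodic restarts. A minimal fixed-length-cycle sketch (the SGDR paper also lengthens cycles over time, which is omitted here):

import numpy as np

def sgdr_lr(iteration, min_lr=1e-5, max_lr=1e-2, cycle_len=2000):
    # Cosine-anneal from max_lr down to min_lr, then restart at max_lr.
    t = iteration % cycle_len
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + np.cos(np.pi * t / cycle_len))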
A Wealth of Wisdom
• Original CLR paper
• DataCamp tutorial covering CLR
Slides available on my GitHub (username: sayakpaul)
Thank you!
Sayak Paul
Project Instructor
spsayakpaul@gmail.com