2018-11-03
Introduction to Cyclical Learning Rates for Training Neural Nets
Sayak Paul
Project Instructor @ DataCamp
(GDG DevFest, Kolkata, India)
Overview of the talk
• Why are learning rates used?
• Some existing approaches for choosing the right learning rate
• What are the shortcomings of these approaches?
• The need for a systematic approach to setting the learning rate –
Cyclical Learning Rates (CLR)
• What is CLR?
• Some amazing results shown by CLR
• Conclusion
Why are learning rates used?
The learning rate is an important hyperparameter: it controls how much the weights of a network
are adjusted with respect to the loss gradient, as sketched below.
Source: Andrew Ng’s lecture notes from Coursera
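To make this concrete, the learning rate is the step-size factor in the plain gradient descent update; a generic illustration, not code from the talk:

import numpy as np

def sgd_step(w, grad, lr=0.01):
    # The learning rate lr scales how far the weights move against the gradient.
    return w - lr * grad

w = np.array([0.5, -0.3])
dL_dw = np.array([0.2, 0.1])
w = sgd_step(w, dL_dw)  # too small an lr learns slowly; too large can overshoot and diverge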
Some of the existing approaches for choosing the right LR
• Trying out different learning rates for a problem.
• Grid-searching/Random-searching.
• Adaptive Learning Rates / Learning Rate Schedules (a step decay sketch follows below).
[Figure: a step decay schedule]
[Figure: grid vs. random layouts of hyperparameter search]
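As one concrete example of a schedule, step decay drops the learning rate by a fixed factor every few epochs. A minimal sketch; the constants here are illustrative, not from the talk:

def step_decay(epoch, initial_lr=0.1, drop=0.5, epochs_per_drop=10):
    # Halve the learning rate every `epochs_per_drop` epochs.
    return initial_lr * (drop ** (epoch // epochs_per_drop))

# With tf.keras this plugs in as a callback, e.g.:
# from tensorflow.keras.callbacks import LearningRateScheduler
# model.fit(x, y, epochs=50,
#           callbacks=[LearningRateScheduler(lambda epoch: step_decay(epoch))])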
Problems with these approaches
• Computationally costly.
• They give no early signal as to whether the result will improve at all.
Cyclical Learning Rates*
• Proposed by Leslie N. Smith in his paper entitled “Cyclical Learning Rates for Training Neural
Networks” in 2015.
• The idea is simply to keep increasing the learning rate from a very small value until the loss
stops decreasing.
* Cyclical Learning Rates for Training Neural Networks – Leslie N. Smith
How is Cyclical Learning Rate (CLR) systematic?
• The main idea behind CLR is to vary the learning rate cyclically between minimum and maximum bounds.
• An LR range test (LR_Range_Test()) is run to fix those min and max values of the learning rate.
LR_Range_Test()
• One step of linearly increasing the learning rate per iteration.
[Figure: the learning rate cycling between min_lr and max_lr – also called the triangular learning rate policy; see the sketch below]
Source: Cyclical Learning Rates for Training Neural Networks – Leslie N. Smith
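A minimal sketch of the triangular policy: the learning rate ramps linearly from min_lr to max_lr over step_size iterations, then back down, repeating every cycle. The bound and step values below are placeholders; in practice they come from the range test:

import numpy as np

def triangular_lr(iteration, min_lr=1e-4, max_lr=1e-2, step_size=2000):
    # Each cycle spans 2 * step_size iterations: one ramp up, one ramp down.
    cycle = np.floor(1 + iteration / (2 * step_size))
    # x runs 1 -> 0 -> 1 within a cycle, so (1 - x) is the fraction of the way to max_lr.
    x = np.abs(iteration / step_size - 2 * cycle + 1)
    return min_lr + (max_lr - min_lr) * np.maximum(0.0, 1 - x)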
Choosing max_lr and min_lr
• Run the model for several epochs while letting the learning rate increase linearly (i.e., use the
triangular learning rate policy) between low and high values.
• Next, plot accuracy versus learning rate.
• Note the learning rate at which the accuracy starts to increase, and the one at which it slows,
becomes ragged, or starts to fall. These two values are good choices for the bounds of the
learning rate range (a range-test sketch follows below).
Source: Cyclical Learning Rates for Training Neural Networks – Leslie N. Smith
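A minimal sketch of the range test as a hand-rolled tf.keras callback, assuming a model compiled with metrics=['accuracy'] (API details may vary across TF versions):

import numpy as np
import tensorflow as tf

class LRRangeTest(tf.keras.callbacks.Callback):
    """Linearly increase the learning rate each batch and record (lr, accuracy)."""
    def __init__(self, min_lr=1e-6, max_lr=1.0, num_steps=1000):
        super().__init__()
        self.lrs = np.linspace(min_lr, max_lr, num_steps)
        self.history = []  # (lr, accuracy) pairs to plot afterwards

    def on_train_batch_begin(self, batch, logs=None):
        step = min(len(self.history), len(self.lrs) - 1)
        tf.keras.backend.set_value(self.model.optimizer.learning_rate, self.lrs[step])

    def on_train_batch_end(self, batch, logs=None):
        lr = float(tf.keras.backend.get_value(self.model.optimizer.learning_rate))
        self.history.append((lr, (logs or {}).get("accuracy")))

# Hypothetical usage: test = LRRangeTest()
# model.fit(x, y, epochs=1, callbacks=[test]), then plot test.history.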
Popular CLR implementations in Python
• As a Keras callback
• As the lr_find() method (fastai)
Both are wired up as sketched below.
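For reference, using them typically looks like the sketch below. CyclicLR is assumed to come from the widely used bckenstler/CLR repository, and lr_find() is the fastai Learner method; exact names and signatures may differ by version:

# Keras: CLR as a callback (assumed import path from the bckenstler/CLR repo).
# from clr_callback import CyclicLR
# clr = CyclicLR(base_lr=1e-4, max_lr=6e-3, step_size=2000., mode='triangular')
# model.fit(x_train, y_train, callbacks=[clr])

# fastai: the range test ships as lr_find() on a Learner.
# learn.lr_find()  # plots loss vs. learning rate; pick the bounds from the plot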
Some amazing results shown by CLR
Kaggle iMaterialist Challenge (Fashion) Leaderboard
Some amazing results shown by CLR (contd.)
DAWNBench Challenge Leaderboard and Leader’s specs
Limitations of CLR
• Limited applicability so far.
• Seems to work well mainly for CIFAR-10 and ResNets.
• But it definitely provides a more systematic way of choosing the
learning rate than the earlier approaches.
Notable enhancements inspired by CLR
• Stochastic Gradient Descent with Restarts (SGDR; a sketch follows below).
• Differential Learning Rates.
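For context, SGDR swaps the linear triangle for cosine annealing with periodic restarts. A minimal fixed-length-cycle sketch (the SGDR paper also lengthens cycles over time, which is omitted here):

import numpy as np

def sgdr_lr(iteration, min_lr=1e-5, max_lr=1e-2, cycle_len=2000):
    # Cosine-anneal from max_lr down to min_lr, then restart at max_lr.
    t = iteration % cycle_len
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + np.cos(np.pi * t / cycle_len))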
A Wealth of Wisdom
• Original CLR paper
• DataCamp tutorial covering CLR
Slides available on my GitHub (username: sayakpaul)
Thank you!
Sayak Paul
Project Instructor
spsayakpaul@gmail.com