Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model
Hansung University
Master's course (4th)
Seoung-Ho Choi
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (1/14)
• Goal:
  • Develop a better, new regularization method.
• Problem:
  • The existing linear combination of L1 and L2 regularization does not reflect the gradient well during deep learning model optimization.
  • The properties of the combination of L1 and L2 appear as unhelpful features in the optimization process.
• Contribution:
  • We propose a new nonlinear exponential regularization.
  • We evaluate it on two tasks, plus additional experiments.
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (2/14)
• Related Works
  • Well-known regularizations for deep learning model optimization are L1 and L2 regularization [2].
  • L1 regularization
    • L1 regularization penalizes the absolute value of the error between the actual and predicted values.
    • L1 regularization can be used for feature selection.
    • L1 regularization is suitable for sparse coding.
  • L2 regularization
    • L2 regularization [2] is defined as the sum of squared errors.
  • Comparison of L1 and L2 regularization
    • L1 regularization is more robust to outliers than L2 regularization, because L2 is computed from the square of the error [2]. This keeps low-magnitude features and finds better predictions (a small illustrative sketch of both terms follows).
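A minimal sketch of the two penalty terms as framed above, where both L1 and L2 are computed from the prediction error; the function names and toy data are illustrative and not from the paper.

```python
import numpy as np

def l1_term(y_true, y_pred):
    # Average absolute error between ground truth and prediction
    # (the L1-style penalty as framed in this deck).
    return np.mean(np.abs(y_true - y_pred))

def l2_term(y_true, y_pred):
    # Average squared error between ground truth and prediction
    # (the L2-style penalty as framed in this deck).
    return np.mean((y_true - y_pred) ** 2)

# A single outlier inflates the L2 term far more than the L1 term,
# illustrating why L1 is described as more robust to outliers.
y_true = np.array([0.0, 0.0, 0.0, 0.0])
y_pred = np.array([0.1, 0.1, 0.1, 3.0])
print(l1_term(y_true, y_pred))  # 0.825
print(l2_term(y_true, y_pred))  # 2.2575
```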
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (3/14)
• Proposed Method
  • Eq. (1) defines the nonlinear exponential regularization; a reconstructed form is sketched below the equation marker.
  • n is the class number, m is the total sum over n, y_true is the ground truth, and y_pred is the model prediction. We use two terms for regularization. One is the average absolute difference between the model predictions and the ground truth, which gives robustness to outliers in the predicted values in Eq. (1).
  • The other is the average squared difference between the model predictions and the ground truth. We combine the two regularization terms through the exponential function.
  • We use L1 regularization as the scaling term and L2 regularization as the phase term of the exponential function.
  • We could also use L2 regularization as the scale term and L1 regularization as the phase term of the exponential function. However, we did not experiment with L2 as the scale term because we only wanted the scale to reflect magnitude through the absolute value.
  • Consequently, we form nonlinear combinations of regularizers by applying the exponential operation to combinations of existing regularization terms such as L1 and L2.
(1)
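The equation image from the slide is not reproduced in this transcript. Based on the description above (average absolute error as the scale term, average squared error as the phase term of the exponential), Eq. (1) plausibly has the following form; the exact notation is an assumption rather than a verbatim copy of the slide.

```latex
R_{\mathrm{NER}}(y_{\mathrm{true}}, y_{\mathrm{pred}})
  = \underbrace{\frac{1}{m}\sum_{i=1}^{n}\left|y_{\mathrm{true},i}-y_{\mathrm{pred},i}\right|}_{\text{L1 term (scale)}}
    \cdot \exp\!\left(\underbrace{\frac{1}{m}\sum_{i=1}^{n}\left(y_{\mathrm{true},i}-y_{\mathrm{pred},i}\right)^{2}}_{\text{L2 term (phase)}}\right)
```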
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (4/14)
• Proposed Method (Cont.)
  • Figure 1 visualizes the effect of nonlinear exponential regularization compared with L1 regularization in terms of the weights.
  • Nonlinear exponential regularization reduces the complexity of the weights by applying the scaling effect of L1 regularization.
  • By lowering the complexity of the weights, the gradient of the model moves efficiently toward a suboptimal optimum.
Figure 1. Effect of our proposal on suboptimal training path gradients: a) L1 regularization, b) nonlinear exponential regularization (our proposal)
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (5/14)
• Proposed Method (Cont.)
  • A is the coefficient of L1 regularization.
  • B is the coefficient of L2 regularization.

Exponential Moving Average with linear coefficient:
  Step 1. A × L1 regularization − (1 − A) × L1 regularization − B × L2 regularization
  Step 2. A × Step 1 + B × L2 regularization    (2)

Nonlinear Exponential Regularization:
  Step 1. A × L1 regularization × e^((1 − A) × L2 regularization)    (3)

Exponential Moving Average with convex coefficient:
  Step 1. A × L1 regularization − (1 − A) × L1 regularization − B × L2 regularization
  Step 2. A × Step 1 + (1 − A) × L2 regularization    (4)
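A small sketch of the three combination rules above, written directly from the Step 1/Step 2 recipes; placing the L2 term inside the exponent in Eq. (3) follows the earlier description of L2 as the phase term and is an interpretation, not a verbatim transcription of the slide.

```python
import numpy as np

def ema_linear(l1, l2, A, B):
    # Eq. (2): exponential moving average with linear coefficients A and B.
    step1 = A * l1 - (1 - A) * l1 - B * l2
    return A * step1 + B * l2

def nonlinear_exponential(l1, l2, A):
    # Eq. (3): L1 term scaled by A, L2 term inside the exponential (phase).
    return A * l1 * np.exp((1 - A) * l2)

def ema_convex(l1, l2, A, B):
    # Eq. (4): same Step 1 as Eq. (2); Step 2 uses convex weights A and (1 - A).
    step1 = A * l1 - (1 - A) * l1 - B * l2
    return A * step1 + (1 - A) * l2

# Illustrative values only for the two error terms.
l1, l2 = 0.8, 0.3
print(nonlinear_exponential(l1, l2, A=0.5))  # 0.5 * 0.8 * exp(0.5 * 0.3) ≈ 0.4647
```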
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (6/14)
• Experimental Setting
  • (Verification) We verify the proposed regularization against: a) L1 regularization, b) L2 regularization, c) a linear combination of L1 and L2 regularization, and d) the nonlinear exponential combination of L1 and L2 regularization.
  • (Direct effect) To verify the regularization effect, we experiment with four regularization coefficients: 0.0, 0.25, 0.5, and 0.75.
  • (Side effect) We experimented with three losses (cross entropy, focal loss, and hinge loss) using Adam, on Vanilla-GAN [3] and LSGAN [4].
  • (Evaluation) We measured the performance of the methods using MSE (Mean Squared Error) and PSNR (Peak Signal-to-Noise Ratio).
  • (Confirming generalization across domains) We experimented on two tasks (i.e., generation and segmentation). The experimental parameters are described in the paper. A minimal sketch of this setting follows the list.
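As a rough illustration of the setting above (not the actual training code), the sketch below adds an Eq. (3)-style term to a task loss, sweeps the four coefficients, and includes a PSNR helper for the evaluation metric; the array shapes, the stand-in base loss, and the [0, 1] pixel range are assumptions.

```python
import numpy as np

def nonlinear_exponential_reg(y_true, y_pred, coeff):
    # Eq. (3)-style regularization term; `coeff` plays the role of A.
    l1 = np.mean(np.abs(y_true - y_pred))
    l2 = np.mean((y_true - y_pred) ** 2)
    return coeff * l1 * np.exp((1 - coeff) * l2)

def psnr(y_true, y_pred, max_val=1.0):
    # Peak Signal-to-Noise Ratio; assumes values scaled to [0, max_val].
    mse = np.mean((y_true - y_pred) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Sweep the four regularization coefficients from the setting above.
y_true = np.random.rand(4, 32, 32, 3)           # stand-in for a batch of images
y_pred = y_true + 0.05 * np.random.randn(*y_true.shape)
base_loss = np.mean((y_true - y_pred) ** 2)     # stand-in for the task loss
for coeff in (0.0, 0.25, 0.5, 0.75):
    total = base_loss + nonlinear_exponential_reg(y_true, y_pred, coeff)
    print(coeff, total, psnr(y_true, y_pred))
```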
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (7/14)
• Experimental Result
  • (Generation task) Figure 2 shows a comparative analysis of the proposed nonlinear exponential regularization and linear regularization.
  • The proposed method shows faster convergence and lower error values than the conventional method. This suggests that the nonlinear characteristics help the model converge faster.
Figure 2. Comparative analysis of the proposed methods showing faster convergence in training: a) Vanilla-GAN, b) LSGAN, i) Cifar10, and ii) Cifar
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (8/14)
• Experimental Result (Cont.)
  • To clarify this, we extracted the top 5 values from the experimental results: the 5 lowest MSE values and the 5 highest PSNR values, shown in Table 1.
  • We use the average of the percentage improvement for each experiment (see the sketch below). Our nonlinear exponential regularization shows an MSE lower by 0.71 and a PSNR better by 0.0275 than the linearly combined L1 and L2 regularization.
  • We found that reaching the optimal point with our proposal can produce generated images with better results from the generation model.
Table 1. Experimental results using two datasets on two models: top 5 values of PSNR and MSE
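The slide does not spell out how the average percentage improvement is computed; one plausible reading, shown as a sketch, is the mean relative improvement over paired runs. The function name, the pairing, and the toy numbers are illustrative assumptions, not the paper's measurements.

```python
import numpy as np

def avg_percent_improvement(baseline_vals, proposed_vals, lower_is_better=True):
    # Mean percentage improvement of the proposed method over the baseline,
    # computed pairwise across experiments (e.g., the top-5 MSE or PSNR runs).
    baseline = np.asarray(baseline_vals, dtype=float)
    proposed = np.asarray(proposed_vals, dtype=float)
    if lower_is_better:                      # e.g., MSE
        imp = (baseline - proposed) / baseline * 100.0
    else:                                    # e.g., PSNR
        imp = (proposed - baseline) / baseline * 100.0
    return imp.mean()

print(avg_percent_improvement([0.90, 0.85], [0.80, 0.78]))           # MSE-style
print(avg_percent_improvement([24.1, 25.0], [24.3, 25.2], False))    # PSNR-style
```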
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (9/14)
• Experimental Result (Cont.)
  • Figure 3 shows images generated with the linearly combined regularization and with the nonlinear exponential regularization.
  • The proposed regularization generates better images than the linearly combined regularization.
  • This may be because the proposed exponential regularization finds better gradients by focusing on the more important loss.
  • In images with complex colors, the nonlinear exponential regularization makes the clustering between pixels more efficient than the conventional linear method.
Figure 3. Generated images by a) linearly combined regularization and b) nonlinear exponential regularization
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (10/14)
Table 2. Comparison of our proposal on two models: a) loss, b) loss with a linear combination of L1 and L2 regularization, c) loss with nonlinear exponential regularization of L1 and L2, d) loss with both nonlinear exponential regularization and linear regularization of L1 and L2
• Experimental Result (Cont.)
  • (Segmentation) We conducted additional linear and nonlinear combination experiments to determine the effect of combining nonlinear and linear features.
  • Performance (standard definitions of DSC and IOU are sketched below for reference)
    • Nonlinear exponential regularization improved DSC by 0.06%, F1 score by 0.06%, IOU by 0%, precision by 0.06%, and recall by 0.06% over the original loss. It improved DSC by 0.34%, F1 score by 0.34%, IOU by 0.19%, precision by 0.34%, and recall by 0.34% over the linear combination of L1 and L2 regularization. It improved DSC by 0.535%, F1 score by 0.549%, IOU by 0.337%, precision by 0.549%, and recall by 0.549% over the combined regularization in Table 2.
  • The above results confirm that the method of this paper improves performance. However, the additionally tested setting that reflects both nonlinear and linear features showed lower performance than applying only the nonlinear features.
  • The two kinds of features do not appear to be complementary, and performance drops when the stronger combined regularization is applied.
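For reference, a minimal sketch of two of the segmentation metrics reported in Table 2 (DSC and IOU) using their standard definitions for binary masks; this is background only, not the paper's evaluation code.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    # DSC = 2|A ∩ B| / (|A| + |B|) for binary masks.
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    inter = np.logical_and(pred, true).sum()
    return (2.0 * inter + eps) / (pred.sum() + true.sum() + eps)

def iou(pred_mask, true_mask, eps=1e-7):
    # IOU = |A ∩ B| / |A ∪ B| for binary masks.
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return (inter + eps) / (union + eps)

# Toy 2x2 masks: one predicted pixel matches the ground truth, one does not.
pred = np.array([[1, 1], [0, 0]])
true = np.array([[1, 0], [0, 0]])
print(dice_coefficient(pred, true))  # ≈ 0.667
print(iou(pred, true))               # ≈ 0.5
```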
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (11/14)
• Experimental Result (Cont.)
  • Table 3 shows the results of the nonlinear exponential moving average using a linear coefficient.
  • Three regularization settings are tested:
    • No regularization, the linear combination, and the exponential moving average (EMA) with the linear combination.
  • The exponential moving average has the lowest performance among all methods when comparing the average over the metrics.
  • Compared with no regularization, it shows a 0.14% decrease in both DSC and F1 score, and a 0.28% decrease in precision and recall.
Table 3. Comparative test for verification of the nonlinear exponential moving average on the VOC dataset: a) loss, b) loss with a linear combination of L1 and L2 regularization using a linear coefficient, c) loss with the exponential moving average linear combination of L1 and L2 regularization using a linear coefficient; i) FCN [5], ii) U-Net [6], iii) DeepLab v3 [7]
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (12/14)
• Experimental Result (Cont.)
  • Performance
    • Compared with no regularization, EMA shows a 0.41% decrease in both DSC and F1 score, a 0.54% decrease in precision and recall, and a 0.13% decrease in IOU.
    • On the other hand, EMA increases the loss by 9.84%. Compared with the linear combination, EMA matches it on IOU and is lower on DSC, F1 score, loss, precision, and recall, with decreases of 0.14%, 0.14%, 0.74%, 0.14%, and 0.14%, respectively.
Table 4. Comparative test for verification of the nonlinear exponential moving average on the ATR dataset [8]: a) loss, b) loss with a combination of L1 and L2 regularization using a linear coefficient, c) loss with the exponential moving average linear combination of L1 and L2 regularization using a linear coefficient; i) FCN [5], ii) U-Net [6], iii) DeepLab v3 [7]
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (13/14)
• Experimental Result (Cont.)
  • We experiment with the exponential moving average combination of L1 and L2 regularization for semantic segmentation in Table 5.
  • We can see that the nonlinear exponential moving average combination lowers performance, even though reflecting the exponential moving average through a recurrent update was intended to move the model efficiently toward the optimum.
  • We ran the experiments with batch size 2 because larger batch sizes ran out of memory on our desktop device. Some cases turned out better, but performance is normally lower with the smaller batch size [9].
  • We confirmed that nonlinear exponential regularization is simpler than the more complex EMA.
Table 5. Comparative test for verification of our experiments averaged over the ATR [8] and VOC datasets: a) loss, b) loss with a combination of L1 and L2 regularization using a convex coefficient, c) loss with nonlinear exponential regularization using a convex coefficient, d) loss with exponential moving average regularization using a convex coefficient
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (14/14)
• Conclusion
  • We propose the nonlinear exponential regularization of L1 and L2, termed exponential regularization.
  • We also conducted an exponential moving average regularization experiment. We can see that the nonlinear combination improves performance.
  • This is because the nonlinear features were confirmed to help the model reach its optimum efficiently.
  • We experimented with nonlinear exponential regularization as a fixed reflection and with exponential moving average regularization as a dynamic reflection in the design of the regularization.
  • In the future, we want to confirm that a hybrid regularization process is reflected effectively in the optimization process, rather than only the fixed or the dynamic regularization process.
  • We will also pursue the optimum in consideration of general characteristics.
Future Works and Ending
• We will analyze the complexity of the regularization method.
• We will verify additional ideas.
• We will analyze how the characteristics of the exponential affect the deep learning model.
• Before June 2020, we will improve the journal submission if possible and prepare the results.
Thank you for listening to the presentation.
If you have any additional questions, please email us.
jcn99250@naver.com
References
[1] I. V. Tetko, D. J. Livingstone, and A. I. Luik, "Neural Network Studies. 1. Comparison of Overfitting and Overtraining," Journal of Chemical Information and Modeling, vol. 35, no. 5, 1995, pp. 826-833.
[2] A. Y. Ng, "Feature selection, L1 vs. L2 regularization, and rotational invariance," In ICML, 2004.
[3] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Generative adversarial nets," In NIPS, 2014.
[4] X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, and S. P. Smolley, "Least Squares Generative Adversarial Networks," arXiv:1611.04076, 2016.
[5] J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," arXiv:1411.4038, 2014.
[6] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," arXiv:1505.04597, 2015.
[7] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation," arXiv:1802.02611, 2018.
[8] X. Liang, S. Liu, X. Shen, J. Yang, L. Liu, J. Dong, L. Lin, and S. Yan, "Deep human parsing with active template regression," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 12, pp. 2402-2414, 2015.
[9] F. Jia, J. Liu, and X.-C. Tai, "A Regularized Convolutional Neural Network for Semantic Image Segmentation," arXiv:1907.05287, 2019.
