Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model
Hansung University
Master's course (4th)
Seoung-Ho Choi
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (1/14)
• Goal:
  • Develop a better, new regularization method.
• Problem:
  • The existing linear combination of L1 and L2 regularization does not reflect the gradient well during deep learning model optimization.
  • The properties of the combination of L1 and L2 appear as unhelpful features in the optimization process.
• Contribution:
  • We propose a new nonlinear exponential regularization.
  • We evaluate it on two tasks, plus additional experiments.
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (2/14)
• Related Works
  • Well-known regularizations for deep learning model optimization are L1 and L2 regularization [2].
  • L1 regularization
    • L1 regularization penalizes the absolute value of the error between the actual and predicted values.
    • L1 regularization can be used for feature selection.
    • L1 regularization is suitable for sparse coding.
  • L2 regularization
    • L2 regularization [2] is defined as the sum of squared errors.
  • Comparison of L1 and L2 regularization
    • L1 regularization is more robust to outliers than L2 regularization, because L2 is computed from the square of the error [2]. This keeps low-magnitude features and finds better predictions (a small illustrative sketch of both terms follows).
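A minimal sketch of the two penalty terms as framed above, where both L1 and L2 are computed from the prediction error; the function names and toy data are illustrative and not from the paper.

```python
import numpy as np

def l1_term(y_true, y_pred):
    # Average absolute error between ground truth and prediction
    # (the L1-style penalty as framed in this deck).
    return np.mean(np.abs(y_true - y_pred))

def l2_term(y_true, y_pred):
    # Average squared error between ground truth and prediction
    # (the L2-style penalty as framed in this deck).
    return np.mean((y_true - y_pred) ** 2)

# A single outlier inflates the L2 term far more than the L1 term,
# illustrating why L1 is described as more robust to outliers.
y_true = np.array([0.0, 0.0, 0.0, 0.0])
y_pred = np.array([0.1, 0.1, 0.1, 3.0])
print(l1_term(y_true, y_pred))  # 0.825
print(l2_term(y_true, y_pred))  # 2.2575
```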
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (3/14)
• Proposed Method
  • Eq. (1) defines the nonlinear exponential regularization; a reconstructed form is sketched below the equation marker.
  • n is the class number, m is the total sum over n, y_true is the ground truth, and y_pred is the model prediction. We use two terms for regularization. One is the average absolute difference between the model predictions and the ground truth, which gives robustness to outliers in the predicted values in Eq. (1).
  • The other is the average squared difference between the model predictions and the ground truth. We combine the two regularization terms through the exponential function.
  • We use L1 regularization as the scaling term and L2 regularization as the phase term of the exponential function.
  • We could also use L2 regularization as the scale term and L1 regularization as the phase term of the exponential function. However, we did not experiment with L2 as the scale term because we only wanted the scale to reflect magnitude through the absolute value.
  • Consequently, we form nonlinear combinations of regularizers by applying the exponential operation to combinations of existing regularization terms such as L1 and L2.
(1)
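The equation image from the slide is not reproduced in this transcript. Based on the description above (average absolute error as the scale term, average squared error as the phase term of the exponential), Eq. (1) plausibly has the following form; the exact notation is an assumption rather than a verbatim copy of the slide.

```latex
R_{\mathrm{NER}}(y_{\mathrm{true}}, y_{\mathrm{pred}})
  = \underbrace{\frac{1}{m}\sum_{i=1}^{n}\left|y_{\mathrm{true},i}-y_{\mathrm{pred},i}\right|}_{\text{L1 term (scale)}}
    \cdot \exp\!\left(\underbrace{\frac{1}{m}\sum_{i=1}^{n}\left(y_{\mathrm{true},i}-y_{\mathrm{pred},i}\right)^{2}}_{\text{L2 term (phase)}}\right)
```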
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (4/14)
• Proposed Method (Cont.)
  • Figure 1 visualizes the effect of nonlinear exponential regularization compared with L1 regularization in terms of the weights.
  • Nonlinear exponential regularization reduces the complexity of the weights by applying the scaling effect of L1 regularization.
  • By lowering the complexity of the weights, the gradient of the model moves efficiently toward a suboptimal optimum.
Figure 1. Effect of our proposal on suboptimal training path gradients: a) L1 regularization, b) nonlinear exponential regularization (our proposal)
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (5/14)
• Proposed Method (Cont.)
  • A is the coefficient of L1 regularization.
  • B is the coefficient of L2 regularization.

Exponential Moving Average with linear coefficient:
  Step 1. A × L1 regularization − (1 − A) × L1 regularization − B × L2 regularization
  Step 2. A × Step 1 + B × L2 regularization    (2)

Nonlinear Exponential Regularization:
  Step 1. A × L1 regularization × e^((1 − A) × L2 regularization)    (3)

Exponential Moving Average with convex coefficient:
  Step 1. A × L1 regularization − (1 − A) × L1 regularization − B × L2 regularization
  Step 2. A × Step 1 + (1 − A) × L2 regularization    (4)
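A small sketch of the three combination rules above, written directly from the Step 1/Step 2 recipes; placing the L2 term inside the exponent in Eq. (3) follows the earlier description of L2 as the phase term and is an interpretation, not a verbatim transcription of the slide.

```python
import numpy as np

def ema_linear(l1, l2, A, B):
    # Eq. (2): exponential moving average with linear coefficients A and B.
    step1 = A * l1 - (1 - A) * l1 - B * l2
    return A * step1 + B * l2

def nonlinear_exponential(l1, l2, A):
    # Eq. (3): L1 term scaled by A, L2 term inside the exponential (phase).
    return A * l1 * np.exp((1 - A) * l2)

def ema_convex(l1, l2, A, B):
    # Eq. (4): same Step 1 as Eq. (2); Step 2 uses convex weights A and (1 - A).
    step1 = A * l1 - (1 - A) * l1 - B * l2
    return A * step1 + (1 - A) * l2

# Illustrative values only for the two error terms.
l1, l2 = 0.8, 0.3
print(nonlinear_exponential(l1, l2, A=0.5))  # 0.5 * 0.8 * exp(0.5 * 0.3) ≈ 0.4647
```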
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (6/14)
• Experimental Setting
  • (Verification) We verify the proposed regularization against: a) L1 regularization, b) L2 regularization, c) a linear combination of L1 and L2 regularization, and d) the nonlinear exponential combination of L1 and L2 regularization.
  • (Direct effect) To verify the regularization effect, we experiment with four regularization coefficients: 0.0, 0.25, 0.5, and 0.75.
  • (Side effect) We experimented with three losses (cross entropy, focal loss, and hinge loss) using Adam, on Vanilla-GAN [3] and LSGAN [4].
  • (Evaluation) We measured the performance of the methods using MSE (Mean Squared Error) and PSNR (Peak Signal-to-Noise Ratio).
  • (Confirming generalization across domains) We experimented on two tasks (i.e., generation and segmentation). The experimental parameters are described in the paper. A minimal sketch of this setting follows the list.
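As a rough illustration of the setting above (not the actual training code), the sketch below adds an Eq. (3)-style term to a task loss, sweeps the four coefficients, and includes a PSNR helper for the evaluation metric; the array shapes, the stand-in base loss, and the [0, 1] pixel range are assumptions.

```python
import numpy as np

def nonlinear_exponential_reg(y_true, y_pred, coeff):
    # Eq. (3)-style regularization term; `coeff` plays the role of A.
    l1 = np.mean(np.abs(y_true - y_pred))
    l2 = np.mean((y_true - y_pred) ** 2)
    return coeff * l1 * np.exp((1 - coeff) * l2)

def psnr(y_true, y_pred, max_val=1.0):
    # Peak Signal-to-Noise Ratio; assumes values scaled to [0, max_val].
    mse = np.mean((y_true - y_pred) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Sweep the four regularization coefficients from the setting above.
y_true = np.random.rand(4, 32, 32, 3)           # stand-in for a batch of images
y_pred = y_true + 0.05 * np.random.randn(*y_true.shape)
base_loss = np.mean((y_true - y_pred) ** 2)     # stand-in for the task loss
for coeff in (0.0, 0.25, 0.5, 0.75):
    total = base_loss + nonlinear_exponential_reg(y_true, y_pred, coeff)
    print(coeff, total, psnr(y_true, y_pred))
```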
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (7/14)
• Experimental Result
  • (Generation task) Figure 2 shows a comparative analysis of the proposed nonlinear exponential regularization and linear regularization.
  • The proposed method shows faster convergence and lower error values than the conventional method. This suggests that the nonlinear characteristics help the model converge faster.
Figure 2. Comparative analysis of the proposed methods showing faster convergence in training: a) Vanilla-GAN, b) LSGAN, i) Cifar10, and ii) Cifar
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (8/14)
• Experimental Result (Cont.)
  • To clarify this, we extracted the top 5 values from the experimental results: the 5 lowest MSE values and the 5 highest PSNR values, shown in Table 1.
  • We use the average of the percentage improvement for each experiment (see the sketch below). Our nonlinear exponential regularization shows an MSE lower by 0.71 and a PSNR better by 0.0275 than the linearly combined L1 and L2 regularization.
  • We found that reaching the optimal point with our proposal can produce generated images with better results from the generation model.
Table 1. Experimental results using two datasets on two models: top 5 values of PSNR and MSE
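The slide does not spell out how the average percentage improvement is computed; one plausible reading, shown as a sketch, is the mean relative improvement over paired runs. The function name, the pairing, and the toy numbers are illustrative assumptions, not the paper's measurements.

```python
import numpy as np

def avg_percent_improvement(baseline_vals, proposed_vals, lower_is_better=True):
    # Mean percentage improvement of the proposed method over the baseline,
    # computed pairwise across experiments (e.g., the top-5 MSE or PSNR runs).
    baseline = np.asarray(baseline_vals, dtype=float)
    proposed = np.asarray(proposed_vals, dtype=float)
    if lower_is_better:                      # e.g., MSE
        imp = (baseline - proposed) / baseline * 100.0
    else:                                    # e.g., PSNR
        imp = (proposed - baseline) / baseline * 100.0
    return imp.mean()

print(avg_percent_improvement([0.90, 0.85], [0.80, 0.78]))           # MSE-style
print(avg_percent_improvement([24.1, 25.0], [24.3, 25.2], False))    # PSNR-style
```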
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (9/14)
• Experimental Result (Cont.)
  • Figure 3 shows images generated with the linearly combined regularization and with the nonlinear exponential regularization.
  • The proposed regularization generates better images than the linearly combined regularization.
  • This may be because the proposed exponential regularization finds better gradients by focusing on the more important loss.
  • In images with complex colors, the nonlinear exponential regularization makes the clustering between pixels more efficient than the conventional linear method.
Figure 3. Generated images by a) linearly combined regularization and b) nonlinear exponential regularization
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (10/14)
Table 2. Comparison of our proposal on two models: a) loss, b) loss with a linear combination of L1 and L2 regularization, c) loss with nonlinear exponential regularization of L1 and L2, d) loss with both nonlinear exponential regularization and linear regularization of L1 and L2
• Experimental Result (Cont.)
  • (Segmentation) We conducted additional linear and nonlinear combination experiments to determine the effect of combining nonlinear and linear features.
  • Performance (standard definitions of DSC and IOU are sketched below for reference)
    • Nonlinear exponential regularization improved DSC by 0.06%, F1 score by 0.06%, IOU by 0%, precision by 0.06%, and recall by 0.06% over the original loss. It improved DSC by 0.34%, F1 score by 0.34%, IOU by 0.19%, precision by 0.34%, and recall by 0.34% over the linear combination of L1 and L2 regularization. It improved DSC by 0.535%, F1 score by 0.549%, IOU by 0.337%, precision by 0.549%, and recall by 0.549% over the combined regularization in Table 2.
  • The above results confirm that the method of this paper improves performance. However, the additionally tested setting that reflects both nonlinear and linear features showed lower performance than applying only the nonlinear features.
  • The two kinds of features do not appear to be complementary, and performance drops when the stronger combined regularization is applied.
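For reference, a minimal sketch of two of the segmentation metrics reported in Table 2 (DSC and IOU) using their standard definitions for binary masks; this is background only, not the paper's evaluation code.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    # DSC = 2|A ∩ B| / (|A| + |B|) for binary masks.
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    inter = np.logical_and(pred, true).sum()
    return (2.0 * inter + eps) / (pred.sum() + true.sum() + eps)

def iou(pred_mask, true_mask, eps=1e-7):
    # IOU = |A ∩ B| / |A ∪ B| for binary masks.
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return (inter + eps) / (union + eps)

# Toy 2x2 masks: one predicted pixel matches the ground truth, one does not.
pred = np.array([[1, 1], [0, 0]])
true = np.array([[1, 0], [0, 0]])
print(dice_coefficient(pred, true))  # ≈ 0.667
print(iou(pred, true))               # ≈ 0.5
```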
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (11/14)
• Experimental Result (Cont.)
  • Table 3 shows the results of the nonlinear exponential moving average using a linear coefficient.
  • Three regularization settings are tested:
    • No regularization, the linear combination, and the exponential moving average (EMA) with the linear combination.
  • The exponential moving average has the lowest performance among all methods when comparing the average over the metrics.
  • Compared with no regularization, it shows a 0.14% decrease in both DSC and F1 score, and a 0.28% decrease in precision and recall.
Table 3. Comparative test for verification of the nonlinear exponential moving average on the VOC dataset: a) loss, b) loss with a linear combination of L1 and L2 regularization using a linear coefficient, c) loss with the exponential moving average linear combination of L1 and L2 regularization using a linear coefficient; i) FCN [5], ii) U-Net [6], iii) DeepLab v3 [7]
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (12/14)
• Experimental Result (Cont.)
  • Performance
    • Compared with no regularization, EMA shows a 0.41% decrease in both DSC and F1 score, a 0.54% decrease in precision and recall, and a 0.13% decrease in IOU.
    • On the other hand, EMA increases the loss by 9.84%. Compared with the linear combination, EMA matches it on IOU and is lower on DSC, F1 score, loss, precision, and recall, with decreases of 0.14%, 0.14%, 0.74%, 0.14%, and 0.14%, respectively.
Table 4. Comparative test for verification of the nonlinear exponential moving average on the ATR dataset [8]: a) loss, b) loss with a combination of L1 and L2 regularization using a linear coefficient, c) loss with the exponential moving average linear combination of L1 and L2 regularization using a linear coefficient; i) FCN [5], ii) U-Net [6], iii) DeepLab v3 [7]
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (13/14)
• Experimental Result (Cont.)
  • We experiment with the exponential moving average combination of L1 and L2 regularization for semantic segmentation in Table 5.
  • We can see that the nonlinear exponential moving average combination lowers performance, even though reflecting the exponential moving average through a recurrent update was intended to move the model efficiently toward the optimum.
  • We ran the experiments with batch size 2 because larger batch sizes ran out of memory on our desktop device. Some cases turned out better, but performance is normally lower with the smaller batch size [9].
  • We confirmed that nonlinear exponential regularization is simpler than the more complex EMA.
Table 5. Comparative test for verification of our experiments averaged over the ATR [8] and VOC datasets: a) loss, b) loss with a combination of L1 and L2 regularization using a convex coefficient, c) loss with nonlinear exponential regularization using a convex coefficient, d) loss with exponential moving average regularization using a convex coefficient
Nonlinear Exponential Regularization: An Improved Version of Regularization for Deep Learning Model (S.-H. Choi, 2020) (14/14)
• Conclusion
  • We propose the nonlinear exponential regularization of L1 and L2, termed exponential regularization.
  • We also conducted an exponential moving average regularization experiment. We can see that the nonlinear combination improves performance.
  • This is because the nonlinear features were confirmed to help the model reach its optimum efficiently.
  • We experimented with nonlinear exponential regularization as a fixed reflection and with exponential moving average regularization as a dynamic reflection in the design of the regularization.
  • In the future, we want to confirm that a hybrid regularization process is reflected effectively in the optimization process, rather than only the fixed or the dynamic regularization process.
  • We will also pursue the optimum in consideration of general characteristics.
Future Works and Ending
• We will analyze the complexity of the regularization method.
• We will verify additional ideas.
• We will analyze how the characteristics of the exponential affect the deep learning model.
• Before June 2020, we will improve the journal submission if possible and prepare the results.
Thank you for listening to the presentation.
If you have any additional questions, please email us.
jcn99250@naver.com
References
[1] I. V. Tetko, D. J. Livingstone, and A. I. Luik, "Neural Network Studies. 1. Comparison of Overfitting and Overtraining," Journal of Chemical Information and Modeling, vol. 35, no. 5, 1995, pp. 826-833.
[2] A. Y. Ng, "Feature selection, L1 vs. L2 regularization, and rotational invariance," In ICML, 2004.
[3] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Generative adversarial nets," In NIPS, 2014.
[4] X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, and S. P. Smolley, "Least Squares Generative Adversarial Networks," arXiv:1611.04076, 2016.
[5] J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," arXiv:1411.4038, 2014.
[6] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," arXiv:1505.04597, 2015.
[7] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation," arXiv:1802.02611, 2018.
[8] X. Liang, S. Liu, X. Shen, J. Yang, L. Liu, J. Dong, L. Lin, and S. Yan, "Deep human parsing with active template regression," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 12, pp. 2402-2414, 2015.
[9] F. Jia, J. Liu, and X.-C. Tai, "A Regularized Convolutional Neural Network for Semantic Image Segmentation," arXiv:1907.05287, 2019.
