Bi-activation Function :
an Enhanced Version of an Activation
Function in Convolutional Neural Network
Hansung University
Seoung-Ho Choi (Master course, 4th) and
Kyoungyeon Kim (Undergraduate course, 8th)
KICS-Winter 2020
Bi-Activation Function (S.-H. Choi and K. Kim, 2020)
2
Goal: design a new and better activation function
Problem
• The existing activation function does not reflect information well: because only one part of the information is seen, the complexity of information processing increases.
• The existing activation function does not give the model generalized characteristics because it does not reflect both the positive and the negative partial information.
Contribution
• To alleviate this problem, we propose a bi-activation function.
• We experimented with CNNs on typical datasets such as MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100. The bi-activation function performed better than ReLU and ELU.
• An activation function determines how much information is delivered to the next node toward the decision goal.
• Typical examples are ReLU [1] and ELU [2].
Figure 1. Example of activation function in neural network
a) Pos-activation, b) Neg-activation, c) Bi-activation
3
Bi-Activation Function : Related Works
Bi-Activation Function : Related Works
4
1. ReLU
• Information near 0 cannot be differentiated, and information below 0 is not reflected.
• Positive information is reflected one to one.
2. ELU
• Information near 0 can be differentiated, and information below 0 is reflected (exponentially).
• Positive information is reflected one to one.
Bi-Activation Function : Proposed Method
We therefore propose a bi-activation function
• Figure 2 shows how the bi-activation function is constructed (a minimal implementation sketch follows the figure caption).
• The bi-activation function consists of two parts:
• a pos-activation function and a neg-activation function.
• The pos-activation function is the same as the existing activation function.
• The neg-activation function is its mirror image, covering the negative side.
• Effect of the bi-activation function
• CNNs reflect the nonlinearity of the training data and of its outputs by using an activation function to select the linear output region properly.
• However, the existing activation function has difficulty training on the nonlinearity of the data because its linear output region exists only for x >= 0.
• In contrast, the bi-activation function is more flexible than the existing activation function because it also has the neg-activation function. This flexibility lets it learn information from both sides.
5
Figure 2. Bi-activation function a) Pos-activation,
b) Neg-activation, c) Bi-activation
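Below is a minimal PyTorch sketch of one way to realize the pos-/neg-/bi-activation split of Figure 2. The slides do not state how the two branches are merged, so the channel-wise concatenation (and the class name BiActivation) is our assumption rather than the authors' implementation.

```python
# Minimal sketch of the bi-activation idea (assumed merge: channel concatenation).
import torch
import torch.nn as nn


class BiActivation(nn.Module):
    """Pos-activation on x plus a mirrored neg-activation on -x."""

    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base  # e.g. nn.ReLU() for bi-ReLU, nn.ELU() for bi-ELU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pos = self.base(x)       # pos-activation: same as the existing activation
        neg = -self.base(-x)     # neg-activation: mirror image for the negative side
        return torch.cat([pos, neg], dim=1)  # assumed merge along the channel axis


# Usage: bi-ReLU and bi-ELU variants of Figure 2 c)
bi_relu = BiActivation(nn.ReLU())
bi_elu = BiActivation(nn.ELU())
out = bi_relu(torch.randn(1, 4, 8, 8))  # -> shape (1, 8, 8, 8): channels doubled
```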
• We applied the existing activation functions to our proposal as shown in Figure 3.
• We experimented to verify which activation functions can stably reflect sparse information near 0.
• This enables stable training through the stable acquisition of fine-grained information.
• Bi-activation can cause information overlap; however, we did not analyze this overlap here.
• We experimented with four CNNs, shown in Figure 4 (a configurable sketch follows the figure captions below).
• We analyzed the effect of filtering with each activation function as the amount of information in the model's parameters changes.
• This allows a precise analysis of filtering with respect to the amount of information coming from the model's parameters.
6
Figure 4. CNNs used in the experiments: (a) CNN-Large, (b) CNN-Middle, (c) CNN-Small, and (d) CNN-Little.
Bi-Activation Function : Experimental Setting
Figure 3. Activation functions used in the experiments: (a) existing methods, (b) proposed methods; (a).i ReLU, (a).ii ELU, (b).i bi-ReLU, (b).ii bi-eLU.
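As a rough illustration of this setup, here is a hedged sketch of a configurable CNN whose size and activation can be swapped. The slides do not give the exact architectures, so the layer layout and the filter counts for the Large/Middle/Small/Little variants are placeholders, not the authors' configurations.

```python
# Illustrative CNN builder: the filter count stands in for the Large/Middle/Small/Little sizes.
import torch.nn as nn


def make_cnn(num_filters: int, activation: nn.Module, num_classes: int = 10) -> nn.Sequential:
    """Two-conv-layer CNN for 28x28 grayscale inputs (e.g. MNIST / Fashion-MNIST)."""
    return nn.Sequential(
        nn.Conv2d(1, num_filters, kernel_size=3, padding=1),
        activation,
        nn.MaxPool2d(2),                                   # 28x28 -> 14x14
        nn.Conv2d(num_filters, 2 * num_filters, kernel_size=3, padding=1),
        activation,
        nn.MaxPool2d(2),                                   # 14x14 -> 7x7
        nn.Flatten(),
        nn.Linear(2 * num_filters * 7 * 7, num_classes),
        nn.LogSoftmax(dim=1),                              # LogSoftmax head (next slide)
    )


# Example "Large" and "Little" variants; with the concatenating BiActivation sketched
# above, the in_channels of the second conv and the linear layer would have to double.
cnn_large = make_cnn(64, nn.ReLU())
cnn_little = make_cnn(8, nn.ELU())
```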
• We use LogSoftmax
• The CNNs use a LogSoftmax output so that we can observe a large-margin effect in the model's predictions [3].
• When LogSoftmax is applied, NLL loss is generally used with it; however, the NLL loss here suffers from a non-convex effect.
• Therefore, cross-entropy is applied to obtain a convex effect in the model: a convex function has a unique solution and can be solved easily with gradient methods.
• Accordingly, studies have applied cross-entropy on top of LogSoftmax to obtain this convex behavior [3].
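For concreteness, here is a small PyTorch sketch of the LogSoftmax / NLL / cross-entropy combinations discussed above; the batch size and class count are made-up example values.

```python
# Numerical sketch of the output/loss choices (example shapes, not the paper's settings).
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)                # batch of 8 samples, 10 classes
targets = torch.randint(0, 10, (8,))

log_probs = F.log_softmax(logits, dim=1)   # LogSoftmax output head
nll = F.nll_loss(log_probs, targets)       # the usual pairing: LogSoftmax + NLL loss
ce = F.cross_entropy(logits, targets)      # cross-entropy applied directly to the logits

# cross_entropy(logits, t) computes the same value as nll_loss(log_softmax(logits), t);
# the slides prefer the cross-entropy formulation for its convexity argument [3].
print(nll.item(), ce.item())
```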
• Various seeds for filtering
• In addition, we compare experiments on the MNIST dataset with seeds 999, 500, and 1 to analyze the influence of each filter number, using the two-layer CNN in Figure 5 (a seed-setup sketch follows this slide).
7
Figure 5. Two-layer CNN used for the seed experiments.
Bi-Activation Function : Experimental Setting
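A minimal sketch of this per-seed comparison; the seed values come from the slides, while the helper function and the set of libraries being seeded are our assumptions.

```python
# Seed setup for the filtering comparison (helper and seeded libraries are assumptions).
import random
import numpy as np
import torch


def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


for seed in (999, 500, 1):
    set_seed(seed)
    # ... rebuild the two-layer CNN of Figure 5, train on MNIST, and compare
    # accuracy/loss per filter count across the seeds ...
```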
• The experimental results verify the trends obtained in Table 1.
• The tendency of Table 1 is summarized in the key-point table at the top of the page.
• The proposed method receives both positive and negative information through the activation function and yields lower error and better performance; processing positive and negative information at the same time appears to make the features a little clearer.
• Comparing configurations with the same number of filters, most of them improved over the existing activation function.
Table 1. Comparison of influence according to the number of CNN model feature maps: (a) CNN-Large, (b) CNN-Middle, (c) CNN-Small, (d) CNN-Little.
Key-point summary of Table 1:
          Key point                  Filter number      Performance
ReLU      linear feature             large number       decrease
ELU       nonlinear feature          decreased number   increase
Bi-ReLU   bipolar linear feature     decreased number   increase
Bi-ELU    bipolar nonlinear feature  decreased number   increase
8
Bi-Activation Function : Experimental Result
• We found that the results change noticeably with the random seed, and tried to confirm the precision of filtering according to the amount of change caused by random seed generation.
• Some of the reasons why exploring this seed-induced variation matters are described in the poster session.
Table 2. Train/test accuracy and loss in CNN-Small with seed 999.
Table 3. Train/test accuracy and loss error in CNN-Small with seed 500.
Table 4. Train/test accuracy and loss error in CNN-Small with seed 1.
9
Bi-Activation Function : Experimental Result
• Figure 5 panels: x) CNN-Small, y) two-layer CNN; a) MNIST dataset, b) Fashion-MNIST dataset; A) ReLU, B) ELU, C) bi-ReLU, D) bi-ELU; i) seed 1, ii) seed 250, iii) seed 500, iv) seed 750, v) seed 999.
• Figure 5 shows more tightly clustered plots for the ELU variants than for ReLU; the nonlinear characteristics produce more clustered results.
10
Figure 5. Performance analysis of the proposed methods.
Bi-Activation Function : Experimental Result
Conclusion
• We propose the bi-activation function, which improves performance.
• The bi-activation function combines a pos-activation function and a neg-activation function within a small parameter space.
• We verified the effect of the initialization method on the bi-directional information input with few parameters; the results show that bi-directional information is slightly better when it is nonlinear.
• This property makes the proposed function useful for optimization and for lightweight deep learning on edge devices.
11
Bi-Activation Function (S.-H. Choi and K. Kim, 2020)
Thank you for listening to the presentation.
If you have any additional questions, please email us.
• jcn99250@naver.com
12
Bi-Activation Function (S.-H. Choi and K. Kim, 2020)
References
• [1] V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines,” in ICML, 2010.
• [2] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units,” arXiv:1511.07289, 2015.
• [3] W. Liu, Y. Wen, Z. Yu, and M. Yang, “Large-Margin Softmax Loss for Convolutional Neural Networks,” arXiv:1612.02295, 2017.
13
Bi-Activation Function (S.-H. Choi and K. Kim, 2020)