Bi-activation Function :
an Enhanced Version of an Activation
Function in Convolutional Neural Network
Hansung University
Seoung-Ho Choi (Master course, 4th) and
Kyoungyeon Kim (Undergraduate course, 8th)
KICS-Winter 2020
Bi-Activation Function (S.-H. Choi and K. Kim, 2020)
2
Goal: design a new and better activation function
Problem
• The existing activation function does not reflect information well: because only one part of the information is seen, the complexity of information processing increases.
• The existing activation function does not give the model generalized characteristics because it does not reflect both the positive and the negative partial information.
Contribution
• To alleviate this problem, we propose a bi-activation function.
• We experimented with CNNs on typical datasets such as MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100. The bi-activation function performed better than ReLU and ELU.
• An activation function determines how much information is delivered to the next node toward the decision goal.
• Typical examples are ReLU [1] and ELU [2].
Figure 1. Example of activation function in neural network
a) Pos-activation, b) Neg-activation, c) Bi-activation
3
Bi-Activation Function : Related Works
Bi-Activation Function : Related Works
4
1. ReLU
• Information near 0 cannot be differentiated, and information below 0 is not reflected.
• Positive information is reflected one to one.
2. ELU
• Information near 0 can be differentiated, and information below 0 is reflected (exponentially).
• Positive information is reflected one to one.
Bi-Activation Function : Proposed Method
We therefore propose a bi-activation function
• Figure 2 shows how the bi-activation function is constructed (a minimal implementation sketch follows the figure caption).
• The bi-activation function consists of two parts:
• a pos-activation function and a neg-activation function.
• The pos-activation function is the same as the existing activation function.
• The neg-activation function is its mirror image, covering the negative side.
• Effect of the bi-activation function
• CNNs reflect the nonlinearity of the training data and of its outputs by using an activation function to select the linear output region properly.
• However, the existing activation function has difficulty training on the nonlinearity of the data because its linear output region exists only for x >= 0.
• In contrast, the bi-activation function is more flexible than the existing activation function because it also has the neg-activation function. This flexibility lets it learn information from both sides.
5
Figure 2. Bi-activation function a) Pos-activation,
b) Neg-activation, c) Bi-activation
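Below is a minimal PyTorch sketch of one way to realize the pos-/neg-/bi-activation split of Figure 2. The slides do not state how the two branches are merged, so the channel-wise concatenation (and the class name BiActivation) is our assumption rather than the authors' implementation.

```python
# Minimal sketch of the bi-activation idea (assumed merge: channel concatenation).
import torch
import torch.nn as nn


class BiActivation(nn.Module):
    """Pos-activation on x plus a mirrored neg-activation on -x."""

    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base  # e.g. nn.ReLU() for bi-ReLU, nn.ELU() for bi-ELU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pos = self.base(x)       # pos-activation: same as the existing activation
        neg = -self.base(-x)     # neg-activation: mirror image for the negative side
        return torch.cat([pos, neg], dim=1)  # assumed merge along the channel axis


# Usage: bi-ReLU and bi-ELU variants of Figure 2 c)
bi_relu = BiActivation(nn.ReLU())
bi_elu = BiActivation(nn.ELU())
out = bi_relu(torch.randn(1, 4, 8, 8))  # -> shape (1, 8, 8, 8): channels doubled
```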
• We applied the existing activation functions to our proposal as shown in Figure 3.
• We experimented to verify which activation functions can stably reflect sparse information near 0.
• This enables stable training through the stable acquisition of fine-grained information.
• Bi-activation can cause information overlap; however, we did not analyze this overlap here.
• We experimented with four CNNs, shown in Figure 4 (a configurable sketch follows the figure captions below).
• We analyzed the effect of filtering with each activation function as the amount of information in the model's parameters changes.
• This allows a precise analysis of filtering with respect to the amount of information coming from the model's parameters.
6
Figure 4. CNNs used in the experiments: (a) CNN-Large, (b) CNN-Middle, (c) CNN-Small, and (d) CNN-Little.
Bi-Activation Function : Experimental Setting
Figure 3. Activation functions used in the experiments: (a) existing methods, (b) proposed methods; (a).i ReLU, (a).ii ELU, (b).i bi-ReLU, (b).ii bi-eLU.
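As a rough illustration of this setup, here is a hedged sketch of a configurable CNN whose size and activation can be swapped. The slides do not give the exact architectures, so the layer layout and the filter counts for the Large/Middle/Small/Little variants are placeholders, not the authors' configurations.

```python
# Illustrative CNN builder: the filter count stands in for the Large/Middle/Small/Little sizes.
import torch.nn as nn


def make_cnn(num_filters: int, activation: nn.Module, num_classes: int = 10) -> nn.Sequential:
    """Two-conv-layer CNN for 28x28 grayscale inputs (e.g. MNIST / Fashion-MNIST)."""
    return nn.Sequential(
        nn.Conv2d(1, num_filters, kernel_size=3, padding=1),
        activation,
        nn.MaxPool2d(2),                                   # 28x28 -> 14x14
        nn.Conv2d(num_filters, 2 * num_filters, kernel_size=3, padding=1),
        activation,
        nn.MaxPool2d(2),                                   # 14x14 -> 7x7
        nn.Flatten(),
        nn.Linear(2 * num_filters * 7 * 7, num_classes),
        nn.LogSoftmax(dim=1),                              # LogSoftmax head (next slide)
    )


# Example "Large" and "Little" variants; with the concatenating BiActivation sketched
# above, the in_channels of the second conv and the linear layer would have to double.
cnn_large = make_cnn(64, nn.ReLU())
cnn_little = make_cnn(8, nn.ELU())
```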
• We use LogSoftmax
• The CNNs use a LogSoftmax output so that we can observe a large-margin effect in the model's predictions [3].
• When LogSoftmax is applied, NLL loss is generally used with it; however, the NLL loss here suffers from a non-convex effect.
• Therefore, cross-entropy is applied to obtain a convex effect in the model: a convex function has a unique solution and can be solved easily with gradient methods.
• Accordingly, studies have applied cross-entropy on top of LogSoftmax to obtain this convex behavior [3].
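For concreteness, here is a small PyTorch sketch of the LogSoftmax / NLL / cross-entropy combinations discussed above; the batch size and class count are made-up example values.

```python
# Numerical sketch of the output/loss choices (example shapes, not the paper's settings).
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)                # batch of 8 samples, 10 classes
targets = torch.randint(0, 10, (8,))

log_probs = F.log_softmax(logits, dim=1)   # LogSoftmax output head
nll = F.nll_loss(log_probs, targets)       # the usual pairing: LogSoftmax + NLL loss
ce = F.cross_entropy(logits, targets)      # cross-entropy applied directly to the logits

# cross_entropy(logits, t) computes the same value as nll_loss(log_softmax(logits), t);
# the slides prefer the cross-entropy formulation for its convexity argument [3].
print(nll.item(), ce.item())
```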
• Various seeds for filtering
• In addition, we compare experiments on the MNIST dataset with seeds 999, 500, and 1 to analyze the influence of each filter number, using the two-layer CNN in Figure 5 (a seed-setup sketch follows this slide).
7
Figure 5. Two-layer CNN used for the seed experiments.
Bi-Activation Function : Experimental Setting
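A minimal sketch of this per-seed comparison; the seed values come from the slides, while the helper function and the set of libraries being seeded are our assumptions.

```python
# Seed setup for the filtering comparison (helper and seeded libraries are assumptions).
import random
import numpy as np
import torch


def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


for seed in (999, 500, 1):
    set_seed(seed)
    # ... rebuild the two-layer CNN of Figure 5, train on MNIST, and compare
    # accuracy/loss per filter count across the seeds ...
```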
• The experimental results verify the trends obtained in Table 1.
• The tendency of Table 1 is summarized in the key-point table at the top of the page.
• The proposed method receives both positive and negative information through the activation function and yields lower error and better performance; processing positive and negative information at the same time appears to make the features a little clearer.
• Comparing configurations with the same number of filters, most of them improved over the existing activation function.
Table 1. Comparison of influence according to the number of CNN model feature maps: (a) CNN-Large, (b) CNN-Middle, (c) CNN-Small, (d) CNN-Little.
Key-point summary of Table 1:
          Key point                  Filter number      Performance
ReLU      linear feature             large number       decrease
ELU       nonlinear feature          decreased number   increase
Bi-ReLU   bipolar linear feature     decreased number   increase
Bi-ELU    bipolar nonlinear feature  decreased number   increase
8
Bi-Activation Function : Experimental Result
• We found that the results change noticeably with the random seed, and tried to confirm the precision of filtering according to the amount of change caused by random seed generation.
• Some of the reasons why exploring this seed-induced variation matters are described in the poster session.
Table 2. Train/test accuracy and loss in CNN-Small with seed 999.
Table 3. Train/test accuracy and loss error in CNN-Small with seed 500.
Table 4. Train/test accuracy and loss error in CNN-Small with seed 1.
9
Bi-Activation Function : Experimental Result
• Figure 5 panels: x) CNN-Small, y) two-layer CNN; a) MNIST dataset, b) Fashion-MNIST dataset; A) ReLU, B) ELU, C) bi-ReLU, D) bi-ELU; i) seed 1, ii) seed 250, iii) seed 500, iv) seed 750, v) seed 999.
• Figure 5 shows more tightly clustered plots for the ELU variants than for ReLU; the nonlinear characteristics produce more clustered results.
10
Figure 5. Performance analysis of the proposed methods.
Bi-Activation Function : Experimental Result
Conclusion
• We propose the bi-activation function, which improves performance.
• The bi-activation function combines a pos-activation function and a neg-activation function within a small parameter space.
• We verified the effect of the initialization method on the bi-directional information input with few parameters; the results show that bi-directional information is slightly better when it is nonlinear.
• This property makes the proposed function useful for optimization and for lightweight deep learning on edge devices.
11
Bi-Activation Function (S.-H. Choi and K. Kim, 2020)
Thank you for listening to the presentation.
If you have any additional questions, please email us.
• jcn99250@naver.com
12
Bi-Activation Function (S.-H. Choi and K. Kim, 2020)
References
• [1] V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines,” in ICML, 2010.
• [2] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units,” arXiv:1511.07289, 2015.
• [3] W. Liu, Y. Wen, Z. Yu, and M. Yang, “Large-Margin Softmax Loss for Convolutional Neural Networks,” arXiv:1612.02295, 2017.
13
Bi-Activation Function (S.-H. Choi and K. Kim, 2020)