Deep Learning
Activation Functions
Rakshith
Walkthrough
• Mathematical Functions
• Types of functions
• Activation function
• Laws of activation function
• Types of Activation functions
• Limitations of activation function
What is a Function?
A function takes some input, operates on it, and generates some output:
it relates an input to an output.
f(x) = x²
f(q) = 1 − q + q²
h(A) = 1 − A + A²
w(θ) = 1 − θ + θ²
special rules:
• It must work for every possible input value
• And it has only one relationship for each input value

x : 1  3  8
x²: 1  9  64

The relationship x → x² (a tiny sketch follows below)
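A tiny Python sketch of these rules: the function works for every input and returns exactly one output per input:

def f(x):
    return x ** 2              # exactly one output for each input

for x in (1, 3, 8):
    print(x, '->', f(x))       # 1 -> 1, 3 -> 9, 8 -> 64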
Kinds of Functions
A. Functions With Discontinuities
Consider the function f(x) = 2/(x² − x).
Factoring the denominator gives: f(x) = 2/(x(x − 1))
We observe that the function is not defined for x = 0 and x = 1.
[Graph of f(x) = 2/(x(x − 1)), with vertical asymptotes at x = 0 and x = 1]
We see that small changes in x near 0 (and near 1) produce large changes in the value of the function.
We say the function is discontinuous at x = 0 and x = 1.
plot(2/(x*(x - 1)))
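A minimal sketch (assuming Python with numpy and matplotlib) that reproduces the plot call above and makes the two discontinuities visible:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 3, 1000)
with np.errstate(divide='ignore', invalid='ignore'):
    y = 2 / (x * (x - 1))        # undefined at x = 0 and x = 1
y[np.abs(y) > 50] = np.nan       # mask the asymptotes so no spurious lines are drawn

plt.plot(x, y)
plt.axvline(0, linestyle='--')
plt.axvline(1, linestyle='--')
plt.title('f(x) = 2/(x(x-1)) is discontinuous at x = 0 and x = 1')
plt.show()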
B. Continuous Functions
• Consider the graph of f(x) = x³ − 6x² − x + 30, a continuous graph.
• We can see that there are no "gaps" in the curve. Any value of x will give us a corresponding value of y. We could continue
the graph in the negative and positive directions, and we would never need to take the pencil off the paper.
• Such functions are called continuous functions.
Activation function | Transfer Function
NN without an activation function!
• A neural network without an activation function would simply be a linear regression model
• A linear function is a simple polynomial of degree one
• Linear models are easy to solve, but they have less power to learn complex functional mappings from data
• We want our neural network to learn and compute not just a linear function but something more complicated than that
• i.e. complicated kinds of data such as images, videos, audio, speech, etc. (see the sketch below)
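Why depth alone does not help: a minimal numpy sketch (the layer sizes are made up for illustration) showing that two stacked linear layers collapse into one equivalent linear layer:

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)   # "layer 1"
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)   # "layer 2"
x = rng.normal(size=4)

# Two linear layers with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal one linear layer with W = W2 W1 and b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
print(np.allclose(two_layers, W @ x + b))  # True: no extra expressive power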
Why an activation function?
• We need to apply an activation function f(x) to make the network more powerful
• It adds the ability to learn something complex and complicated from the data
• And to represent non-linear, complex, arbitrary functional mappings between inputs and outputs
The output of one layer is fed as input to the next layer, so the network can be viewed as a function
composition (see the sketch below)
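A minimal sketch of that composition (again with made-up layer sizes): inserting a non-linearity between the two layers turns the network into composed non-linear maps instead of a single linear map:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
x = rng.normal(size=4)

# Function composition: layer 2 applied to the activated output of layer 1.
h = sigmoid(W1 @ x + b1)   # layer 1 followed by the activation function
y = W2 @ h + b2            # layer 2 consumes layer 1's output
print(y)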
LAWS of Activation function
Monotonic: a function is monotonically increasing if, as x increases, its corresponding y also
increases (or remains constant for some time and then increases); a monotonically decreasing
function is the reverse.
[Plots: monotonic increasing, monotonic decreasing, non-monotonic]
Continuous: a continuous curve without any gaps.
Easily differentiable: it should be a smooth, continuous curve that is differentiable at every point.
Non-continuous curves cannot be differentiated.
Popular Activation functions
Sigmoid function: output range (0, 1)
Derivative range: (0, 0.25]
Tanh function: output range (−1, 1)
Derivative range: (0, 1]
Rectified Linear Unit (ReLU): output range [0, ∞), i.e. f(z) = max(0, z)
Derivative: 0 or 1 [the slope of y = x is tan 45° = 1]
Leaky Rectified Linear Unit (Leaky ReLU): output range (−∞, ∞)
Derivative: a small constant α (commonly 0.01) or 1
(Implementations are sketched below.)
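A minimal numpy sketch of these four activations and their derivatives (the 0.01 slope for Leaky ReLU is an assumed, commonly used default):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def d_sigmoid(z):                    # peaks at 0.25 when z = 0
    s = sigmoid(z)
    return s * (1 - s)

def d_tanh(z):                       # peaks at 1 when z = 0
    return 1 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0, z)

def d_relu(z):                       # 0 for z < 0, 1 for z > 0
    return (z > 0).astype(float)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def d_leaky_relu(z, alpha=0.01):     # alpha or 1, never exactly 0
    return np.where(z > 0, 1.0, alpha)

z = np.linspace(-5, 5, 101)
print(d_sigmoid(z).max())            # 0.25, matching the stated derivative range
print(d_tanh(z).max())               # 1.0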
Additional info: https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html
Weight update function
• Since we randomly initialize the weights, we end up with some error.
• We then adjust the weights by differentiating the previous output
• until we reach the minimum error.
Example: a simple weight update
f(x) = x² − 3x + 2
Set df/dx = 0:
df/dx = 2x − 3 = 0
x = 1.5. Is this a maximum or a minimum?
Put x = 1.5 into x² − 3x + 2:
f(1.5) = −0.25
f(1) = 0
Since f(1.5) < f(1) (and d²f/dx² = 2 > 0), x = 1.5 is a minimum.
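A minimal gradient-descent sketch for this example (the starting point and the 0.1 learning rate are arbitrary choices): stepping repeatedly against the derivative converges to the minimum at x = 1.5:

# Gradient descent on f(x) = x^2 - 3x + 2, whose minimum is at x = 1.5
def df(x):
    return 2 * x - 3           # derivative of f

x = 5.0                        # arbitrary "random" initialization
lr = 0.1                       # assumed learning rate
for _ in range(100):
    x -= lr * df(x)            # step against the gradient
print(round(x, 4))             # 1.5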
Standard weight update functions
• SGD
• Mini-batch SGD
• SGD with momentum
• AdaGrad
• AdaDelta and RMSProp
• Adam (Adaptive Moment Estimation)
The heart of gradient descent is the chain rule of differentiation (a momentum sketch follows below).
When the tangent is horizontal to the x-axis, you may be at a minimum or a maximum.
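A minimal sketch of one of these rules, SGD with momentum, on the same toy function (the 0.9 momentum coefficient is an assumed, typical value):

# SGD with momentum on f(x) = x^2 - 3x + 2
def df(x):
    return 2 * x - 3

x, v = 5.0, 0.0                # parameter and velocity
lr, beta = 0.1, 0.9            # assumed learning rate and momentum coefficient
for _ in range(300):
    v = beta * v + df(x)       # accumulate a running direction from past gradients
    x -= lr * v                # update with the smoothed direction
print(round(x, 4))             # ~1.5; momentum damps zig-zagging on noisy problems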
Vanishing Gradient in Sigmoid and Tanh functions
From roughly 1980 to 2012, people mostly trained 2-3 layer neural networks.
• The biggest problem faced was the vanishing gradient (a mathematical problem)
• Too little labelled data
• Limited computational power
History of the activation function
We can also get stuck in local minima or at saddle points. (A small demonstration of the vanishing gradient follows below.)
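A minimal sketch of the effect (10 layers and the best-case derivative value are assumed for illustration): by the chain rule, backpropagation multiplies one sigmoid derivative per layer, so the gradient shrinks geometrically with depth:

# sigmoid'(z) is at most 0.25 (attained at z = 0);
# multiplying 10 such factors shows how the gradient vanishes with depth.
grad = 1.0
for layer in range(10):
    grad *= 0.25               # best case per sigmoid layer
print(grad)                    # ~9.5e-07: almost no signal reaches the early layers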
How to avoid the vanishing gradient?
-- Rectified Linear Units (ReLU)
• The partial derivative of ReLU is either 0 or 1
• Since the maximum derivative value is 1, there is no exploding-gradient problem; and
since the values are not squeezed between 0 and 1, there is no vanishing gradient
• ReLU converges faster than the other functions because the derivative is
either 1 or 0
• But it may lead to dead activations (see the sketch below)
• If z is negative, f(z) = 0
• and df/dz = 0
• If one derivative in the chain is zero, the whole gradient becomes zero: a dead activation
• Sigmoid and tanh can suffer a similar fate (saturation) when you have
very large negative values
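A minimal sketch of a dead activation (the negative pre-activation z = -2.0 is an assumed example): ReLU's gradient there is exactly zero, so no update can revive the neuron, while Leaky ReLU keeps a small gradient flowing:

def d_relu(z):
    return 1.0 if z > 0 else 0.0

def d_leaky_relu(z, alpha=0.01):   # alpha is an assumed small slope
    return 1.0 if z > 0 else alpha

z = -2.0                           # assumed negative pre-activation
print(d_relu(z))                   # 0.0: every gradient through this neuron is zero
print(d_leaky_relu(z))             # 0.01: the neuron remains trainable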