9. Neural Networks - Learning
 COST FUNCTION:
The cost function for Neural Networks is just a generalization of the logistic
regression cost function, with the regularization term included.
We denote hΘ(x)k as the hypothesis output for the kth output unit.
For Neural Networks:
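A standard way to write this regularized cost function for a network with L layers, s_l units in layer l, and K output units (a reconstruction, not copied from the slide), in LaTeX notation:

J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[ y_k^{(i)}\log\big(h_\Theta(x^{(i)})_k\big) + \big(1-y_k^{(i)}\big)\log\big(1-h_\Theta(x^{(i)})_k\big) \Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2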
BACKPROPAGATION ALGORITHM: to calculate the gradient of the cost
function in order to minimize it
Computing gradient:
x,y are vectors
1. First, we do the forward propagation:
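A minimal Octave sketch of this step for one training example, assuming a 4-layer network with parameter matrices Theta1, Theta2, Theta3 (names and sizes are illustrative, not from the slide):

% Forward propagation for one training example x (a column vector).
sigmoid = @(z) 1 ./ (1 + exp(-z));   % g(z), the logistic activation

a1 = [1; x];                % input layer, with bias unit added
z2 = Theta1 * a1;
a2 = [1; sigmoid(z2)];      % hidden layer 2, with bias unit
z3 = Theta2 * a2;
a3 = [1; sigmoid(z3)];      % hidden layer 3, with bias unit
z4 = Theta3 * a3;
a4 = sigmoid(z4);           % output layer: a4 = hTheta(x)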
2. Next, we do back propagation:
j = a particular unit in layer L
g' = g-prime = the derivative of g(z) with respect to z
We don't calculate δ for the 1st layer, as it is the input layer and thus
has no errors.
➢First, we calculate δ for all units of the output layer
➢then for all other layers, in backwards order
➢we don't calculate δ for the 1st layer
➢then, using all the δ's, we calculate Δ for all layers
➢then, using Δ, we calculate D for all layers (D = the derivative of J(Θ)
with respect to the Θ of layer l); see the Octave sketch below
j = 0 corresponds to the bias unit in layer l; that column of Θ is not
included in the regularization term of D.
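A minimal Octave sketch of these steps for a single example (x, y), continuing from the forward-propagation sketch above; lambda, m, and the Delta accumulators are assumed to be defined, and the sizes are illustrative:

sigmoidGrad = @(z) sigmoid(z) .* (1 - sigmoid(z));   % g'(z)

delta4 = a4 - y;                        % δ for the output layer
t3 = Theta3' * delta4;
delta3 = t3(2:end) .* sigmoidGrad(z3);  % drop the bias row
t2 = Theta2' * delta3;
delta2 = t2(2:end) .* sigmoidGrad(z2);
% no delta1: the input layer has no error terms

% Accumulate Δ (Delta1..Delta3 initialised to zeros before looping over examples)
Delta3 = Delta3 + delta4 * a3';
Delta2 = Delta2 + delta3 * a2';
Delta1 = Delta1 + delta2 * a1';

% After the loop over all m examples, D = derivative of J(Θ) w.r.t. Θ of each
% layer; the j = 0 (bias) column is not regularized:
D1 = Delta1 / m;  D1(:, 2:end) += (lambda / m) * Theta1(:, 2:end);
D2 = Delta2 / m;  D2(:, 2:end) += (lambda / m) * Theta2(:, 2:end);
D3 = Delta3 / m;  D3(:, 2:end) += (lambda / m) * Theta3(:, 2:end);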
Summary:
"error values" for the last layer are simply the differences of our
actual results in the last layer and the correct outputs in y
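In symbols (a reconstruction, using the same notation as the sketches above): δ(L) = a(L) − y for the output layer, and δ(l) = ((Θ(l))' δ(l+1)) .* g'(z(l)) for l = L−1, …, 2.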
BACKPROPAGATION: INTUITION:
MATRIX vs VECTORS DURING IMPLEMENTATION:
Matrices are useful when doing forward and backward propagation.
Vectors are useful when using advanced optimization algorithms like
fminunc().
fminunc() assumes the Θ passed as an argument is a vector, and that the
gradient returned by the cost function is also a vector.
But the original Θ's and gradients are matrices, so we need to unroll
them into vectors.
Example: Binary Classification
To unroll into vectors:
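A sketch of the unrolling and reshaping, assuming (purely for illustration) Theta1 and Theta2 are 10x11 and Theta3 is 1x11; costFunction and initialThetaVec are placeholder names:

% Unroll the parameter matrices (and likewise the gradients) into one vector
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
DVec     = [D1(:); D2(:); D3(:)];

% Inside the cost function, reshape the vector back into matrices
Theta1 = reshape(thetaVec(1:110),   10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);

% Typical call; costFunction is assumed to return [J, gradVec],
% with gradVec unrolled the same way as thetaVec
options = optimset('GradObj', 'on', 'MaxIter', 100);
[optThetaVec, cost] = fminunc(@(t) costFunction(t), initialThetaVec, options);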
GRADIENT CHECKING:
Octave code:
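A standard sketch of the numerical check (theta is the unrolled parameter vector; J is assumed to be a handle that computes the cost):

EPSILON = 1e-4;
n = length(theta);
gradApprox = zeros(n, 1);
for i = 1:n
  thetaPlus  = theta;  thetaPlus(i)  = thetaPlus(i)  + EPSILON;
  thetaMinus = theta;  thetaMinus(i) = thetaMinus(i) - EPSILON;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);
end
% Check that gradApprox is approximately equal to DVec from backprop,
% then turn gradient checking off before training - it is very slow.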
RANDOM INITIALIZATION:
Θ = 0 doesn’t work in Neural Networks:
When we backpropagate, all nodes will update to the same value
repeatedly. Instead we can randomly initialize our weights.
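A sketch of the usual symmetry-breaking initialization (the dimensions and the value of INIT_EPSILON are illustrative, not from the slide):

% Initialize each Theta to random values in [-INIT_EPSILON, INIT_EPSILON]
INIT_EPSILON = 0.12;
Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1, 11)  * (2 * INIT_EPSILON) - INIT_EPSILON;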
This ε is different from the one used in gradient checking.
Doing this gives a good variation in the values of Θ, so that J(Θ) can be
minimized effectively.
PUTTING IT TOGETHER:
First, pick a network architecture; choose the layout of your neural
network, including how many hidden units in each layer and how
many layers in total you want to have.
Number of hidden units per layer → usually the more the better (this must
be balanced against the cost of computation, which increases with more
hidden units)
Defaults: 1 hidden layer. If you have more than 1 hidden layer, then
it is recommended that you have the same number of units in every
hidden layer.
