9. Neural Networks - Learning
 COST FUNCTION:
The cost function for Neural Networks is just a generalization of the logistic
regression cost function, with the regularization term included.
We denote hΘ(x)k as the hypothesis output for the kth output unit.
For Neural Networks:
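A standard way to write this regularized cost function for a network with L layers, s_l units in layer l, and K output units (a reconstruction, not copied from the slide), in LaTeX notation:

J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[ y_k^{(i)}\log\big(h_\Theta(x^{(i)})_k\big) + \big(1-y_k^{(i)}\big)\log\big(1-h_\Theta(x^{(i)})_k\big) \Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2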
BACKPROPAGATION ALGORITHM: to calculate the gradient of the cost
function in order to minimize it
Computing gradient:
x,y are vectors
1. First, we do the forward propagation:
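A minimal Octave sketch of this step for one training example, assuming a 4-layer network with parameter matrices Theta1, Theta2, Theta3 (names and sizes are illustrative, not from the slide):

% Forward propagation for one training example x (a column vector).
sigmoid = @(z) 1 ./ (1 + exp(-z));   % g(z), the logistic activation

a1 = [1; x];                % input layer, with bias unit added
z2 = Theta1 * a1;
a2 = [1; sigmoid(z2)];      % hidden layer 2, with bias unit
z3 = Theta2 * a2;
a3 = [1; sigmoid(z3)];      % hidden layer 3, with bias unit
z4 = Theta3 * a3;
a4 = sigmoid(z4);           % output layer: a4 = hTheta(x)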
2. Next, we do back propagation:
j = a particular unit in layer L
g' = g-prime = the derivative of g(z) with respect to z
We don't calculate δ for the 1st layer, as it is the input layer and thus
has no errors.
➢First, we calculate δ for all units of the output layer
➢then for all other layers, in backwards order
➢we don't calculate δ for the 1st layer
➢then, using all the δ's, we calculate Δ for all layers
➢then, using Δ, we calculate D for all layers (D = the derivative of J(Θ)
with respect to the Θ of layer l); see the Octave sketch below
j = 0 corresponds to the bias unit in layer l; that column of Θ is not
included in the regularization term of D.
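A minimal Octave sketch of these steps for a single example (x, y), continuing from the forward-propagation sketch above; lambda, m, and the Delta accumulators are assumed to be defined, and the sizes are illustrative:

sigmoidGrad = @(z) sigmoid(z) .* (1 - sigmoid(z));   % g'(z)

delta4 = a4 - y;                        % δ for the output layer
t3 = Theta3' * delta4;
delta3 = t3(2:end) .* sigmoidGrad(z3);  % drop the bias row
t2 = Theta2' * delta3;
delta2 = t2(2:end) .* sigmoidGrad(z2);
% no delta1: the input layer has no error terms

% Accumulate Δ (Delta1..Delta3 initialised to zeros before looping over examples)
Delta3 = Delta3 + delta4 * a3';
Delta2 = Delta2 + delta3 * a2';
Delta1 = Delta1 + delta2 * a1';

% After the loop over all m examples, D = derivative of J(Θ) w.r.t. Θ of each
% layer; the j = 0 (bias) column is not regularized:
D1 = Delta1 / m;  D1(:, 2:end) += (lambda / m) * Theta1(:, 2:end);
D2 = Delta2 / m;  D2(:, 2:end) += (lambda / m) * Theta2(:, 2:end);
D3 = Delta3 / m;  D3(:, 2:end) += (lambda / m) * Theta3(:, 2:end);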
Summary:
"error values" for the last layer are simply the differences of our
actual results in the last layer and the correct outputs in y
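In symbols (a reconstruction, using the same notation as the sketches above): δ(L) = a(L) − y for the output layer, and δ(l) = ((Θ(l))' δ(l+1)) .* g'(z(l)) for l = L−1, …, 2.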
BACKPROPAGATION: INTUITION:
MATRIX vs VECTORS DURING IMPLEMENTATION:
Matrices are useful when doing forward and backward propagation.
Vectors are useful when using advanced optimization algorithms like
fminunc().
fminunc() assumes the Θ passed as an argument is a vector, and that the
gradient returned by the cost function is also a vector.
But the original Θ's and gradients are matrices, so we need to unroll
them into vectors.
Example: Binary Classification
To unroll into vectors:
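A sketch of the unrolling and reshaping, assuming (purely for illustration) Theta1 and Theta2 are 10x11 and Theta3 is 1x11; costFunction and initialThetaVec are placeholder names:

% Unroll the parameter matrices (and likewise the gradients) into one vector
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
DVec     = [D1(:); D2(:); D3(:)];

% Inside the cost function, reshape the vector back into matrices
Theta1 = reshape(thetaVec(1:110),   10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);

% Typical call; costFunction is assumed to return [J, gradVec],
% with gradVec unrolled the same way as thetaVec
options = optimset('GradObj', 'on', 'MaxIter', 100);
[optThetaVec, cost] = fminunc(@(t) costFunction(t), initialThetaVec, options);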
GRADIENT CHECKING:
Octave code:
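A standard sketch of the numerical check (theta is the unrolled parameter vector; J is assumed to be a handle that computes the cost):

EPSILON = 1e-4;
n = length(theta);
gradApprox = zeros(n, 1);
for i = 1:n
  thetaPlus  = theta;  thetaPlus(i)  = thetaPlus(i)  + EPSILON;
  thetaMinus = theta;  thetaMinus(i) = thetaMinus(i) - EPSILON;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);
end
% Check that gradApprox is approximately equal to DVec from backprop,
% then turn gradient checking off before training - it is very slow.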
RANDOM INITIALIZATION:
Θ = 0 doesn’t work in Neural Networks:
When we backpropagate, all nodes will update to the same value
repeatedly. Instead we can randomly initialize our weights.
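A sketch of the usual symmetry-breaking initialization (the dimensions and the value of INIT_EPSILON are illustrative, not from the slide):

% Initialize each Theta to random values in [-INIT_EPSILON, INIT_EPSILON]
INIT_EPSILON = 0.12;
Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1, 11)  * (2 * INIT_EPSILON) - INIT_EPSILON;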
This ε is different from the one used in gradient checking.
Doing this gives a good variation in the values of Θ, so that J(Θ) can be
minimized effectively.
PUTTING IT TOGETHER:
First, pick a network architecture; choose the layout of your neural
network, including how many hidden units in each layer and how
many layers in total you want to have.
Number of hidden units per layer → usually the more the better (this must
be balanced against the cost of computation, which increases with more
hidden units)
Defaults: 1 hidden layer. If you have more than 1 hidden layer, then
it is recommended that you have the same number of units in every
hidden layer.
