Neural Networks
Ahmed
Neural Networks ∈ Supervised Machine Learning
History
Neural Network Architectures
• Standard Artificial Neural Networks (ANN)
• Convolutional Neural Networks (CNN)
• Recurrent Neural Networks (RNN)
Perceptron
z = Σ_{i=0..n} w_i * x_i = W * X
• z = w0*1 + w1*x1 + w2*x2 + … + wn*xn
Perceptron
a = g(z): the weighted sum z is passed through an activation function g.
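A minimal sketch of this computation in Python (numpy, the function name, and the example values are illustrative assumptions, not from the deck):

```python
import numpy as np

# A single perceptron: weighted sum, then activation.
def perceptron(x, w, g):
    z = np.dot(w, x)       # z = w0*1 + w1*x1 + ... + wn*xn
    return g(z)            # a = g(z)

x = np.array([1.0, 2.0, 3.0])   # x0 = 1 plays the role of the bias input
w = np.array([0.5, 0.5, 0.5])
print(perceptron(x, w, lambda z: max(z, 0.0)))  # 3.0
```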
Activation functions and non-linearity
Common choices: sigmoid σ(z), tanh(z), and ReLU(z). (Figure: plots of the three activation curves.)
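For reference, the three activations named above, written out in a short sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # σ(z), squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                 # tanh(z), squashes to (-1, 1)

def relu(z):
    return np.maximum(z, 0.0)         # ReLU(z), zero for negative z

print(sigmoid(0.0), tanh(0.0), relu(-2.0), relu(2.4))  # 0.5 0.0 0.0 2.4
```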
(Figure: a fully connected layer. Inputs x1, x2, x3 with weights w11…w32 feed two neurons, giving pre-activations z1, z2 and activations a1, a2.)
Worked example: first layer with ReLU
Inputs [1, 2, 3]; weights 0.5 to the upper neuron and 0.4 to the lower neuron:
z1⁽¹⁾ = 0.5*1 + 0.5*2 + 0.5*3 = 3
z2⁽¹⁾ = 0.4*1 + 0.4*2 + 0.4*3 = 2.4
a1⁽¹⁾ = ReLU(3) = 3
a2⁽¹⁾ = ReLU(2.4) = 2.4
Worked example: second layer
The activations a1⁽¹⁾ = 3 and a2⁽¹⁾ = 2.4 feed the next neuron with weights 0.2 and 0.8:
z1⁽²⁾ = 0.2*3 + 0.8*2.4 = 2.52
a1⁽²⁾ = ReLU(2.52) = 2.52 = ŷ
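The same arithmetic, checked in a few lines (an illustrative sketch, not part of the deck):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

x  = np.array([1.0, 2.0, 3.0])
z1 = np.dot([0.5, 0.5, 0.5], x)   # 3.0
z2 = np.dot([0.4, 0.4, 0.4], x)   # 2.4
a1, a2 = relu(z1), relu(z2)       # 3.0, 2.4

z_out = 0.2 * a1 + 0.8 * a2       # 0.2*3 + 0.8*2.4 = 2.52
print(a1, a2, relu(z_out))        # 3.0 2.4 2.52
```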
Forward propagation
Z = W * X
A = g(Z)
A is the next layer's input:
Z = W * A
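As a sketch, forward propagation is one loop over the weight matrices (bias handling omitted; the names `forward`, `weights`, and `g` are illustrative assumptions):

```python
import numpy as np

def forward(weights, x, g):
    """Propagate x through each layer: Z = W @ A, A = g(Z);
    each layer's output A becomes the next layer's input."""
    a = x
    for W in weights:
        a = g(W @ a)
    return a

# Example with the single layer from the previous slide:
relu = lambda z: np.maximum(z, 0.0)
W1 = np.array([[0.5, 0.5, 0.5], [0.4, 0.4, 0.4]])
print(forward([W1], np.array([1.0, 2.0, 3.0]), relu))  # [3.  2.4]
```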
Learning
(Figure: a single neuron: input, weighted sum z, activation g(z), output a.)
Learning problem
• The obtained output is 'a'
• The correct output is 'y'
• The error 'J' is a function of 'a' and 'y', for example J(a, y) = (a − y)²
• How to minimise the error? Change w.
• How to find w?
Optimisation: Gradient Descent
• w = w − α * dJ/dw
Problem statement
There is an error in the output (J); how should W change?
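A one-line sketch of the update rule (the helper name `gd_step` is an assumption for illustration):

```python
# One gradient-descent step: move w against the gradient of the error.
def gd_step(w, dJ_dw, alpha=0.01):
    return w - alpha * dJ_dw

# With the numbers used later in the deck (dJ/dw = -88.24, w = 0.1):
print(gd_step(0.1, -88.24))  # 0.9824, which the deck rounds to 1
```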
How to find ∂J/∂w? Back propagation
∂a/∂z = g'(z)
∂z/∂w = x
Chain rule: ∂J/∂w = (∂J/∂a) * (∂a/∂z) * (∂z/∂w) = (∂J/∂a) * g'(z) * x
(Figure: computation chain x, w → z → g() → a → J.)
How to find ∂J/∂x? Back propagation
∂a/∂z = g'(z)
∂z/∂x = w
Chain rule: ∂J/∂x = (∂J/∂a) * (∂a/∂z) * (∂z/∂x) = (∂J/∂a) * g'(z) * w
(Figure: computation chain x, w → z → g() → a → J.)
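Both chain-rule results in one illustrative helper (`g_prime` stands for the activation's derivative, e.g. 1 for ReLU when z > 0):

```python
# Chain-rule gradients at a single neuron: z = w*x, a = g(z).
def backprop_neuron(dJ_da, z, x, w, g_prime):
    dJ_dz = dJ_da * g_prime(z)   # dJ/dz = (dJ/da) * (da/dz)
    dJ_dw = dJ_dz * x            # via dz/dw = x: gradient for this weight
    dJ_dx = dJ_dz * w            # via dz/dx = w: penalty passed back
    return dJ_dw, dJ_dx
```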
Update parameters
• w_new = w − α * dJ/dw
• Assume α = 0.01
• ∂J/∂w = (∂J/∂a) * (∂a/∂z) * x = −88.24 * x
• w1_new = 0.1 − 0.01*(−88.24)*1 = 0.1 + 0.88 ≈ 1
• w2_new = 0.2 − 0.01*(−88.24)*2.5 ≈ 2.4
• w3_new = 0.4 − 0.01*(−88.24)*2.9 ≈ 2.9
(Figure: output layer with weights 0.1, 0.2, 0.4 and inputs 1, 2.5, 2.9; obtained output 1.76. Correct value is 90, obtained value is 1.76, error = −88.24.)
Distribute the penalty to previous neurons
• ∂J/∂x = (∂J/∂a) * (∂a/∂z) * w = −88.24 * w
• ∂J/∂x2 = −88.24 * 0.2 = −17.6
• ∂J/∂x3 = −88.24 * 0.4 = −35.3
(Figure: same output layer; ∂J/∂a = −88.24, ∂J/∂x2 = −17.6, ∂J/∂x3 = −35.3. Error = −88.24.)
Summary
Feed forward:
Z = W * X
a = g(z)
J = cost function
Feed backward:
dJ/da = (a − y), or as computed from the chosen cost function
da/dz = g'(z)
dJ/dz = (dJ/da) * (da/dz)
dJ/dx = (dJ/dz) * W
dJ/dw = (dJ/dz) * X
(Figure: the layer diagram annotated with dJ/da, da/dz = g'(z), dz/dw = x, and dz/dx = w at each arrow.)
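A minimal sketch of one layer's forward and backward pass, assuming ReLU and numpy (W has shape (out, in); names are illustrative):

```python
import numpy as np

relu       = lambda z: np.maximum(z, 0.0)
relu_prime = lambda z: (z > 0).astype(float)

def layer_forward(W, x):
    z = W @ x                        # Z = W * X
    return z, relu(z)                # a = g(z)

def layer_backward(W, x, z, dJ_da):
    dJ_dz = dJ_da * relu_prime(z)    # dJ/dz = (dJ/da) * (da/dz)
    dJ_dw = np.outer(dJ_dz, x)       # dJ/dw = (dJ/dz) * X
    dJ_dx = W.T @ dJ_dz              # dJ/dx = (dJ/dz) * W
    return dJ_dw, dJ_dx
```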
How about other search and optimisation methods?
(Figure: the training loop: forward propagation → calculate error → back propagation → update parameters.)
Learning the price of a flat in Al Weibdeh
• Description:
• Ground floor? Yes
• 2 bathrooms
• 3 bedrooms
• The price is 90K JOD.
• As a network: [1, 2, 3] → NN → 90
ANN: 3 inputs, 1 output, 2 hidden layers
(Figure: the three-input network with one candidate set of initial weights; bias inputs fixed at 1; hidden layers labelled L1 and L2.)
Initialisation
(Figure: the network with the initial weights used below: L1 weights 0.5 and 0.4, L2 weights 0.2 and 0.8 plus zero bias weights, output weights 0.1, 0.2, 0.4, with ReLU at every neuron. The forward pass gives L1 activations 3 and 2.4, L2 activations 2.5 and 2.9 (rounded), and output 1.76.)
First layer (recap, ReLU):
z1⁽¹⁾ = 0.5*1 + 0.5*2 + 0.5*3 = 3, a1⁽¹⁾ = ReLU(3) = 3
z2⁽¹⁾ = 0.4*1 + 0.4*2 + 0.4*3 = 2.4, a2⁽¹⁾ = ReLU(2.4) = 2.4
Matrix multiplication
• Z = W * X
• [z1; z2] = [w11 w21 w31; w12 w22 w32] * [x1; x2; x3]
• [z1; z2] = [0.5 0.5 0.5; 0.4 0.4 0.4] * [1; 2; 3] = [0.5*1 + 0.5*2 + 0.5*3; 0.4*1 + 0.4*2 + 0.4*3] = [3; 2.4]
• a = ReLU(Z) = ReLU([3; 2.4]) = [3; 2.4]
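The same product in numpy (an illustrative check):

```python
import numpy as np

W = np.array([[0.5, 0.5, 0.5],
              [0.4, 0.4, 0.4]])
X = np.array([1.0, 2.0, 3.0])

Z = W @ X                   # [3.  2.4]
A = np.maximum(Z, 0.0)      # ReLU leaves positive values unchanged
print(Z, A)
```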
Second Layer and Output Layer
• Second layer:
Z = W * X
[z1; z2] = [0 0.2 0.8; 0 0.8 0.2] * [1; 3; 2.4] = [0*1 + 0.2*3 + 0.8*2.4; 0*1 + 0.8*3 + 0.2*2.4] = [2.52; 2.88]
a = ReLU([2.52; 2.88]) = [2.52; 2.88]
• Output layer:
Z = [0.1 0.2 0.4] * [1; 2.52; 2.88] = 0.1*1 + 0.2*2.52 + 0.4*2.88 = 1.76
ŷ = a = ReLU(Z) = ReLU(1.76) = 1.76
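A sketch of the full forward pass, with a constant 1 prepended to carry the bias weights shown in the figures (variable names are illustrative):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

x  = np.array([1.0, 2.0, 3.0])
W1 = np.array([[0.5, 0.5, 0.5], [0.4, 0.4, 0.4]])
W2 = np.array([[0.0, 0.2, 0.8], [0.0, 0.8, 0.2]])  # first column: bias weights
W3 = np.array([0.1, 0.2, 0.4])

a1 = relu(W1 @ x)                              # [3.  2.4]
a2 = relu(W2 @ np.concatenate(([1.0], a1)))    # [2.52 2.88]
y  = relu(W3 @ np.concatenate(([1.0], a2)))    # 1.756 -> the deck's 1.76
print(a1, a2, y)
```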
(Figure: the full forward pass through the network; obtained output 1.76.)
• Obtained value is 1.76, correct value is 90!
• Error = 1.76 − 90 = −88.24
The other way around: BackProp
• Cost function J = (a − y)² / 2
• Penalty:
• ∂J/∂a = a − y = 1.76 − 90 = −88.24
(Figure: computation chain x, w → z → a → J.)
Calculate new output parameters
• w_new = w − α * dJ/dw
• Assume α = 0.01
• ∂J/∂w = (∂J/∂a) * (∂a/∂z) * x = (∂J/∂z) * x = −88.24 * x
• [w1_new; w2_new; w3_new] = … (as computed on slide 18) … = [1; 2.4; 2.9]
(Figure: output layer with weights 0.1, 0.2, 0.4 and inputs 1, 2.5, 2.9; obtained output 1.76. Correct value is 90, error = −88.24.)
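The update in numpy, using the deck's rounded inputs (an illustrative sketch):

```python
import numpy as np

alpha = 0.01
dJ_dz = -88.24                      # dJ/da = a - y; ReLU' = 1 at z = 1.76
x     = np.array([1.0, 2.5, 2.9])   # output-layer inputs: bias, a1, a2
w     = np.array([0.1, 0.2, 0.4])

w_new = w - alpha * dJ_dz * x
print(w_new)  # ~[0.98 2.41 2.96] -> the deck's rounded [1, 2.4, 2.9]
```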
Distribute the penalty to L2 neurons
• ∂J/∂x = (∂J/∂a) * (∂a/∂z) * w = (∂J/∂z) * w = −88.24 * w
• ∂J/∂x2 = −88.24 * 0.2 = −17.6
• ∂J/∂x3 = −88.24 * 0.4 = −35.3
(Figure: ∂J/∂x2 = −17.6 and ∂J/∂x3 = −35.3 at the L2 activations; error = −88.24.)
Calculate L2 parameters
• w_new = w − α * (dJ/dz) * x
• Weights connected to the upper neuron:
[w11_new; w21_new; w31_new] = [w11; w21; w31] − α * (dJ/dz) * [x1; x2; x3] = [0; 0.2; 0.8] − 0.01*(−17.6)*[1; 3; 2.4] = [0.2; 0.7; 1.2]
• Weights connected to the lower neuron:
[w12_new; w22_new; w32_new] = [w12; w22; w32] − α * (dJ/dz) * [x1; x2; x3] = [0; 0.8; 0.2] − 0.01*(−35.3)*[1; 3; 2.4] = [0.4; 1.9; 1.0]
(Figure: the L1→L2 weights, with ∂J/∂a2 = −17.6 at the upper neuron and ∂J/∂a3 = −35.3 at the lower one.)
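Both rows can be updated at once with an outer product (a sketch, not the deck's notation):

```python
import numpy as np

alpha = 0.01
x     = np.array([1.0, 3.0, 2.4])         # L2 inputs: bias, a1, a2
dJ_dz = np.array([-17.6, -35.3])          # penalties at the two L2 neurons
W2    = np.array([[0.0, 0.2, 0.8],        # upper neuron's weights
                  [0.0, 0.8, 0.2]])       # lower neuron's weights

W2_new = W2 - alpha * np.outer(dJ_dz, x)
print(W2_new)  # ~[[0.18 0.73 1.22], [0.35 1.86 1.05]] -> deck's rounded values
```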
Distribute the penalty to L1 neurons
• ∂J/∂x = (∂J/∂a) * (∂a/∂z) * w = (∂J/∂z) * w
• Which w should I take?! Each L1 neuron feeds both L2 neurons.
(Figure: both penalties ∂J/∂a2 = −17.6 and ∂J/∂a3 = −35.3 flow back to L1 along their respective weights.)
Distribute the penalty to L1 neurons
• Take both: sum the penalty over all outgoing connections, ∂J/∂x = Σ (∂J/∂z) * w
• ∂J/∂x2 = −17.6*0.2 + (−35.3)*0.8 = −31.8
• ∂J/∂x3 = −17.6*0.8 + (−35.3)*0.2 = −21.2
(Figure: ∂J/∂x2 = −31.8 and ∂J/∂x3 = −21.2 at the L1 activations.)
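Summing over outgoing weights is exactly a transposed matrix product (a sketch; bias weights omitted since no penalty flows back to the constant input):

```python
import numpy as np

# Weights from the two L1 activations into the two L2 neurons.
W     = np.array([[0.2, 0.8],    # upper L2 neuron
                  [0.8, 0.2]])   # lower L2 neuron
dJ_dz = np.array([-17.6, -35.3])

dJ_dx = W.T @ dJ_dz              # sum over each activation's outgoing weights
print(dJ_dx)  # [-31.76 -21.14] -> the deck's rounded -31.8 and -21.2
```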
Calculate new input parameters
• w_new = w − α * (dJ/dz) * x
• Weights connected to the upper neuron:
[w11_new; w21_new; w31_new] = [w11; w21; w31] − α * (dJ/dz) * [x1; x2; x3] = [0.5; 0.5; 0.5] − 0.01*(−31.8)*[1; 2; 3] = [0.8; 1.1; 1.4]
• Weights connected to the lower neuron:
[w12_new; w22_new; w32_new] = [w12; w22; w32] − α * (dJ/dz) * [x1; x2; x3] = [0.4; 0.4; 0.4] − 0.01*(−21.2)*[1; 2; 3] = [0.6; 0.8; 1.0]
(Figure: the input→L1 weights, with ∂J/∂a2 = −31.8 at the upper neuron and ∂J/∂a3 = −21.2 at the lower one.)
Update parameters
(Figure: the network with all updated weights; running forward propagation again gives L1 activations ≈7.1 and ≈5.4, L2 activations ≈12.0 and ≈19.3, and output 86.60.)
Output ≈ 86.6, error ≈ −3.4.
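A compact end-to-end sketch of the whole training step, in full precision (the deck rounds at every stage, so its 86.6 differs slightly from what this prints; all names here are illustrative):

```python
import numpy as np

relu  = lambda z: np.maximum(z, 0.0)
drelu = lambda z: (z > 0).astype(float)
alpha, y = 0.01, 90.0

x  = np.array([1.0, 2.0, 3.0])                      # ground floor, baths, beds
W1 = np.array([[0.5, 0.5, 0.5], [0.4, 0.4, 0.4]])
W2 = np.array([[0.0, 0.2, 0.8], [0.0, 0.8, 0.2]])   # first column: bias weights
W3 = np.array([[0.1, 0.2, 0.4]])

def forward(W1, W2, W3):
    a1 = relu(W1 @ x)
    x2 = np.concatenate(([1.0], a1))                # prepend bias input
    a2 = relu(W2 @ x2)
    x3 = np.concatenate(([1.0], a2))
    return a1, x2, a2, x3, relu(W3 @ x3)[0]

a1, x2, a2, x3, out = forward(W1, W2, W3)           # out = 1.756
d3 = out - y                                        # dJ/da ~ -88.24, ReLU' = 1
W3_new = W3 - alpha * d3 * x3
d2 = (W3[0, 1:] * d3) * drelu(W2 @ x2)              # penalties ~ -17.6, -35.3
W2_new = W2 - alpha * np.outer(d2, x2)
d1 = (W2[:, 1:].T @ d2) * drelu(W1 @ x)             # penalties ~ -31.8, -21.2
W1_new = W1 - alpha * np.outer(d1, x)

print(forward(W1_new, W2_new, W3_new)[-1])          # jumps from 1.76 toward 90
```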
Examples
https://playground.tensorflow.org/
Neural networks and Deep Learning
Inception (GoogLeNet)*
The name actually comes from the movie Inception.
*Going deeper with convolutions [Szegedy 2014]
Neural Networks can generate Music!
• 30 seconds of Jazz generated by an RNN.
• https://soundcloud.com/user-559668657/machine-generated-jazz
• Do you like it?