Artificial Neural Network
Dessy Amirudin
May 2016
Data Science Indonesia Bootcamp
Intro
Brain Plasticity
http://www.nytimes.com/2000/04/25/science/rewired-ferrets-overturn-theories-of-brain-growth.html
auditory cortex
“One learning algorithm” to rule them all
Any sensor input!
Hearing with vibration
http://www.eaglemanlab.net/sensory-substitution
Human Echolocation
http://www.sciencemag.org/news/2014/11/how-blind-people-use-batlike-sonar
Any sensor input!
Seeing with sound:
https://www.newscientist.com/article/mg20727731-500-sensory-hijack-rewiring-brains-to-see-with-sound/
3rd Eye Frog:
https://www.newscientist.com/article/mg20727731-500-sensory-hijack-rewiring-brains-to-see-with-sound/
Neuron
http://learn.genetics.utah.edu/
Neural Network
input → hidden layer → output
A Brief History of Neural Networks
• An algorithm to mimic the brain
• Widely used in the 80s and early 90s
• Popularity diminished in the late 90s. Why?
• Recent resurgence: a state-of-the-art technique for many applications
• Can be used for both regression and classification
Recent Applications of NN
• Speech recognition
• Image recognition and search
• Playlist recommendation
• Skype Translate
Other Applications of NN
FINANCIAL
• Stock market prediction
• Credit worthiness
• Credit rating
MEDICAL
• Medical diagnosis
• Electronic noses
SALES & MARKETING
• Churn prediction
• Targeted marketing
• Service usage forecast
& many more
One Layer Neural Network
[diagram: input nodes connected to a single output node]
input to the output node = Σᵢ wᵢ aᵢ = wᵀa
output node = ø(wᵀa)
where ø is the activation function
Some Common Activation Functions
linear: ø(wᵀa) = wᵀa
step: ø(wᵀa) = 0 for wᵀa < t, 1 for wᵀa ≥ t
sigmoid: ø(wᵀa) = 1 / (1 + e^(−wᵀa))
tanh: ø(wᵀa) = (e^(wᵀa) − e^(−wᵀa)) / (e^(wᵀa) + e^(−wᵀa))
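These four activations are easy to sketch numerically (illustrative Python for brevity; the course exercises use R, and the function names here are just hypothetical helpers):

```python
import math

def linear(z):
    return z  # identity: output equals the pre-activation input w^T a

def step(z, t=0.0):
    return 1.0 if z >= t else 0.0  # hard threshold at t

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # squashes to (0, 1)

def tanh_act(z):
    # (e^z - e^-z) / (e^z + e^-z), squashes to (-1, 1)
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

print(sigmoid(0.0))   # 0.5
print(step(-1.0))     # 0.0
print(tanh_act(0.0))  # 0.0
```

Note that sigmoid and tanh are smooth, which is what later makes gradient-descent weight updates possible; the step function is not differentiable at t.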
Revisit One Layer Neural Network
a. If the activation function is linear, what will
happen?
b. If the activation function is sigmoid, what will
happen?
Do we really need many layers?
Look at a classification problem
• A linear classification model is not enough
• Add quadratic or cubic terms as necessary
Suppose we have a classification problem with n = 100 features.
Adding all quadratic terms, the number of variables becomes ~5,000.
Adding all cubic and quadratic terms, the number of variables becomes ~170K.
http://sebastianraschka.com
Artificial Neural Network
Dog vs Cat
100 × 100 pixels (example) → ~10,000 variables
Adding all quadratic terms, the number of variables becomes ~50 million.
Multilayer Network
[diagram: layered network; each node computes ø(wᵀa) with sigmoid activation, layers connected by weights w; the final node is the output]
Recall the sigmoid function
AND Function
[diagram: inputs a0, a1, a2 feed a single output node through weights w0, w1, w2]

a1  a2  output
0   0   0
0   1   0
1   0   0
1   1   1

a0 = 1 is the bias input.
The activation function is sigmoid.
Suppose we assign the weights
w0 = -20
w1 = 15
w2 = 15
Then the AND logic is correct.
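The slide's weights can be checked numerically with a sigmoid unit (illustrative Python; `and_gate` is a hypothetical helper, the weights are the ones from the slide):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def and_gate(a1, a2, w0=-20.0, w1=15.0, w2=15.0):
    # a0 = 1 is the bias input, so the pre-activation is w0 + w1*a1 + w2*a2
    return sigmoid(w0 * 1.0 + w1 * a1 + w2 * a2)

for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, round(and_gate(a1, a2)))  # rows: 0 0 0 / 0 1 0 / 1 0 0 / 1 1 1
```

sigmoid(−20) ≈ 0 and sigmoid(10) ≈ 1, so rounding the outputs reproduces the AND truth table.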
OR Function
[diagram: inputs a0, a1, a2 feed a single output node through weights w0, w1, w2]

a1  a2  output
0   0   0
0   1   1
1   0   1
1   1   1

a0 = 1 is the bias input.

Task 1:
Find values of w0, w1 and w2 that make the OR logic correct.
What are the weight values if the logic is NOT (a1 OR a2)?
XOR Function

a1  a2  output
0   0   0
0   1   1
1   0   1
1   1   0

a0 = 1 is the bias input.
Can you find weights for the XOR function?

XOR = NOT(NOT(a1 OR a2) OR (a1 AND a2))

a1  a2  AND  NOT(OR)  output
0   0   0    1        0
0   1   0    0        1
1   0   0    0        1
1   1   1    0        0
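The composition above can be verified exhaustively (a minimal Python check using plain boolean logic rather than sigmoid units):

```python
def xor_via_gates(a1, a2):
    # XOR = NOT(NOT(a1 OR a2) OR (a1 AND a2))
    nor = not (a1 or a2)        # NOT(a1 OR a2)
    both = a1 and a2            # a1 AND a2
    return int(not (nor or both))

for a1 in (0, 1):
    for a2 in (0, 1):
        assert xor_via_gates(a1, a2) == (a1 ^ a2)  # matches the truth table
```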
Multilayered Network for XOR Representation
[diagram: input layer (a0^1, a1^1, a2^1) connected to hidden layer (a0^2, a1^2, a2^2) by weights w_ij^12; the hidden nodes compute AND and NOT OR, and connect to the output node by weights w_i1^23]
Now, a multilayered network is necessary.
How do we assign the weights?
Intro to Optimization
How do we find the minimum value of a function?

Gradient Descent Method
• Suppose the function is f(x); the descent direction is the negative of the first derivative, −f′(x)
• Parameters to start the algorithm:
α = learning rate, usually set to a small value such as 0.001
ϵ = convergence tolerance, usually set to a very small value such as 1e-6
Gradient Descent Method
 Initialize k = 0 and choose values of α and ϵ
 Start from a random x as x_k
 Calculate the cost function f(x_k)
 Update x: x_{k+1} = x_k − α f′(x_k)
 Calculate the cost difference δ = f(x_k) − f(x_{k+1})
 If δ < ϵ, STOP; otherwise set k = k + 1 and repeat
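The steps above can be sketched directly (illustrative Python; the toy function f(x) = (x − 3)², the starting point, and α are assumptions for the demo, chosen so the minimum is at x = 3):

```python
def gradient_descent(f, fprime, x, alpha=0.1, eps=1e-6, max_iter=10000):
    """Follow the slide's recipe: step downhill until the cost change delta < eps."""
    cost = f(x)
    for _ in range(max_iter):
        x_new = x - alpha * fprime(x)      # x_{k+1} = x_k - alpha * f'(x_k)
        cost_new = f(x_new)
        if cost - cost_new < eps:          # delta < eps -> STOP
            return x_new
        x, cost = x_new, cost_new
    return x

f = lambda x: (x - 3.0) ** 2
fprime = lambda x: 2.0 * (x - 3.0)
print(gradient_descent(f, fprime, x=0.0))  # close to 3.0
```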
We can write linear regression learning as an optimization problem:
min_β (1/n) Σᵢ (yᵢ − βᵀxᵢ)²
Exercise 1
• Load “auto_data.csv”
• Create a linear regression model with dependent variable (y) = “weight” and independent variable (x) = “mpg”
• What is the value of the intercept?
• What is the value of mpg’s coefficient?
• What is the MSE?
• Plot weight ~ mpg and its fitted model
• Can you write R code to find the optimum values of the intercept and mpg’s coefficient using the gradient descent method?
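The last bullet asks for R; for orientation, here is a hedged sketch of the same gradient-descent fit in Python on synthetic data (the data, learning rate, and iteration count are made-up illustrations, not the auto_data.csv values):

```python
# Fit y = b0 + b1*x by minimizing (1/n) * sum (y - b0 - b1*x)^2 with gradient descent.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # synthetic: y = 1 + 2x, so the answer is b0=1, b1=2

n = len(xs)
b0, b1 = 0.0, 0.0
alpha = 0.05
for _ in range(5000):
    # partial derivatives of the mean squared error w.r.t. b0 and b1
    g0 = (-2.0 / n) * sum(y - b0 - b1 * x for x, y in zip(xs, ys))
    g1 = (-2.0 / n) * sum((y - b0 - b1 * x) * x for x, y in zip(xs, ys))
    b0 -= alpha * g0
    b1 -= alpha * g1

print(round(b0, 3), round(b1, 3))  # converges to 1.0 and 2.0
```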
Artificial Neural Network
Forward and Backward Propagation
Forward Propagation
• a1 = input
• s2 = w12 a1
• a2 = ø(s2) (add bias)
• s3 = w23 a2
• a3 = ø(s3) (add bias)
• s4 = w34 a3
• a4 = ø(s4)
[diagram: a_i^1 → s_i^2 → a_i^2 → s_i^3 → a_i^3 → s_i^4 → a_i^4, layers connected by weight matrices w_ij^12, w_ij^23, w_ij^34]
Get the output using Forward Propagation.
How do we update the weights using gradient descent?
Backward Propagation
[diagram: input a1 → s2 → a2 → s3 → a3 → s4 → a4 (output), sigmoid activation at each layer, weights w12, w23, w34]
• s2 = w12 a1
• a2 = ø(s2)
• s3 = w23 a2
• a3 = ø(s3)
• s4 = w34 a3
• a4 = ø(s4) = y_o

Define the error δ = ½ (y_o − y_i)²

Given a training example (a_i, y_i), update w34:
∂δ/∂w34 = ∂/∂w34 [½ (y_o − y_i)²]
        = (y_o − y_i) ∂/∂w34 ø(s4)
        = (y_o − y_i) ø′(s4) a3

Update w23:
∂δ/∂w23 = ∂/∂w23 [½ (y_o − y_i)²]
        = (y_o − y_i) ø′(s4) ∂/∂w23 [w34 ø(s3)]
        = (y_o − y_i) ø′(s4) w34 ø′(s3) a2
Backward Propagation
In the case of one output with many hidden layers, the formulation for one particular node in a hidden layer becomes:
∂δ/∂w23 = ø′(s3) a2 Σ_{k∈K} (y_o − y_i) ø′(s4) w34
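These chain-rule gradients can be sanity-checked against finite differences (illustrative Python with scalar weights; the weight values and the training pair are made-up numbers):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def dsigmoid(s):
    g = sigmoid(s)
    return g * (1.0 - g)   # sigmoid'(s) = sigmoid(s) * (1 - sigmoid(s))

def forward(a1, w12, w23, w34):
    s2 = w12 * a1; a2 = sigmoid(s2)
    s3 = w23 * a2; a3 = sigmoid(s3)
    s4 = w34 * a3; a4 = sigmoid(s4)
    return s2, a2, s3, a3, s4, a4

def error(a1, y, w12, w23, w34):
    a4 = forward(a1, w12, w23, w34)[-1]
    return 0.5 * (a4 - y) ** 2   # delta = 1/2 (y_o - y_i)^2

a1, y = 1.0, 0.5                 # one training example (assumed values)
w12, w23, w34 = 0.4, -0.3, 0.8   # assumed weights
s2, a2, s3, a3, s4, yo = forward(a1, w12, w23, w34)

# analytic gradients, exactly as derived on the slides
grad_w34 = (yo - y) * dsigmoid(s4) * a3
grad_w23 = (yo - y) * dsigmoid(s4) * w34 * dsigmoid(s3) * a2

# central finite differences for comparison
eps = 1e-6
fd_w34 = (error(a1, y, w12, w23, w34 + eps) - error(a1, y, w12, w23, w34 - eps)) / (2 * eps)
fd_w23 = (error(a1, y, w12, w23 + eps, w34) - error(a1, y, w12, w23 - eps, w34)) / (2 * eps)
print(abs(grad_w34 - fd_w34) < 1e-8, abs(grad_w23 - fd_w23) < 1e-8)  # True True
```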
Neural Network Tips
• Most of the time, one hidden layer is enough
• Keep the number of neurons between the input layer size and the output layer size
• The number of neurons in the hidden layer is usually about 2/3 of the input size
HOWEVER, this is not always TRUE. The best way is to keep experimenting.
Exercise
Neural Network for Regression
Load the MASS library
Use the “Boston” data
Predict the median value of the house (medv)
Do the following:
Data preparation
- Split the data into train and test sets. The train set comprises 75% of the data
Model 1:
- Create the linear regression model using the train data set (using lm or glm)
- Predict “medv” from the test data set
- Calculate the RSS of the test set
- Calculate the TSS of the test set
- Calculate the R² of the test set
- Calculate the MSE of the test set
Regression, Continued
Model 2:
- Load the “neuralnet” library
library(neuralnet)
- Create the regression model using the neural network algorithm with one hidden layer of 8 nodes, following this code:
n <- names(train)
f <- as.formula(paste("medv ~", paste(n[!n %in% "medv"], collapse = " + ")))
nn <- neuralnet(f, data = train, hidden = c(8), linear.output = TRUE)
What happened? Do you see a message like this?
“algorithm did not converge in 1 of 1 repetition(s) within the stepmax”
• Predict “medv” from the test data set
• Calculate the RSS of the test set
• Calculate the TSS of the test set
• Calculate the R² of the test set
• Calculate the MSE of the test set
• Plot the model graph
Compare with the result from the linear model.
Neural Network Plot
Use plot(nn), passing the fitted neuralnet object, to plot the network graph
Neural Network Additional Tips
• Preprocess the data using normalization
• Scaling to the interval [0, 1] or [−1, 1] usually tends to give better results.
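Min-max scaling is one line per column: in R it is (x - min(x)) / (max(x) - min(x)). The same idea sketched in Python (`minmax_scale` is a hypothetical helper):

```python
def minmax_scale(values, lo=0.0, hi=1.0):
    """Rescale a numeric list linearly into the interval [lo, hi]."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]

print(minmax_scale([5.0, 10.0, 20.0]))           # [0.0, 0.333..., 1.0]
print(minmax_scale([5.0, 10.0, 20.0], -1, 1))    # [-1.0, -0.333..., 1.0]
```

Remember to apply the train set's min and max to the test set, and to invert the scaling before reporting predictions of “medv”.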
Exercise 2
Same as Exercise Model 2, but normalize the data first.
• Predict “medv” from the test data set
• Calculate the RSS of the test set
• Calculate the TSS of the test set
• Calculate the R² of the test set
• Calculate the MSE of the test set
Compare with the result from the linear model. Can you improve it?
What is the lesson learned?
Neural Network for Binomial
Classification
Data Exploration and Modeling
Use the “credit_dlqn.csv” data
Explore the data
• How many variables are there?
• What are the variable types?
• Are there any other variables you think are needed to build a credit model?
Do the following:
Data preparation
- Split the data into train and test sets. The train set comprises 75% of the data
Model 1:
• Use logistic regression to predict default within 2 years
Model 2:
• Use a neural network to predict default within 2 years
• Use one hidden layer with the number of nodes set to 2/3 of the input size (7 here)
Continue Experimenting
How long did it take to finish the model?
NOTE: Neural networks are very slow to converge. Depending on the business objective, as a Data Scientist you have to be deliberate when choosing an algorithm.
• Try a different number of nodes in the hidden layer, such as 2
• How is the result?
• How is the accuracy compared to logistic regression?
Recall the Confusion Table
• Source: Wikipedia
Assignment – Due Next Week
• Increase the precision of the neural network model. Use a neural network with different parameters
• In a Word document, describe the improvement you obtained, your method, and why it works or doesn’t work
• Submit your code and Word document to trainer.datascience@gmail.com before 23 May 2016 23:59:59
References
• Ng, A. Machine Learning. Coursera course, 2013.
• James G., Witten D., Hastie T. and Tibshirani R. An Introduction to Statistical Learning. Springer, 2014.