Artificial Neural Network
Dessy Amirudin
May 2016
Data Science Indonesia Bootcamp
Intro
Brain Plasticity
http://www.nytimes.com/2000/04/25/science/rewired-ferrets-overturn-theories-of-brain-growth.html
auditory cortex
“One learning algorithm” to rule them all
Any sensor input!
Hearing with vibration
http://www.eaglemanlab.net/sensory-substitution
Human Echolocation
http://www.sciencemag.org/news/2014/11/how-blind-people-use-batlike-sonar
Any sensor input!
Seeing with sound:
https://www.newscientist.com/article/mg20727731-500-sensory-hijack-rewiring-brains-to-see-with-sound/
3rd Eye Frog:
https://www.newscientist.com/article/mg20727731-500-sensory-hijack-rewiring-brains-to-see-with-sound/
Neuron
http://learn.genetics.utah.edu/
Neural Network
input → hidden layer → output
A Brief History of Neural Networks
• An algorithm to mimic the brain
• Widely used in the 80s and early 90s
• Popularity diminished in the late 90s. Why?
• Recent resurgence: a state-of-the-art technique for many applications
• Can be used for both regression and classification
Recent Applications of NN
• Speech recognition
• Image recognition and search
• Playlist recommendation
• Skype Translate
Other Applications of NN
FINANCIAL
• Stock market prediction
• Credit worthiness
• Credit rating
MEDICAL
• Medical diagnosis
• Electronic noses
SALES & MARKETING
• Churn prediction
• Targeted marketing
• Service usage forecast
& many more
One Layer Neural Network
[diagram: input nodes connected to a single output node]
input to the output node = Σᵢ wᵢ aᵢ = wᵀa
output node = ø(wᵀa)
where ø is the activation function
Some Common Activation Functions
linear: ø(wᵀa) = wᵀa
step: ø(wᵀa) = 0 for wᵀa < t, 1 for wᵀa ≥ t
sigmoid: ø(wᵀa) = 1 / (1 + e^(−wᵀa))
tanh: ø(wᵀa) = (e^(wᵀa) − e^(−wᵀa)) / (e^(wᵀa) + e^(−wᵀa))
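These four activations are easy to sketch numerically (illustrative Python for brevity; the course exercises use R, and the function names here are just hypothetical helpers):

```python
import math

def linear(z):
    return z  # identity: output equals the pre-activation input w^T a

def step(z, t=0.0):
    return 1.0 if z >= t else 0.0  # hard threshold at t

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # squashes to (0, 1)

def tanh_act(z):
    # (e^z - e^-z) / (e^z + e^-z), squashes to (-1, 1)
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

print(sigmoid(0.0))   # 0.5
print(step(-1.0))     # 0.0
print(tanh_act(0.0))  # 0.0
```

Note that sigmoid and tanh are smooth, which is what later makes gradient-descent weight updates possible; the step function is not differentiable at t.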
Revisit One Layer Neural Network
a. If the activation function is linear, what will
happen?
b. If the activation function is sigmoid, what will
happen?
Do we really need many layers?
Look at a classification problem
• A linear classification model is not enough
• Add quadratic or cubic terms as necessary
Suppose we have a classification problem with n = 100 features.
Adding all quadratic terms, the number of variables becomes ~5,000.
Adding all cubic and quadratic terms, the number of variables becomes ~170K.
http://sebastianraschka.com
Artificial Neural Network
Dog vs Cat
100 × 100 pixels (example) → ~10,000 variables
Adding all quadratic terms, the number of variables becomes ~50 million.
Multilayer Network
[diagram: layered network; each node computes ø(wᵀa) with sigmoid activation, layers connected by weights w; the final node is the output]
Recall the sigmoid function
AND Function
[diagram: inputs a0, a1, a2 feed a single output node through weights w0, w1, w2]

a1  a2  output
0   0   0
0   1   0
1   0   0
1   1   1

a0 = 1 is the bias input.
The activation function is sigmoid.
Suppose we assign the weights
w0 = -20
w1 = 15
w2 = 15
Then the AND logic is correct.
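The slide's weights can be checked numerically with a sigmoid unit (illustrative Python; `and_gate` is a hypothetical helper, the weights are the ones from the slide):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def and_gate(a1, a2, w0=-20.0, w1=15.0, w2=15.0):
    # a0 = 1 is the bias input, so the pre-activation is w0 + w1*a1 + w2*a2
    return sigmoid(w0 * 1.0 + w1 * a1 + w2 * a2)

for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, round(and_gate(a1, a2)))  # rows: 0 0 0 / 0 1 0 / 1 0 0 / 1 1 1
```

sigmoid(−20) ≈ 0 and sigmoid(10) ≈ 1, so rounding the outputs reproduces the AND truth table.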
OR Function
[diagram: inputs a0, a1, a2 feed a single output node through weights w0, w1, w2]

a1  a2  output
0   0   0
0   1   1
1   0   1
1   1   1

a0 = 1 is the bias input.

Task 1:
Find values of w0, w1 and w2 that make the OR logic correct.
What are the weight values if the logic is NOT (a1 OR a2)?
XOR Function

a1  a2  output
0   0   0
0   1   1
1   0   1
1   1   0

a0 = 1 is the bias input.
Can you find weights for the XOR function?

XOR = NOT(NOT(a1 OR a2) OR (a1 AND a2))

a1  a2  AND  NOT(OR)  output
0   0   0    1        0
0   1   0    0        1
1   0   0    0        1
1   1   1    0        0
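The composition above can be verified exhaustively (a minimal Python check using plain boolean logic rather than sigmoid units):

```python
def xor_via_gates(a1, a2):
    # XOR = NOT(NOT(a1 OR a2) OR (a1 AND a2))
    nor = not (a1 or a2)        # NOT(a1 OR a2)
    both = a1 and a2            # a1 AND a2
    return int(not (nor or both))

for a1 in (0, 1):
    for a2 in (0, 1):
        assert xor_via_gates(a1, a2) == (a1 ^ a2)  # matches the truth table
```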
Multilayered Network for XOR Representation
[diagram: input layer (a0^1, a1^1, a2^1) connected to hidden layer (a0^2, a1^2, a2^2) by weights w_ij^12; the hidden nodes compute AND and NOT OR, and connect to the output node by weights w_i1^23]
Now, a multilayered network is necessary.
How do we assign the weights?
Intro to Optimization
How do we find the minimum value of a function?

Gradient Descent Method
• Suppose the function is f(x); the descent direction is the negative of the first derivative, −f′(x)
• Parameters to start the algorithm:
α = learning rate, usually set to a small value such as 0.001
ϵ = convergence tolerance, usually set to a very small value such as 1e-6
Gradient Descent Method
 Initialize k = 0 and choose values of α and ϵ
 Start from a random x as x_k
 Calculate the cost function f(x_k)
 Update x: x_{k+1} = x_k − α f′(x_k)
 Calculate the cost difference δ = f(x_k) − f(x_{k+1})
 If δ < ϵ, STOP; otherwise set k = k + 1 and repeat
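The steps above can be sketched directly (illustrative Python; the toy function f(x) = (x − 3)², the starting point, and α are assumptions for the demo, chosen so the minimum is at x = 3):

```python
def gradient_descent(f, fprime, x, alpha=0.1, eps=1e-6, max_iter=10000):
    """Follow the slide's recipe: step downhill until the cost change delta < eps."""
    cost = f(x)
    for _ in range(max_iter):
        x_new = x - alpha * fprime(x)      # x_{k+1} = x_k - alpha * f'(x_k)
        cost_new = f(x_new)
        if cost - cost_new < eps:          # delta < eps -> STOP
            return x_new
        x, cost = x_new, cost_new
    return x

f = lambda x: (x - 3.0) ** 2
fprime = lambda x: 2.0 * (x - 3.0)
print(gradient_descent(f, fprime, x=0.0))  # close to 3.0
```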
We can write linear regression learning as an optimization problem:
min_β (1/n) Σᵢ (yᵢ − βᵀxᵢ)²
Exercise 1
• Load “auto_data.csv”
• Create a linear regression model with dependent variable (y) = “weight” and independent variable (x) = “mpg”
• What is the value of the intercept?
• What is the value of mpg’s coefficient?
• What is the MSE?
• Plot weight ~ mpg and its fitted model
• Can you write R code to find the optimum values of the intercept and mpg’s coefficient using the gradient descent method?
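The last bullet asks for R; for orientation, here is a hedged sketch of the same gradient-descent fit in Python on synthetic data (the data, learning rate, and iteration count are made-up illustrations, not the auto_data.csv values):

```python
# Fit y = b0 + b1*x by minimizing (1/n) * sum (y - b0 - b1*x)^2 with gradient descent.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # synthetic: y = 1 + 2x, so the answer is b0=1, b1=2

n = len(xs)
b0, b1 = 0.0, 0.0
alpha = 0.05
for _ in range(5000):
    # partial derivatives of the mean squared error w.r.t. b0 and b1
    g0 = (-2.0 / n) * sum(y - b0 - b1 * x for x, y in zip(xs, ys))
    g1 = (-2.0 / n) * sum((y - b0 - b1 * x) * x for x, y in zip(xs, ys))
    b0 -= alpha * g0
    b1 -= alpha * g1

print(round(b0, 3), round(b1, 3))  # converges to 1.0 and 2.0
```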
Artificial Neural Network
Forward and Backward Propagation
Forward Propagation
• a1 = input
• s2 = w12 a1
• a2 = ø(s2) (add bias)
• s3 = w23 a2
• a3 = ø(s3) (add bias)
• s4 = w34 a3
• a4 = ø(s4)
[diagram: a_i^1 → s_i^2 → a_i^2 → s_i^3 → a_i^3 → s_i^4 → a_i^4, layers connected by weight matrices w_ij^12, w_ij^23, w_ij^34]
Get the output using Forward Propagation.
How do we update the weights using gradient descent?
Backward Propagation
[diagram: input a1 → s2 → a2 → s3 → a3 → s4 → a4 (output), sigmoid activation at each layer, weights w12, w23, w34]
• s2 = w12 a1
• a2 = ø(s2)
• s3 = w23 a2
• a3 = ø(s3)
• s4 = w34 a3
• a4 = ø(s4) = y_o

Define the error δ = ½ (y_o − y_i)²

Given a training example (a_i, y_i), update w34:
∂δ/∂w34 = ∂/∂w34 [½ (y_o − y_i)²]
        = (y_o − y_i) ∂/∂w34 ø(s4)
        = (y_o − y_i) ø′(s4) a3

Update w23:
∂δ/∂w23 = ∂/∂w23 [½ (y_o − y_i)²]
        = (y_o − y_i) ø′(s4) ∂/∂w23 [w34 ø(s3)]
        = (y_o − y_i) ø′(s4) w34 ø′(s3) a2
Backward Propagation
In the case of one output with many hidden layers, the formulation for one particular node in a hidden layer becomes:
∂δ/∂w23 = ø′(s3) a2 Σ_{k∈K} (y_o − y_i) ø′(s4) w34
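These chain-rule gradients can be sanity-checked against finite differences (illustrative Python with scalar weights; the weight values and the training pair are made-up numbers):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def dsigmoid(s):
    g = sigmoid(s)
    return g * (1.0 - g)   # sigmoid'(s) = sigmoid(s) * (1 - sigmoid(s))

def forward(a1, w12, w23, w34):
    s2 = w12 * a1; a2 = sigmoid(s2)
    s3 = w23 * a2; a3 = sigmoid(s3)
    s4 = w34 * a3; a4 = sigmoid(s4)
    return s2, a2, s3, a3, s4, a4

def error(a1, y, w12, w23, w34):
    a4 = forward(a1, w12, w23, w34)[-1]
    return 0.5 * (a4 - y) ** 2   # delta = 1/2 (y_o - y_i)^2

a1, y = 1.0, 0.5                 # one training example (assumed values)
w12, w23, w34 = 0.4, -0.3, 0.8   # assumed weights
s2, a2, s3, a3, s4, yo = forward(a1, w12, w23, w34)

# analytic gradients, exactly as derived on the slides
grad_w34 = (yo - y) * dsigmoid(s4) * a3
grad_w23 = (yo - y) * dsigmoid(s4) * w34 * dsigmoid(s3) * a2

# central finite differences for comparison
eps = 1e-6
fd_w34 = (error(a1, y, w12, w23, w34 + eps) - error(a1, y, w12, w23, w34 - eps)) / (2 * eps)
fd_w23 = (error(a1, y, w12, w23 + eps, w34) - error(a1, y, w12, w23 - eps, w34)) / (2 * eps)
print(abs(grad_w34 - fd_w34) < 1e-8, abs(grad_w23 - fd_w23) < 1e-8)  # True True
```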
Neural Network Tips
• Most of the time, one hidden layer is enough
• Keep the number of neurons between the input layer size and the output layer size
• The number of neurons in the hidden layer is usually about 2/3 of the input size
HOWEVER, this is not always TRUE. The best way is to keep experimenting.
Exercise
Neural Network for Regression
Load the MASS library
Use the “Boston” data
Predict the median value of the house (medv)
Do the following:
Data preparation
- Split the data into train and test sets. The train set comprises 75% of the data
Model 1:
- Create the linear regression model using the train data set (using lm or glm)
- Predict “medv” from the test data set
- Calculate the RSS of the test set
- Calculate the TSS of the test set
- Calculate the R² of the test set
- Calculate the MSE of the test set
Regression, Continued
Model 2:
- Load the “neuralnet” library
library(neuralnet)
- Create the regression model using the neural network algorithm with one hidden layer of 8 nodes, following this code:
n <- names(train)
f <- as.formula(paste("medv ~", paste(n[!n %in% "medv"], collapse = " + ")))
nn <- neuralnet(f, data = train, hidden = c(8), linear.output = TRUE)
What happened? Do you see a message like this?
“algorithm did not converge in 1 of 1 repetition(s) within the stepmax”
• Predict “medv” from the test data set
• Calculate the RSS of the test set
• Calculate the TSS of the test set
• Calculate the R² of the test set
• Calculate the MSE of the test set
• Plot the model graph
Compare with the result from the linear model.
Neural Network Plot
Use plot(nn), passing the fitted neuralnet object, to plot the network graph
Neural Network Additional Tips
• Preprocess the data using normalization
• Scaling to the interval [0, 1] or [−1, 1] usually tends to give better results.
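Min-max scaling is one line per column: in R it is (x - min(x)) / (max(x) - min(x)). The same idea sketched in Python (`minmax_scale` is a hypothetical helper):

```python
def minmax_scale(values, lo=0.0, hi=1.0):
    """Rescale a numeric list linearly into the interval [lo, hi]."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]

print(minmax_scale([5.0, 10.0, 20.0]))           # [0.0, 0.333..., 1.0]
print(minmax_scale([5.0, 10.0, 20.0], -1, 1))    # [-1.0, -0.333..., 1.0]
```

Remember to apply the train set's min and max to the test set, and to invert the scaling before reporting predictions of “medv”.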
Exercise 2
Same as Exercise Model 2, but normalize the data first.
• Predict “medv” from the test data set
• Calculate the RSS of the test set
• Calculate the TSS of the test set
• Calculate the R² of the test set
• Calculate the MSE of the test set
Compare with the result from the linear model. Can you improve it?
What is the lesson learned?
Neural Network for Binomial
Classification
Data Exploration and Modeling
Use the “credit_dlqn.csv” data
Explore the data
• How many variables are there?
• What are the variable types?
• Are there any other variables you think are needed to build a credit model?
Do the following:
Data preparation
- Split the data into train and test sets. The train set comprises 75% of the data
Model 1:
• Use logistic regression to predict default within 2 years
Model 2:
• Use a neural network to predict default within 2 years
• Use one hidden layer with the number of nodes set to 2/3 of the input size (7 here)
Continue Experimenting
How long did it take to finish the model?
NOTE: Neural networks are very slow to converge. Depending on the business objective, as a Data Scientist you have to be deliberate when choosing an algorithm.
• Try a different number of nodes in the hidden layer, such as 2
• How is the result?
• How is the accuracy compared to logistic regression?
Recall the Confusion Table
• Source: Wikipedia
Assignment – Due Next Week
• Increase the precision of the neural network model. Use a neural network with different parameters
• In a Word document, describe the improvement you obtained, your method, and why it works or doesn’t work
• Submit your code and Word document to trainer.datascience@gmail.com before 23 May 2016 23:59:59
References
• Ng, A. Machine Learning. Coursera course, 2013.
• James G., Witten D., Hastie T. and Tibshirani R. An Introduction to Statistical Learning. Springer, 2014.