Regression and Classification: An Artificial Neural Network Approach

Welcome to my presentation on
Regression and Classification: An Artificial
Neural Network Approach
Presented by
Md. Menhazul Abedin
Research student
Dept. of Statistics
University of Rajshahi
Rajshahi-6205

Dedication
• This presentation is dedicated to my
honorable supervisor
12/12/2016 2

Three pioneers of ANN
Warren McCulloch Walter Pitts
Frank Rosenblatt

Outline
Motivation/Why this study?
Objectives
Methodology
Findings
Conclusion
Limitations
Areas of further research

Motivation/Why this study?
• Vector, matrix, sound, image, wave, string, text, etc.
• How should such data be analyzed? This question has been a stumbling block for several decades.

Objectives
• To study neural networks as a technique for regression and classification.
• To compare neural networks with classical regression and classification techniques.
• To study the limitations of neural networks.

• Structure of neuron

What is ANN?
Biological neural network
Artificial neural network

• How many hidden layers should be considered?
• More hidden layers → better approximation of nonlinearity.
• More hidden layers → more time needed to converge.
• Weights are adjusted by an iterative method (backpropagation).
• Analogy between biological and artificial neural networks

Historical Background of Artificial Neural Network
• 1943: Neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how neurons might work.
• 1949: Donald Hebb wrote The Organization of Behavior (on the ways in which humans learn).
• 1951: M. Minsky built a reinforcement-based network learning system.
• 1958: F. Rosenblatt introduced the first practical artificial neural network (ANN), the perceptron.
• 1960: B. Widrow and M. E. Hoff introduced an adaptive perceptron-like network using the least mean squares (LMS) error algorithm.
• 1969: Marvin Minsky and Seymour Papert showed that the perceptron model is not capable of representing many important problems.
• 1973: Christoph von der Malsburg used a neuron model that was nonlinear and biologically more motivated.
• 1974: Paul Werbos developed a learning procedure called backpropagation of error.

Historical Background of Artificial Neural Network (cont…)
• 1986: The application area of MLP networks remained rather limited until the breakthrough when a general backpropagation algorithm for the multi-layered perceptron was introduced by Rumelhart and McClelland.
• 1988: Radial basis function (RBF) networks were first introduced by Broomhead and Lowe. Although the basic idea of RBF had been developed 30 years earlier under the name "method of potential functions", the work by Broomhead and Lowe opened a new frontier in the neural network community.

ANN regression
• Linear activation function → gives continuous values.

ANN classification
• Two classes → sigmoid function
(output > 0.5: one class; output < 0.5: the other class)
• More than two classes → softmax function
(gives a probability for each class)
• The tanh function may also be used as an activation function.

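Activation functions
• Linear function: $\varphi(\eta) = \eta$
• Sigmoid function: $\varphi(\eta) = \frac{1}{1 + e^{-\eta}}$, where $\eta = x\theta$.
• Softmax function: $\varphi(\eta) = \left( \frac{\exp(\eta_1)}{\sum_{i=1}^{k} \exp(\eta_i)}, \ldots, \frac{\exp(\eta_k)}{\sum_{i=1}^{k} \exp(\eta_i)} \right)$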
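The three activation functions on this slide can be sketched in a few lines of plain Python. This is only an illustration: the function names are chosen here, and the max-subtraction and sign-branching tricks are standard numerical safeguards that are not part of the slide.

```python
import math


def linear(eta):
    """Identity activation: phi(eta) = eta."""
    return eta


def sigmoid(eta):
    """Logistic activation: phi(eta) = 1 / (1 + exp(-eta))."""
    # Branch on the sign so exp() never receives a large positive argument.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def softmax(etas):
    """Softmax over a list of k scores; returns k probabilities that sum to 1."""
    m = max(etas)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `sigmoid(0.0)` is 0.5, and `softmax` assigns equal probability to equal scores.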

Construction of cost function: sigmoid formulation
The perceptron learning model specifies the probability of a binary output $y_i \in \{0, 1\}$ given the input $x_i$ as follows:
$$p(y_i \mid x_i, w) = \mathrm{Ber}(y_i \mid \mathrm{sigm}(x_i, w))$$
$$p(y \mid X, w) = \prod_{i=1}^{n} \mathrm{Ber}(y_i \mid \mathrm{sigm}(x_i, w))$$
$$p(y \mid X, w) = \prod_{i=1}^{n} \left[ \frac{1}{1 + e^{-x_i w}} \right]^{y_i} \left[ 1 - \frac{1}{1 + e^{-x_i w}} \right]^{1 - y_i}$$
where $\pi_i = p(y_i = 1 \mid x_i, w) = \frac{1}{1 + e^{-x_i w}}$.
Cost function (cross entropy):
$$c(w) = -\log p(y \mid X, w) = -\sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i) \right]$$
[Figure: $\mathrm{sigm}(x_i, w) = \frac{1}{1 + e^{-x_i w}}$, with $x_i w = 0$ marked]
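As a rough illustration of the cross-entropy cost for the sigmoid formulation, the sketch below evaluates $c(w)$ on a small made-up data set. The data, the function names, and the clipping constant `eps` are assumptions introduced here, not taken from the presentation.

```python
import math


def sigm(x, w):
    """pi_i = 1 / (1 + exp(-x.w)) for one input row x and weight vector w."""
    eta = sum(xj * wj for xj, wj in zip(x, w))
    return 1.0 / (1.0 + math.exp(-eta))


def cross_entropy(X, y, w, eps=1e-12):
    """c(w) = -sum_i [ y_i log(pi_i) + (1 - y_i) log(1 - pi_i) ]."""
    cost = 0.0
    for x_i, y_i in zip(X, y):
        pi = min(max(sigm(x_i, w), eps), 1.0 - eps)  # clip to keep log() finite
        cost -= y_i * math.log(pi) + (1 - y_i) * math.log(1.0 - pi)
    return cost


# Toy data (made up): the first column is the constant 1 for the bias term.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]]
y = [1, 0, 1, 0]

cost_zero = cross_entropy(X, y, [0.0, 0.0])  # w = 0 gives pi_i = 0.5 for every i
cost_good = cross_entropy(X, y, [0.0, 2.0])  # weights that separate the toy data
print(cost_zero, cost_good)
```

With all weights at zero every $\pi_i$ equals 0.5, so the cost is $n \log 2$; weights that separate the two classes give a smaller cost.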

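Softmax formulation
$\mathrm{sigm}(x_i, w) = \frac{1}{1 + e^{-x_i w}}$
[Figure: network diagram with inputs $+1$, $x_{i1}$, $x_{i2}$; weights $b_1 = w_{10}$, $w_{11}$, $w_{21}$, $w_{12}$, $w_{22}$, $b_2 = w_{20}$; two summation units ($\Sigma$) with outputs $u_{11}$, $u_{12}$; and a softmax layer]
$$\pi_{i1} = \frac{e^{x_i w_1}}{e^{x_i w_1} + e^{x_i w_2}}, \qquad \pi_{i2} = \frac{e^{x_i w_2}}{e^{x_i w_1} + e^{x_i w_2}}, \qquad \pi_{i1} + \pi_{i2} = 1$$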
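A short worked step may help connect the two formulations. Dividing the numerator and denominator of $\pi_{i2}$ by $e^{x_i w_2}$ suggests that the two-class softmax is a sigmoid applied to the weight difference $w_2 - w_1$ (the difference vector is notation introduced here for illustration):

```latex
\pi_{i2} = \frac{e^{x_i w_2}}{e^{x_i w_1} + e^{x_i w_2}}
         = \frac{1}{1 + e^{-x_i (w_2 - w_1)}}
         = \mathrm{sigm}(x_i, w_2 - w_1),
\qquad
\pi_{i1} = 1 - \pi_{i2}.
```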

Construction of cost function: Softmax formulation
Indicator:
$$I_c(y_i) = \begin{cases} 1 & \text{if } y_i = c \\ 0 & \text{otherwise} \end{cases}$$
$$p(y_i \mid x_i, w) = \pi_{i1}^{I_0(y_i)} \, \pi_{i2}^{I_1(y_i)}$$
$$p(y \mid X, w) = \prod_{i=1}^{n} \pi_{i1}^{I_0(y_i)} \, \pi_{i2}^{I_1(y_i)}$$
$$p(y_i \mid x_i, w) = \begin{cases} \dfrac{e^{x_i w_1}}{e^{x_i w_1} + e^{x_i w_2}} & \text{if } y_i = 0 \\[2ex] \dfrac{e^{x_i w_2}}{e^{x_i w_1} + e^{x_i w_2}} & \text{if } y_i = 1 \end{cases}$$
$$c(w) = -\log p(y \mid X, w) = -\sum_{i=1}^{n} \left( I_0(y_i) \log \pi_{i1} + I_1(y_i) \log \pi_{i2} \right)$$
[Figure: X → Linear layer → Log softmax layer → NLL → $c(w)$]
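The pipeline in the diagram (linear layer, log-softmax layer, negative log-likelihood) can be sketched as three small functions. This is a minimal sketch under assumptions: the function names, the toy data, and the layout of `W` (one weight vector per class) are choices made here, not taken from the presentation.

```python
import math


def linear_layer(x, W):
    """Scores u_c = x . w_c, one per class; W holds one weight vector per class."""
    return [sum(xj * wj for xj, wj in zip(x, w_c)) for w_c in W]


def log_softmax(u):
    """log pi_c = u_c - log(sum_k exp(u_k)), computed with the max subtracted."""
    m = max(u)
    log_norm = m + math.log(sum(math.exp(v - m) for v in u))
    return [v - log_norm for v in u]


def nll(X, y, W):
    """c(w) = -sum_i log pi_{i, y_i}: the indicator keeps only the true class."""
    return -sum(log_softmax(linear_layer(x_i, W))[y_i] for x_i, y_i in zip(X, y))


# Toy data (made up): the first column is the constant 1 for the bias term,
# and labels 0 and 1 index the two rows of W.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]]
y = [1, 0, 1, 0]

nll_zero = nll(X, y, [[0.0, 0.0], [0.0, 0.0]])   # equal scores: pi = 0.5 each
nll_good = nll(X, y, [[0.0, -1.0], [0.0, 1.0]])  # weights that separate the toy data
print(nll_zero, nll_good)
```

Because the indicator selects one term per observation, the sum over classes collapses to a single lookup of the true class's log-probability.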

Weight update (Backpropagation)
• Compute the derivative of the cost with respect to the inputs of each layer (layer-wise).
• Information flows forward from $z_1(x)$ to $z_4(x) = c$ (forward message).
• The error propagates backward (backward message) and each layer updates its weights.
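To make the forward and backward messages concrete, here is a minimal sketch of one backpropagation step. It assumes a network chosen for illustration only (one input, two tanh hidden units, a linear output, squared-error cost); the architecture, names, and numbers are not from the presentation.

```python
import math


def forward(x, params):
    """Forward message: input -> hidden layer (tanh) -> linear output."""
    W1, b1, W2, b2 = params
    h = [math.tanh(w * x + b) for w, b in zip(W1, b1)]
    y_hat = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, y_hat


def cost(x, y, params):
    """Squared-error cost for a single observation."""
    return 0.5 * (forward(x, params)[1] - y) ** 2


def backward(x, y, params):
    """Backward message: chain rule applied layer by layer, output layer first."""
    W1, b1, W2, b2 = params
    h, y_hat = forward(x, params)
    d_out = y_hat - y                     # d cost / d y_hat
    gW2 = [d_out * hj for hj in h]        # gradient for the output weights
    gb2 = d_out
    # Error passed back through each tanh unit (tanh' = 1 - tanh^2).
    d_hid = [d_out * w * (1.0 - hj ** 2) for w, hj in zip(W2, h)]
    gW1 = [d * x for d in d_hid]
    gb1 = d_hid
    return gW1, gb1, gW2, gb2


def update(x, y, params, lr=0.1):
    """One gradient step on every weight using the backpropagated gradients."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = backward(x, y, params)
    return ([w - lr * g for w, g in zip(W1, gW1)],
            [b - lr * g for b, g in zip(b1, gb1)],
            [w - lr * g for w, g in zip(W2, gW2)],
            b2 - lr * gb2)


params = ([0.5, -0.3], [0.1, 0.2], [0.7, -0.4], 0.0)  # arbitrary starting weights
before = cost(1.5, 2.0, params)
after = cost(1.5, 2.0, update(1.5, 2.0, params))
print(before, after)  # the cost should drop after one small step
```

A finite-difference check (perturbing one weight by a small amount and comparing the change in cost with the analytic gradient) is a common way to validate a hand-written backward pass.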

Optimization
Our goal is to minimize the cost function.
Different optimization techniques:
• Gradient descent algorithm
• Newton's algorithm
• Stochastic gradient descent (SGD)
• Online learning, batch and mini-batch optimization
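The batch, mini-batch, and online variants differ only in how many observations feed each gradient step. The sketch below illustrates this on a least-squares line fit; the data, learning rate, epoch count, and function names are assumptions made for this example, and Newton's algorithm is not covered.

```python
import random


def gradient(w, batch):
    """Gradient of the mean squared error (1/2m) * sum (w0 + w1*x - y)^2."""
    g0 = g1 = 0.0
    for x, y in batch:
        r = w[0] + w[1] * x - y
        g0 += r
        g1 += r * x
    m = len(batch)
    return g0 / m, g1 / m


def minimize(data, batch_size, lr=0.1, epochs=200, seed=0):
    """batch_size = len(data): batch GD; 1: online SGD; otherwise mini-batch."""
    rng = random.Random(seed)
    data = list(data)
    w = [0.0, 0.0]
    for _ in range(epochs):
        rng.shuffle(data)  # new random order each epoch
        for start in range(0, len(data), batch_size):
            g0, g1 = gradient(w, data[start:start + batch_size])
            w = [w[0] - lr * g0, w[1] - lr * g1]
    return w


# Noise-free toy data from y = 1 + 2x, so every variant should approach (1, 2).
data = [(x / 10.0, 1.0 + 2.0 * x / 10.0) for x in range(-10, 11)]
w_batch = minimize(data, batch_size=len(data))
w_mini = minimize(data, batch_size=5)
w_online = minimize(data, batch_size=1)
print(w_batch, w_mini, w_online)
```

Smaller batches make more (noisier) updates per pass over the data; on noisy data the online variant typically needs a decreasing learning rate to settle.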

Regression (Findings)
• Number of data sets used: 7 (regression: 4, classification: 3)
• Pharmaceuticals data:
Size: 26
No. of variables: 4 (one dependent and three independent)
Outliers: Present (6th, 10th, and 26th)
Autocorrelation: Absent
Multicollinearity: Absent
Normality: Present
Data type: Real
Cross-validation: LOOCV
Applied methods: Linear model, polynomial & ANN
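Leave-one-out cross-validation (LOOCV) suits a data set this small because every observation is used once for testing. The sketch below shows the idea with a single-predictor least-squares line; the model, function names, and data are placeholders chosen here and are not the pharmaceuticals data or the models fitted in the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loocv_mse(xs, ys):
    """Leave each observation out once, fit on the rest, predict the held-out one."""
    errors = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return sum(errors) / len(errors)


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
print(loocv_mse(xs, ys))
```

The same loop works for any model: replace `fit_line` and the prediction line with the fit and predict steps of the model being compared.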

Regression (cont…)
ANN is the best regression model

Regression (cont…)
• Yacht Hydrodynamics data:
Size: 308
No. of variables: 7 (one dependent and six independent)
Outliers: Absent
Normality: Absent (clustered)
Data type: Real
Cross-validation: Training set and test set
Applied methods: Linear model, polynomial & ANN

• Results of Yacht Hydrodynamics data

• The analysis was repeated 100 times with different training and test sets.
• Box plot of the test error → gives a sense of the error variation.
• ANN is the best regression model.
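Repeating the train/test split many times and summarizing the test errors can be sketched as follows. Everything here is a placeholder for illustration: the data are made up, the "model" simply predicts the training mean, and the 30% test fraction is an assumption, since the presentation does not state the split used.

```python
import random
import statistics


def repeated_holdout(data, fit, predict, n_repeats=100, test_fraction=0.3, seed=0):
    """Return one test MSE per random train/test split."""
    rng = random.Random(seed)
    n_test = max(1, int(round(test_fraction * len(data))))
    errors = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)
        errors.append(statistics.fmean((y - predict(model, x)) ** 2 for x, y in test))
    return errors


def fit_mean(train):
    """Placeholder model: remembers the training mean of y."""
    return statistics.fmean(y for _, y in train)


def predict_mean(model, x):
    """Placeholder prediction: ignores x and returns the stored mean."""
    return model


data = [(float(x), 0.5 * x + (x % 3)) for x in range(30)]  # made-up data
errors = repeated_holdout(data, fit_mean, predict_mean)
q1, median, q3 = statistics.quantiles(errors, n=4)  # the box of a box plot
print(q1, median, q3)
```

The three quartiles are what the box of a box plot displays, so comparing them across models gives the same sense of error variation without drawing the plot.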

Regression (cont…)
• Simulated data-1
Size: 1000
No. of variables: 10 (one dependent and nine independent)
Outliers: Absent
Normality: Present
Data type: Real
Applied methods: Linear model & ANN

• Results of Simulated data-1

Regression (cont…)
• Simulated data-2
Size: 20000
No. of variables: 20 (one dependent and nine independent)
Outliers: Absent
Multicollinearity: Strong multicollinearity
Normality: Present
Data type: Real
Applied methods: Linear model & ANN

• Results of Simulated data-2

Classification
• IRIS data
Size: 150
No. of variables: 5 (one dependent and four independent)
No. of classes: Three (Setosa, Versicolor, Virginica)
Type: Balanced
Data type: Real
Applied methods: Logistic, LDA, QDA, KNN, NB & ANN

Classification (cont…)
• Results
• ANN is the best classifier
Methods     Classification rate   Misclassification rate
Logistic    0.98                  0.02
LDA         0.98                  0.02
QDA         0.98                  0.02
KNN         0.95                  0.05
NB          0.95                  0.05
ANN         0.99                  0.01

• Fertility data
Size: 100
No. of variables: 5 (one dependent and four independent)
No. of classes: Two (Normal & Altered)
Type: Imbalanced
Data type: Real
Applied methods: Logistic, LDA, KNN, NB & ANN

• Results
Methods     Accuracy   Sensitivity   Specificity   PPV    NPV
Logistic    0.84       0.87          0.00          0.96   0.00
LDA         0.83       0.95          0.00          0.87   0.00
KNN         0.81       0.90          0.16          0.88   0.20
NB          0.82       0.94          0.00          0.87   0.00
ANN         0.88       0.95          0.34          0.91   0.50
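The five measures in the table all come from the counts of a two-class confusion matrix. A minimal sketch of the standard definitions is given below; the function name, the made-up labels, and the choice of which class counts as "positive" are assumptions, since the presentation does not state which Fertility class was treated as positive.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, PPV and NPV from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # An empty denominator (e.g. no negative predictions) leaves the metric undefined.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Made-up imbalanced example: 8 positives, 2 negatives, and a classifier
# that labels everything as positive.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.8 even though specificity is 0, which illustrates why accuracy alone can be misleading on imbalanced data.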

• Leukemia data
Size: 72
No. of variables: 7130 (one dependent and 7129 independent)
No. of classes: Two (ALL & AML)
Type: Balanced
Data type: Real
Applied methods: Logistic, LDA, QDA, KNN, NB & ANN

• Results
Methods     Accuracy   Sensitivity   Specificity
Logistic    0.47       0.62          0.31
LDA         0.62       0.68          0.52
QDA         0.65       1.00          0.00
KNN         0.54       0.65          0.32
NB          0.65       1.00          0.00
ANN         0.64       0.68          0.56

Conclusion
• In all cases, ANN is the best.
Data                  Problems                   ANN status
Pharmaceuticals       Outliers                   Best regression model
Yacht Hydrodynamics   Clustered                  Best regression model
Simulated data-1      Fresh                      Best regression model
Simulated data-2      Strong multicollinearity   Best regression model
IRIS                  Balanced                   Best classifier
Fertility             Imbalanced                 Best classifier
Leukemia              Large (7129 variables)     Best classifier

Limitations
• Backpropagation gives no guarantee of reaching the global minimum.
• VC dimension → unclear.
• Random weight initialization → the result is not unique.
• Some weights are zero → the network doesn't converge.
• Computing confidence intervals is very difficult.
• It does not provide t-tests or F-tests.

Areas of further research
• Robust, generalized ridge, principal component, latent root, lasso and stepwise regression
• Multivariate regression, time series analysis
• Application of artificial neural networks to unsupervised learning
• Study of semi-supervised learning
• Comparative study with other machine learning and data mining techniques
• Improvement of the backpropagation algorithm
