2. Introduction to pattern recognition
“The assignment of a physical object or event to one of several
prespecified categories” -- Duda & Hart
• A pattern is an object, process or event that can be given a name.
• A pattern class (or category) is a set of patterns sharing common
attributes and usually originating from the same source.
• During recognition (or classification), given objects are assigned to
prescribed classes.
• A classifier is a machine that performs classification.
3. Examples of applications
• Optical character recognition (OCR)
• Handwritten: sorting letters by postal code.
• Printed texts: reading machines for blind people, digitization of text
documents.
• Biometrics
• Face recognition.
• Fingerprint recognition.
• Speech recognition.
• Iris recognition.
• Diagnostic systems
• Medical diagnosis: X-ray, ECG, mammogram analysis, etc.
• Video Analytics
• Automated Target Recognition (ATR).
• Image segmentation and analysis (recognition from aerial or satellite
photographs).
4. Approaches
Statistical PR: based on an underlying statistical model of patterns and
pattern classes.
• Bayesian classifiers
• Linear discriminant analysis
• Decision tree
• Nearest-neighbour (see the sketch below)
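As a tiny illustration of the statistical approaches listed above, here is a minimal nearest-neighbour classifier; the toy feature vectors and class labels are made-up assumptions for the example.

```python
# Minimal 1-nearest-neighbour classifier (illustrative sketch only).
import math

def nearest_neighbour(train, query):
    """train: list of (feature_vector, class_label); query: feature_vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Assign the query the label of the closest training pattern.
    _, label = min(train, key=lambda p: dist(p[0], query))
    return label

training_set = [((1.0, 2.0), "class A"), ((4.0, 4.5), "class B"), ((0.5, 1.5), "class A")]
print(nearest_neighbour(training_set, (0.8, 1.8)))  # -> "class A"
```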
Artificial neural networks (ANNs): classifiers represented as networks of cells
modelling neurons of the human brain (connectionist approach).
• Single layer perceptron
• Multi layer perceptron
• Neural Tree
5. Artificial Neural Network (ANN)
• Single layer perceptron
• Multilayer perceptron (MLP)
• A perceptron has a single layer of weights.
• It is easy to implement, but has weak generalization capability.
• An MLP has two or more layers of weights separated by a layer of nodes.
• It generalizes well, but it is difficult to decide on the architecture.
(A minimal perceptron training sketch follows below.)
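A minimal sketch of a single-layer perceptron trained with the classic perceptron rule; the toy data, learning rate, and epoch count are assumptions made for illustration.

```python
# Single-layer perceptron: one layer of weights, updated on misclassified patterns.
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """X: (n_samples, n_features), y: labels in {-1, +1}. Returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Update only when the pattern is misclassified.
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data (two features, two classes).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # should reproduce y for this separable toy set
```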
6. Example of a Decision Tree
Training Data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Refund and Marital Status are categorical attributes, Taxable Income is continuous, and Cheat is the class label.)

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
7. Another Example of Decision Tree
Training data: the same table as on the previous slide (Refund and Marital Status categorical, Taxable Income continuous, Cheat the class label).

Model: an alternative decision tree that splits on MarSt first.

MarSt?
  Married          -> NO
  Single, Divorced -> Refund?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!
8. Apply Model to Test Data
Model (the decision tree from slide 6):

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

Test Data:

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree: Refund = No, so follow the "No" branch to MarSt; MarSt = Married, so predict NO.
Assign Cheat to "No" (see the code sketch below).
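A small sketch of the model applied to the test record, with the tree above hand-coded as nested conditionals; the function name `classify` and the numeric income encoding are assumptions for illustration.

```python
# The decision tree from the slide, hand-coded as nested conditionals.
def classify(refund, marital_status, taxable_income):
    """Return the predicted 'Cheat' label for one record."""
    if refund == "Yes":
        return "No"
    # Refund == "No": split on marital status next.
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income (threshold 80K).
    return "No" if taxable_income < 80_000 else "Yes"

# Test record from the slide: Refund = No, Married, 80K.
print(classify("No", "Married", 80_000))  # -> "No"
```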
9. Decision Tree Classification Task
Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Induction: the Training Set is fed to a tree induction algorithm, which learns the Model (a Decision Tree).
Deduction: the learned Model is applied to the Test Set to predict the unknown class labels (see the sketch below).
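The induction/deduction workflow can be sketched with scikit-learn's CART-style `DecisionTreeClassifier`; the numeric encoding of the categorical attributes is an assumption made for illustration.

```python
# Induction (fit) and deduction (predict) with a CART-style learner.
from sklearn.tree import DecisionTreeClassifier

# Training set from the slide, encoded as numbers:
# Attrib1: Yes=1/No=0; Attrib2: Small=0/Medium=1/Large=2; Attrib3 in thousands.
X_train = [[1, 2, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
           [0, 1, 60], [1, 2, 220], [0, 0, 85], [0, 1, 75], [0, 0, 90]]
y_train = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

# Induction: learn the model from the training set.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Deduction: apply the model to the test set (records 11-15).
X_test = [[0, 0, 55], [1, 1, 80], [1, 2, 110], [0, 0, 95], [0, 2, 67]]
print(model.predict(X_test))
```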
10. Decision Tree Induction
• Many Algorithms:
– Hunt’s Algorithm (one of the earliest)
– CART
– ID3, C4.5
– SLIQ, SPRINT
– Available in the Weka software:
http://repository.seasr.org/Datasets/UCI/arff/
(weka datasets download)
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
(Weka Software download)
11. Conclusions on DT
A DT is "built" by splitting the source set into subsets recursively, based
on an attribute test (AT), as sketched in the code below.
AT: a reliability test procedure in which the items under test are classified according to qualitative
characteristics.
• Advantages:
• Simple to understand and interpret
• Fast learning
• Limitations:
• Weak generalization capability (overfitting)
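A minimal sketch of recursive splitting in the spirit of Hunt's algorithm; the helper `build_tree` and its simple first-available attribute test are assumptions, since real learners (CART, C4.5) choose the test with an impurity measure such as Gini or information gain.

```python
# Recursive splitting: each internal node applies an attribute test (AT),
# each leaf holds the majority class of the records that reach it.
from collections import Counter

def build_tree(records, labels, tests):
    """records: list of dicts; tests: list of (attribute, value) pairs."""
    if len(set(labels)) == 1 or not tests:
        # Leaf: all labels agree, or no attribute tests left -> majority label.
        return Counter(labels).most_common(1)[0][0]
    attr, value = tests[0]                      # attribute test at this node
    left = [i for i, r in enumerate(records) if r[attr] == value]
    right = [i for i in range(len(records)) if i not in left]
    if not left or not right:                   # test does not split the data
        return build_tree(records, labels, tests[1:])
    return {
        (attr, value): {
            True: build_tree([records[i] for i in left], [labels[i] for i in left], tests[1:]),
            False: build_tree([records[i] for i in right], [labels[i] for i in right], tests[1:]),
        }
    }

tree = build_tree(
    [{"Refund": "Yes"}, {"Refund": "No"}, {"Refund": "No"}],
    ["No", "Yes", "No"],
    tests=[("Refund", "Yes")],
)
print(tree)  # {('Refund', 'Yes'): {True: 'No', False: 'Yes'}}
```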
12. Neural Tree (NT): A hybridization of DT & ANN
A DT applies an attribute test (AT) at each internal node. A neural tree
replaces each AT with a simple perceptron, while an MLP tree (P. E. Utgoff,
1990) places an MLP at each node (a minimal node-structure sketch follows below).
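A minimal sketch of a neural tree node, where each internal node holds a simple perceptron (one weight vector per child) instead of an attribute test; the class name `NTNode` and its fields are assumptions, and the training procedure is omitted.

```python
# A neural tree node: internal nodes route patterns with a perceptron,
# leaves carry a class label.
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class NTNode:
    weights: Optional[np.ndarray] = None   # shape: (n_children, n_features)
    bias: Optional[np.ndarray] = None      # shape: (n_children,)
    children: List["NTNode"] = field(default_factory=list)
    label: Optional[str] = None            # set only at leaf nodes

    def route(self, x: np.ndarray) -> str:
        """Send pattern x down the tree until a leaf label is reached."""
        if self.label is not None:
            return self.label
        scores = self.weights @ x + self.bias   # one score per child
        return self.children[int(np.argmax(scores))].route(x)

leaf_a, leaf_b = NTNode(label="class A"), NTNode(label="class B")
root = NTNode(weights=np.array([[1.0, 0.0], [0.0, 1.0]]),
              bias=np.array([0.0, 0.0]), children=[leaf_a, leaf_b])
print(root.route(np.array([0.2, 0.9])))  # -> "class B"
```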
15. Drawbacks of NT
NT
• A simple perceptron gets stuck in local minima, resulting in poor
generalization. This may lead to the generation of a larger tree or to a
non-converging tree-building process.
MLP tree
• Difficulty in selecting an appropriate architecture (number of hidden
layers and number of nodes in the hidden layers).
High-order perceptron tree (quadratic classifier)
• Increased computational cost.
• A simple perceptron-based NT is therefore adopted, and new rules are
introduced to overcome the drawbacks.
16. Overfitting of training set by NT and adopted solution
Figure: without balancing, the depth of the NT is considerable; with the
adopted solution, the depth of the NT is significantly reduced.
18. Balancing NT through perceptron rejection
Reliability of a perceptron depends on two factors:
1. The perceptron's classification error has to be equally distributed among the
child nodes, i.e. Ei ≈ Et/m for every child node i (equivalently Emax - Emin ≈ 0),
where Emax = max_i{Ei}, Emin = min_i{Ei}, and i ∈ {1, 2, ..., m}.
2. The initial error of the perceptron should be significantly reduced after
training, i.e. Et ≤ E0/2.
Putting 1 & 2 together: Ei = Et/m ≤ E0/(2m).
So the criterion to reject a perceptron is: Emax > E0/(2m).
Here E0 is the initial error, Et is the total error after training, Ei is the
error associated with the i-th child node, and m is the total number of classes
present at the current node (a code sketch of this check follows below).
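A small sketch of the rejection check above; the function name `reject_perceptron` and the example error values are assumptions for illustration.

```python
# Perceptron-rejection check: accept only if every child error stays below
# E0 / (2m); otherwise the perceptron is rejected.
def reject_perceptron(child_errors, initial_error):
    """child_errors: error Ei assigned to each of the m child nodes after
    training; initial_error: E0 before training. Returns True if the
    perceptron should be rejected."""
    m = len(child_errors)
    e_max = max(child_errors)
    return e_max > initial_error / (2 * m)

print(reject_perceptron([0.05, 0.04, 0.06], initial_error=0.6))  # -> False (accepted)
```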
19. How to consider a pattern to be removable
The decision to remove a pattern from the training set is taken based
upon the following aspects:
• probability of a pattern to belong to a class
• total classification error of the perceptron
The rule is:

if (hmax < Th) and (Et > R) and (x ∈ Cmax1) then
    pattern x is removed
else
    pattern x is included in TS
end if

where hmax = P(Cmax1 | x) - P(Cmax2 | x); P(Cmax1 | x) and P(Cmax2 | x) are
the maximum and second-maximum classification probabilities for pattern x,
Th is a threshold on this margin, Et is the total error, and R is the
reliability factor (a code sketch follows below).
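A sketch of the removal rule above; the function name `is_removable`, the threshold values, and the comparison directions follow the reading given here and are assumptions, not a definitive implementation.

```python
# Pattern-removal check: an ambiguous pattern (small margin) of the most
# probable class, at a node whose total error is still high, is removed.
def is_removable(p_max1, p_max2, true_class, c_max1, e_total, th=0.1, r=0.3):
    """p_max1/p_max2: highest and second-highest class probabilities for the
    pattern; c_max1: class achieving p_max1; e_total: perceptron's total error."""
    h_max = p_max1 - p_max2                  # classification margin
    return (h_max < th) and (e_total > r) and (true_class == c_max1)

print(is_removable(0.52, 0.48, true_class="A", c_max1="A", e_total=0.4))  # -> True
```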
21. Fusion of ANN with statistical classifier
• A single classifier is not suitable for all kinds of datasets.
• Although the ANN is a universal classifier, it depends on the
choice of an optimal architecture.
• Moreover, an ANN has a tendency to get stuck in local minima, as the
optimization function is not always convex.
• In such cases, fusion with a statistical classifier can be a
good choice.
22. Linear Discriminant Analysis
This method maximizes the ratio of the between-class variance to the
within-class variance in any particular data set, thereby guaranteeing
maximal separability.
24. Objective of LDA
The objective of LDA is to seek the direction that not only maximizes the between-class
scatter of the projected samples but also minimizes the within-class scatter. These two
objectives can be achieved simultaneously by maximizing the following function:

Maximize J(w) = (w^T SB w) / (w^T SW w)

where the between-class scatter matrix SB is defined as

SB = Σ_i (mi - m)(mi - m)^T

in which m is the k-dimensional sample mean for the whole set, while mi is the
sample mean for the i-th class. The within-class scatter matrix SW is defined by

SW = Σ_i Si

where the scatter matrix Si corresponding to the i-th class is defined by

Si = Σ_{x ∈ class i} (x - mi)(x - mi)^T

(A small projection sketch follows below.)
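A minimal two-class Fisher LDA sketch: for two classes, maximizing J(w) gives a direction proportional to SW⁻¹(m1 - m2); the function name `fisher_direction` and the toy data are assumptions for illustration.

```python
# Fisher LDA direction for two classes: w ∝ Sw^{-1} (m1 - m2).
import numpy as np

def fisher_direction(X1, X2):
    """X1, X2: (n_i, k) sample matrices of the two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter Sw = S1 + S2.
    S1 = (X1 - m1).T @ (X1 - m1)
    S2 = (X2 - m2).T @ (X2 - m2)
    Sw = S1 + S2
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 0.5, size=(20, 2))
X2 = rng.normal([2, 1], 0.5, size=(20, 2))
w = fisher_direction(X1, X2)
print(X1 @ w, X2 @ w)  # projected samples form two well-separated groups
```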
25. Incorporating Linear Discriminant Analysis in Neural Tree
Figure: a single perceptron is unable to generate a splitting hyperplane for
the shown pattern distribution; after the patterns are projected onto a
lower-dimensional plane by LDA, they can be separated into groups.
28. Bibliography
1. A. Rani, C. Micheloni, G. L. Foresti, "Balancing Neural Trees to Improve
Classification Performance", in Proceedings of the International Conference
on Neural Networks (ICNN), Oslo, Norway, July 29-31, 2009.
2. A. Rani, C. Micheloni, G. L. Foresti, "Improving the Performance of
Neural Tree for Pattern Classification", in Proceedings of the 5th Convegno
del Gruppo Italiano Ricercatori in Pattern Recognition, Salerno, Italy,
June 10-11, 2010.
3. A. Rani, C. Micheloni, G. L. Foresti, "A Balanced Neural Tree for Pattern
Classification", Neural Networks, 2012.
4. A. Rani, C. Micheloni, G. L. Foresti, "Incorporating Linear Discriminant
Analysis in Neural Tree for Multi-dimensional Splitting", Applied Soft
Computing, 2013.