DAT630

Classification
Alternative Techniques
Darío Garigliotti | University of Stavanger
09/10/2017
Introduction to Data Mining, Chapter 5
Recall
Attribute set (x) → Classification Model → Class label (y)
Outline
- Alternative classification techniques

- Rule-based
- Nearest neighbors
- Naive Bayes
- Ensemble methods
- Class imbalance problem

- Multiclass problem
Rule-based classifier
Rule-based Classifier
- Classifying records using a set of "if… then…"
rules

- Example

- R is known as the rule set
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Classification Rules
- Each classification rule can be expressed in the following way:
  ri: (Conditioni) → yi
  where Conditioni is the rule antecedent (or precondition) and yi is the rule consequent
Classification Rules
- A rule r covers an instance x if the attributes of
the instance satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Which rules cover the "hawk" and the "grizzly bear"?
Name Blood Type Give Birth Can Fly Live in Water Class
hawk warm no yes no ?
grizzly bear warm yes no no ?
Classification Rules
- A rule r covers an instance x if the attributes of
the instance satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal
Name Blood Type Give Birth Can Fly Live in Water Class
hawk warm no yes no ?
grizzly bear warm yes no no ?
Rule Coverage and
Accuracy
- Coverage of a rule

- Fraction of records that
satisfy the antecedent of a
rule
- Accuracy of a rule

- Fraction of records that
satisfy both the antecedent
and consequent of a rule
Tid  Refund  Marital Status  Taxable Income  Class
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
(Status=Single) → No
Coverage = 40%, Accuracy = 50%
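As an illustration (not part of the slides), a minimal Python sketch of how coverage and accuracy could be computed for the rule (Status=Single) → No over the table above; the field names and record encoding are assumptions made for this example.

```python
# Minimal sketch: coverage and accuracy of the rule (Status=Single) -> No
# over the ten records above. Field names are chosen for this example only.
records = [
    {"Refund": "Yes", "Status": "Single",   "Income": 125, "Class": "No"},
    {"Refund": "No",  "Status": "Married",  "Income": 100, "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 70,  "Class": "No"},
    {"Refund": "Yes", "Status": "Married",  "Income": 120, "Class": "No"},
    {"Refund": "No",  "Status": "Divorced", "Income": 95,  "Class": "Yes"},
    {"Refund": "No",  "Status": "Married",  "Income": 60,  "Class": "No"},
    {"Refund": "Yes", "Status": "Divorced", "Income": 220, "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 85,  "Class": "Yes"},
    {"Refund": "No",  "Status": "Married",  "Income": 75,  "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 90,  "Class": "Yes"},
]

def antecedent(r):
    return r["Status"] == "Single"    # rule condition

consequent = "No"                     # rule consequent

covered = [r for r in records if antecedent(r)]
coverage = len(covered) / len(records)                                     # 4/10 = 0.4
accuracy = sum(r["Class"] == consequent for r in covered) / len(covered)   # 2/4 = 0.5
print(coverage, accuracy)
```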
How does it work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?
Properties of the Rule Set
- Mutually exclusive rules

- Classifier contains mutually exclusive rules if the
rules are independent of each other
- Every record is covered by at most one rule
- Exhaustive rules

- Classifier has exhaustive coverage if it accounts for
every possible combination of attribute values
- Each record is covered by at least one rule
- These two properties ensure that every record
is covered by exactly one rule
When these Properties are
not Satisfied
- Rules are not mutually exclusive

- A record may trigger more than one rule
- Solution?
- Ordered rule set
- Unordered rule set – use voting schemes
- Rules are not exhaustive

- A record may not trigger any rules
- Solution?
- Use a default class (assign the majority class from the
training records)
Ordered Rule Set
- Rules are rank ordered according to their priority

- An ordered rule set is known as a decision list
- When a test record is presented to the classifier 

- It is assigned to the class label of the highest ranked
rule it has triggered
- If none of the rules fired, it is assigned to the default class
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name Blood Type Give Birth Can Fly Live in Water Class
turtle cold no no sometimes ?
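A small sketch of how such a decision list could be applied: rules are tried in rank order, the first one that covers the record fires, and a default class is used otherwise. The rule encoding and the default label below are illustrative assumptions.

```python
# Sketch of an ordered rule set (decision list): rules are tried in rank order;
# the first rule that covers the record fires; if none fires, a default class
# (e.g., the majority class of the training records) is returned.
rules = [
    (lambda r: r["Give Birth"] == "no"  and r["Can Fly"] == "yes",       "Birds"),       # R1
    (lambda r: r["Give Birth"] == "no"  and r["Live in Water"] == "yes", "Fishes"),      # R2
    (lambda r: r["Give Birth"] == "yes" and r["Blood Type"] == "warm",   "Mammals"),     # R3
    (lambda r: r["Give Birth"] == "no"  and r["Can Fly"] == "no",        "Reptiles"),    # R4
    (lambda r: r["Live in Water"] == "sometimes",                        "Amphibians"),  # R5
]

def classify(record, rules, default_class="Mammals"):   # default class is hypothetical here
    for condition, label in rules:
        if condition(record):
            return label          # highest-ranked rule that covers the record
    return default_class          # no rule fired

turtle = {"Blood Type": "cold", "Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes"}
print(classify(turtle, rules))    # R4 fires before R5 -> "Reptiles"
```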
Rule Ordering Schemes
- Rule-based ordering

- Individual rules are ranked based on some quality
measure (e.g., accuracy, coverage)
- Class-based ordering

- Rules that belong to the same class appear together
- Rules are sorted on the basis of their class
information (e.g., total description length)
- The relative order of rules within a class does not
matter
Rule Ordering Schemes
Rule-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No
Class-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income<80K) ==> No
(Refund=No, Marital Status={Married}) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income>80K) ==> Yes
How to Build a Rule-based
Classifier?
- Direct Method

- Extract rules directly from data
- Indirect Method

- Extract rules from other classification models (e.g.,
decision trees, neural networks)
From Decision Trees To
Rules
[Decision tree: root split on Refund (Yes → NO); Refund=No splits on Marital Status ({Married} → NO; {Single, Divorced} splits on Taxable Income: < 80K → NO, > 80K → YES)]
Classification Rules
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No
Rules are mutually exclusive and exhaustive
Rule set contains as much information as the tree
Rules Can Be Simplified
[Decision tree: root split on Refund (Yes → NO); Refund=No splits on Marital Status ({Married} → NO; {Single, Divorced} splits on Taxable Income: < 80K → NO, > 80K → YES)]
Tid  Refund  Marital Status  Taxable Income  Cheat
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
Initial Rule: (Refund=No) ∧ (Status=Married) → No
Simplified Rule: (Status=Married) → No
Summary
- Expressiveness is almost equivalent to that of
a decision tree

- Generally used to produce descriptive models
that are easy to interpret, but give comparable
performance to decision tree classifiers

- The class-based ordering approach is well
suited for handling data sets with imbalanced
class distributions
Exercise
Nearest Neighbors
So far
- Eager learners
- Decision trees, rule-based classifiers
- Learn a model as soon as the training data becomes
available
[Figure: the general eager-learning framework seen earlier: a learning algorithm induces a model from the training set (induction), and the model is then applied to the test set (deduction)]
Opposite strategy
- Lazy learners
- Delay the process of modeling the data until it is
needed to classify the test examples
[Figure: the lazy-learning variant of the framework: the training set is simply stored, and modeling and prediction are deferred until the test set has to be classified]
Instance-Based Classifiers
• Store the training records
• Use training records to predict the class label of unseen cases
[Figure: a set of stored cases (Atr1 … AtrN, Class) and an unseen case whose class label has to be predicted]
Instance Based Classifiers
- Rote-learner

- Memorizes entire training data and performs
classification only if attributes of record match one of
the training examples exactly
- Nearest neighbors

- Uses k “closest” points (nearest neighbors) for
performing classification
Nearest neighbors
- Basic idea

- "If it walks like a duck, quacks like a duck, then it’s
probably a duck"
[Figure: compute the distance from the test record to the training records, then choose the k "nearest" ones]
Nearest-Neighbor
Classifiers
- Requires three things

- The set of stored records
- Distance Metric to compute distance between
records
- The value of k, the number of nearest neighbors to
retrieve
Nearest-Neighbor
Classifiers
- To classify an unknown record

- Compute distance to other
training records
- Identify k-nearest neighbors
- Use class labels of nearest
neighbors to determine the class
label of unknown record (e.g., by
taking majority vote)
Definition of Nearest
Neighbor
[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
Choices to make
- Compute distance between two points

- E.g., Euclidean distance
- See Chapter 2
- Determine the class from nearest neighbor list

- Take the majority vote of class labels among the
k nearest neighbors
- Weigh the vote according to distance
- Choose the value of k
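Putting these choices together, a minimal k-NN sketch (Euclidean distance, majority or distance-weighted vote) might look as follows; the toy training points are made up.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two numeric attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, test_point, k=3, weighted=False):
    """Majority (optionally distance-weighted) vote among the k nearest training
    records; `train` is a list of (attribute_vector, class_label) pairs."""
    neighbors = sorted(train, key=lambda xy: euclidean(xy[0], test_point))[:k]
    votes = Counter()
    for x, label in neighbors:
        votes[label] += 1.0 / (euclidean(x, test_point) + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]

# Toy usage (made-up points): two classes in the plane
train = [((1, 1), "A"), ((2, 1), "A"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(train, (2, 2), k=3))   # -> "A"
```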
Choosing the value of k
- If k is too small, sensitive to noise points

- If k is too large, neighborhood may include
points from other classes
Summary
- Part of a more general technique called
instance-based learning

- Use specific training instances to make predictions
without having to maintain an abstraction (model)
derived from data
- Because there is no model building, classifying
a test example can be quite expensive

- Nearest-neighbors make their predictions
based on local information

- Susceptible to noise
Bayes Classifier
Bayes Classifier
- In many applications the relationship between
the attribute set and the class variable is 

non-deterministic

- The label of the test record cannot be predicted with
certainty even if it was seen previously during training
- A probabilistic framework for solving
classification problems

- Treat X and Y as random variables and capture their
relationship probabilistically using P(Y|X)
Example
- Football game between teams A and B

- Team A won 65% of the time, Team B won 35% of the time
- Among the games Team A won, 30% were hosted by B
- Among the games Team B won, 75% were played at B's home
- Which team is more likely to win if the game is
hosted by Team B?
Probability Basics
- Conditional probability
  P(X, Y) = P(X|Y) P(Y) = P(Y|X) P(X)
- Bayes’ theorem
  P(Y|X) = P(X|Y) P(Y) / P(X)
Example
- Probability Team A wins: P(win=A) = 0.65

- Probability Team B wins: P(win=B) = 0.35

- Probability Team A wins when B hosts: 

P(hosted=B|win=A) = 0.3

- Probability Team B wins when playing at home:
P(hosted=B|win=B) = 0.75

- Who wins the next game that is hosted by B?
P(win=B|hosted=B) = ?

P(win=A|hosted=B) = ?
Solution
- Using Bayes’ theorem: P(Y|X) = P(X|Y) P(Y) / P(X)
- P(win=B|hosted=B) = 0.5738
- P(win=A|hosted=B) = 0.4262
- See book page 229
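To spell out the calculation, the evidence P(hosted=B) can be expanded over the two possible winners:
P(win=B|hosted=B) = P(hosted=B|win=B) P(win=B) / [P(hosted=B|win=B) P(win=B) + P(hosted=B|win=A) P(win=A)]
                  = (0.75 × 0.35) / (0.75 × 0.35 + 0.3 × 0.65) = 0.2625 / 0.4575 ≈ 0.5738
and P(win=A|hosted=B) = 1 − 0.5738 ≈ 0.4262, so Team B is the more likely winner of a game it hosts.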
Bayes’ Theorem for Classification
P(Y|X) = P(X|Y) P(Y) / P(X)
- P(Y|X): posterior probability
- P(X|Y): class-conditional probability
- P(Y): prior probability
- P(X): the evidence
Bayes’ Theorem for Classification
P(Y|X) = P(X|Y) P(Y) / P(X)
- The evidence P(X) is a constant (the same for all classes), so it can be ignored
Bayes’ Theorem for Classification
P(Y|X) = P(X|Y) P(Y) / P(X)
- The prior probability P(Y) can be computed from the training data (fraction of records that belong to each class)
Bayes’ Theorem for Classification
P(Y|X) = P(X|Y) P(Y) / P(X)
- The class-conditional probability P(X|Y) has to be estimated; two methods: Naive Bayes, Bayesian belief network
Naive Bayes
Estimation
- Mind that X is a vector: X = {X1, …, Xn}
- Class-conditional probability: P(X|Y) = P(X1, …, Xn|Y)
- "Naive" assumption: attributes are independent, so
  P(X|Y) = ∏_{i=1}^{n} P(Xi|Y)
Naive Bayes Classifier
- Probability that X belongs to class Y:
  P(Y|X) ∝ P(Y) ∏_{i=1}^{n} P(Xi|Y)
- Target label for record X:
  y = arg max_{yj} P(Y=yj) ∏_{i=1}^{n} P(Xi|Y=yj)
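A minimal count-based sketch of this decision rule for categorical attributes, without any smoothing (smoothing is introduced a few slides later); this is an illustration, not the book's code, and all names and the toy records are made up.

```python
from collections import Counter, defaultdict

def train_nb(records, class_attr):
    """Count-based estimates of P(Y) and of P(Xi = xi | Y) for categorical attributes."""
    class_counts = Counter(r[class_attr] for r in records)
    value_counts = defaultdict(Counter)          # (class, attribute) -> Counter over values
    for r in records:
        y = r[class_attr]
        for attr, val in r.items():
            if attr != class_attr:
                value_counts[(y, attr)][val] += 1
    return class_counts, value_counts

def predict_nb(x, class_counts, value_counts):
    """Return arg max_y P(Y=y) * prod_i P(Xi = xi | Y=y)."""
    n = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for y, n_y in class_counts.items():
        score = n_y / n                                       # prior P(Y=y)
        for attr, val in x.items():
            score *= value_counts[(y, attr)][val] / n_y       # P(Xi = xi | Y=y)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Toy usage on a few of the tax records (Refund and Marital Status only)
records = [
    {"Refund": "Yes", "Status": "Single",  "Class": "No"},
    {"Refund": "No",  "Status": "Married", "Class": "No"},
    {"Refund": "No",  "Status": "Single",  "Class": "Yes"},
    {"Refund": "No",  "Status": "Married", "Class": "No"},
]
print(predict_nb({"Refund": "No", "Status": "Married"}, *train_nb(records, "Class")))  # -> "No"
```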
Estimating class-conditional probabilities
- Categorical attributes
- The fraction of training instances in class Y that have
a particular attribute value xi
- Continuous attributes
- Discretizing the range into bins
- Assuming a certain probability distribution
P(Xi = xi|Y = y) = nc / n
where nc = number of training instances with Xi = xi and Y = y, and n = number of training instances with Y = y
Conditional probabilities
for categorical attributes
- The fraction of training
instances in class Y that
have a particular
attribute value Xi

- P(Status=Married|No)=?

- P(Refund=Yes|Yes)=?
Tid  Refund  Marital Status  Taxable Income  Evade
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
(Refund and Marital Status are categorical attributes, Taxable Income is continuous, Evade is the class label)
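Working these out from the table above: 4 of the 7 records with Evade=No have Status=Married, so P(Status=Married|No) = 4/7; none of the 3 records with Evade=Yes has Refund=Yes, so P(Refund=Yes|Yes) = 0/3 = 0. These counts reappear in the example that follows.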
Conditional probabilities
for continuous attributes
- Discretize the range into bins, or

- Assume a certain form of probability distribution

- Gaussian (normal) distribution is often used
- The parameters of the distribution are estimated from
the training data (from instances that belong to class yj)
- sample mean and variance
P(Xi = xi|Y = yj) = 1/√(2π σij²) · exp(−(xi − µij)² / (2 σij²))
where µij and σij² are the sample mean and variance of Xi over the training instances of class yj
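As a sketch of this estimate (using Python's statistics module for the sample mean and variance), evaluated on the taxable-income values from the example that follows; it reproduces the 0.0072 and 1.2×10⁻⁹ likelihoods used there.

```python
import math
from statistics import mean, variance

def gaussian_class_conditional(values, x):
    """P(Xi = x | Y = yj) under a normal assumption, with the mean and variance
    estimated from the training values of Xi for class yj (sample statistics)."""
    mu, var = mean(values), variance(values)   # sample mean and variance
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Taxable incomes (in K) of the training records with class No and class Yes (from the table)
income_no  = [125, 100, 70, 120, 60, 220, 75]   # mean 110, variance 2975
income_yes = [95, 85, 90]                        # mean 90,  variance 25
print(gaussian_class_conditional(income_no, 120))    # ~0.0072
print(gaussian_class_conditional(income_yes, 120))   # ~1.2e-09
```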
Example
Tid  Refund  Marital Status  Taxable Income  Evade
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
Example
Tid  Refund  Marital Status  Taxable Income  Evade
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
X = {Refund=No, Marital st.=Married, Income=120K}
class=No:  P(C)=7/10; Refund: No=4/7, Yes=3/7; Marital: Single=2/7, Divorced=1/7, Married=4/7; annual income: mean=110, var=2975
class=Yes: P(C)=3/10; Refund: No=3/3, Yes=0/3; Marital: Single=2/3, Divorced=1/3, Married=0/3; annual income: mean=90, var=25
Example: classifying a new instance
X={Refund=No, Marital st.=Married, Income=120K}
class=No:  P(C)=7/10; Refund: No=4/7, Yes=3/7; Marital: Single=2/7, Divorced=1/7, Married=4/7; annual income: mean=110, var=2975
class=Yes: P(C)=3/10; Refund: No=3/3, Yes=0/3; Marital: Single=2/3, Divorced=1/3, Married=0/3; annual income: mean=90, var=25
P(Class=No|X) ∝ P(Class=No) × P(Refund=No|Class=No) × P(Marital=Married|Class=No) × P(Income=120K|Class=No)
             = 7/10 × 4/7 × 4/7 × 0.0072
Example: classifying a new instance (cont.)
X={Refund=No, Marital st.=Married, Income=120K}
class=No:  P(C)=7/10; Refund: No=4/7, Yes=3/7; Marital: Single=2/7, Divorced=1/7, Married=4/7; annual income: mean=110, var=2975
class=Yes: P(C)=3/10; Refund: No=3/3, Yes=0/3; Marital: Single=2/3, Divorced=1/3, Married=0/3; annual income: mean=90, var=25
P(Class=Yes|X) ∝ P(Class=Yes) × P(Refund=No|Class=Yes) × P(Marital=Married|Class=Yes) × P(Income=120K|Class=Yes)
              = 3/10 × 3/3 × 0/3 × 1.2×10⁻⁹
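To make the two scores concrete (a worked expansion of the factors listed above), the Gaussian estimates for taxable income give
P(Income=120K|No)  = 1/√(2π·2975) · exp(−(120−110)²/(2·2975)) ≈ 0.0072
P(Income=120K|Yes) = 1/√(2π·25) · exp(−(120−90)²/(2·25)) ≈ 1.2×10⁻⁹
so P(Class=No|X) ∝ 7/10 × 4/7 × 4/7 × 0.0072 ≈ 0.0016, while P(Class=Yes|X) ∝ 3/10 × 1 × 0 × 1.2×10⁻⁹ = 0, and the record is classified as No. The zero factor P(Marital=Married|Yes) = 0/3 is exactly the issue raised on the next slide.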
Can anything go wrong?
P(Y|X) ∝ P(Y) ∏_{i=1}^{n} P(Xi|Y)
What if this probability is zero?
- If one of the conditional probabilities is zero, then the
entire expression becomes zero!
Probability estimation
- Original
  P(Xi = xi|Y = y) = nc / n
  (nc = number of training instances with Xi = xi and Y = y; n = number of training instances with Y = y)
- Laplace smoothing
  P(Xi = xi|Y = y) = (nc + 1) / (n + c)
  where c is the number of classes
Probability estimation (2)
- M-estimate
  P(Xi = xi|Y = y) = (nc + m·p) / (n + m)
- p can be regarded as the prior probability
- m is called the equivalent sample size; it determines the trade-off between the observed probability nc/n and the prior probability p
- E.g., p = 1/3 and m = 3
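A tiny sketch of both smoothed estimators side by side, plugged into the zero count P(Marital=Married|Yes) = 0/3 from the earlier example; the choice c=2 (classes Yes/No) and the m, p values follow the slide.

```python
def laplace(nc, n, c):
    """Laplace smoothing: P(Xi = xi | Y = y) = (nc + 1) / (n + c), c = number of classes."""
    return (nc + 1) / (n + c)

def m_estimate(nc, n, m, p):
    """M-estimate: P(Xi = xi | Y = y) = (nc + m*p) / (n + m)."""
    return (nc + m * p) / (n + m)

# P(Marital=Married | Yes) from the example: nc = 0, n = 3
print(0 / 3)                           # original estimate: 0.0 (kills the whole product)
print(laplace(0, 3, c=2))              # 1/5 = 0.2
print(m_estimate(0, 3, m=3, p=1/3))    # 1/6 ~= 0.167
```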
Summary
- Robust to isolated noise points

- Handles missing values by ignoring the
instance during probability estimate
calculations

- Robust to irrelevant attributes

- Independence assumption may not hold for
some attributes
Exercise
Ensemble Methods
Ensemble Methods
- Construct a set of classifiers from the training
data

- Predict class label of previously unseen
records by aggregating predictions made by
multiple classifiers
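The slides do not fix a particular ensemble method here; as one common instance (bagging with majority voting), a rough sketch could look as follows. The learn(sample) callback and all names are illustrative assumptions.

```python
from collections import Counter
import random

def bagging_ensemble(train, learn, n_models=25, seed=0):
    """Train several classifiers, each on a bootstrap sample of the training data;
    `learn(sample)` must return a callable classifier."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]   # sample with replacement
        models.append(learn(sample))
    return models

def ensemble_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Toy usage: each base "classifier" just predicts the majority class of its sample
def majority_learner(sample):
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda record: label

data = [((0,), "A"), ((1,), "A"), ((2,), "B")]
models = bagging_ensemble(data, majority_learner, n_models=5)
print(ensemble_predict(models, (1,)))   # most models predict "A"
```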
General Idea
Random Forests
Class Imbalance Problem
Class Imbalance Problem
- Data sets with imbalanced class distributions
are quite common in real-world applications

- E.g., credit card fraud detection
- Correct classification of the rare class often has
greater value than a correct classification
of the majority class

- The accuracy measure is not well suited for
imbalanced data sets

- We need alternative measures
Confusion Matrix
                  Predicted Positive     Predicted Negative
Actual Positive   True Positives (TP)    False Negatives (FN)
Actual Negative   False Positives (FP)   True Negatives (TN)
Additional Measures
- True positive rate (or sensitivity)

- Fraction of positive examples predicted correctly
- True negative rate (or specificity)

- Fraction of negative examples predicted correctly
TPR = TP / (TP + FN)
TNR = TN / (TN + FP)
Additional Measures
- False positive rate

- Fraction of negative examples predicted as positive
- False negative rate

- Fraction of positive examples predicted as negative
FPR = FP / (TN + FP)
FNR = FN / (TP + FN)
Additional Measures
- Precision

- Fraction of positive records among those that are
classified as positive
- Recall

- Fraction of positive examples correctly predicted
(same as the true positive rate)
P = TP / (TP + FP)
R = TP / (TP + FN)
Additional Measures
- F1-measure
- Summarizing precision and recall into a single
number
- Harmonic mean between precision and recall
F1 = 2RP / (R + P)
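These definitions translate directly into code; a small helper sketch (the counts in the usage line are hypothetical).

```python
def classification_measures(tp, fn, fp, tn):
    """Measures derived from the confusion matrix (as defined above)."""
    tpr = tp / (tp + fn)          # true positive rate / sensitivity / recall
    tnr = tn / (tn + fp)          # true negative rate / specificity
    fpr = fp / (tn + fp)          # false positive rate
    fnr = fn / (tp + fn)          # false negative rate
    precision = tp / (tp + fp)
    recall = tpr
    f1 = 2 * recall * precision / (recall + precision)
    return {"TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr,
            "P": precision, "R": recall, "F1": f1}

# Hypothetical imbalanced example: 990 negatives, 10 positives
print(classification_measures(tp=5, fn=5, fp=10, tn=980))
```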
Multiclass Problem
Multiclass Classification
- Many of the approaches are originally
designed for binary classification problems

- Many real-world problems require data to be
divided into more than two categories

- Two approaches

- One-against-rest (1-r)
- One-against-one (1-1)
- Predictions need to be combined in both cases
One-against-rest
- Y={y1, y2, … yK} classes

- For each class yi

- Instances that belong to yi are positive examples
- All other instances are negative examples
- Combining predictions

- If an instance is classified positive, the positive class
gets a vote
- If an instance is classified negative, all classes
except for the positive class receive a vote
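A compact sketch of this voting scheme; the classifier dictionary in the usage is a stub that reproduces the example on the next slide.

```python
from collections import Counter

def one_against_rest_predict(binary_classifiers, x):
    """`binary_classifiers` maps each class yi to a classifier returning '+' if x
    looks like yi and '-' otherwise; votes are combined as described above."""
    votes = Counter()
    for yi, clf in binary_classifiers.items():
        if clf(x) == "+":
            votes[yi] += 1                     # the positive class gets a vote
        else:
            for yj in binary_classifiers:
                if yj != yi:
                    votes[yj] += 1             # every class except yi gets a vote
    return votes.most_common(1)[0][0]

# Toy usage: stub classifiers reproducing the example on the next slide
clfs = {"y1": lambda x: "+", "y2": lambda x: "-", "y3": lambda x: "-", "y4": lambda x: "-"}
print(one_against_rest_predict(clfs, x=None))   # -> "y1" (4 votes)
```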
Example
- 4 classes, Y={y1, y2, y3, y4}
- Classifying a given test instance with the four binary classifiers:
  (y1 +; y2, y3, y4 -) → class +
  (y3 +; y1, y2, y4 -) → class -
  (y2 +; y1, y3, y4 -) → class -
  (y4 +; y1, y2, y3 -) → class -
- Total votes: y1 = 4, y2 = 2, y3 = 2, y4 = 2 → target class: y1
One-against-one
- Y={y1, y2, … yK} classes

- Construct a binary classifier for each pair of
classes (yi, yj)

- K(K-1)/2 binary classifiers in total
- Combining predictions

- The positive class receives a vote in each pairwise
comparison
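A corresponding sketch for pairwise voting; the outcomes dictionary in the usage is a stub matching the example on the next slide.

```python
from collections import Counter
from itertools import combinations

def one_against_one_predict(pairwise_winner, classes, x):
    """`pairwise_winner(yi, yj, x)` returns which of yi, yj the (yi, yj)-classifier
    predicts for x; the predicted class receives one vote in each pairwise comparison."""
    votes = Counter()
    for yi, yj in combinations(classes, 2):
        votes[pairwise_winner(yi, yj, x)] += 1
    return votes.most_common(1)[0][0]

# Toy usage mirroring the example on the next slide
outcomes = {("y1", "y2"): "y1", ("y1", "y3"): "y1", ("y1", "y4"): "y4",
            ("y2", "y3"): "y2", ("y2", "y4"): "y4", ("y3", "y4"): "y3"}
print(one_against_one_predict(lambda yi, yj, x: outcomes[(yi, yj)],
                              ["y1", "y2", "y3", "y4"], x=None))
# y1 and y4 both end with 2 votes; most_common returns the first one inserted (y1),
# so ties would need explicit handling in practice
```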
Example
- 4 classes, Y={y1, y2, y3, y4}
- Classifying a given test instance with the six pairwise classifiers:
  (y1 +, y2 -) → class +    (y1 +, y3 -) → class +    (y1 +, y4 -) → class -
  (y2 +, y3 -) → class +    (y2 +, y4 -) → class -    (y3 +, y4 -) → class +
- Total votes: y1 = 2, y2 = 1, y3 = 1, y4 = 2 → tie between y1 and y4 (can be broken, e.g., at random or by the classifiers' confidence scores)
