Prof. Pier Luca Lanzi
Classification: Introduction
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
What is an Apple? 2
Are These Apples?
Contact Lenses Data 5
Age              Spectacle prescription   Astigmatism   Tear production rate   Recommended lenses
Pre-presbyopic   Hypermetrope             Yes           Reduced                None
Pre-presbyopic   Hypermetrope             Yes           Normal                 None
Presbyopic       Myope                    No            Reduced                None
Presbyopic       Myope                    No            Normal                 None
Presbyopic       Myope                    Yes           Reduced                None
Presbyopic       Myope                    Yes           Normal                 Hard
Presbyopic       Hypermetrope             No            Reduced                None
Presbyopic       Hypermetrope             No            Normal                 Soft
Presbyopic       Hypermetrope             Yes           Reduced                None
Presbyopic       Hypermetrope             Yes           Normal                 None
Pre-presbyopic   Hypermetrope             No            Normal                 Soft
Pre-presbyopic   Hypermetrope             No            Reduced                None
Pre-presbyopic   Myope                    Yes           Normal                 Hard
Pre-presbyopic   Myope                    Yes           Reduced                None
Pre-presbyopic   Myope                    No            Normal                 Soft
Pre-presbyopic   Myope                    No            Reduced                None
Young            Hypermetrope             Yes           Normal                 Hard
Young            Hypermetrope             Yes           Reduced                None
Young            Hypermetrope             No            Normal                 Soft
Young            Hypermetrope             No            Reduced                None
Young            Myope                    Yes           Normal                 Hard
Young            Myope                    Yes           Reduced                None
Young            Myope                    No            Normal                 Soft
Young            Myope                    No            Reduced                None
A Model for the Contact Lenses Data 6
If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no
and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no
and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope
and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no
and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes
and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes
and tear production rate = normal then recommendation = hard
If age = pre-presbyopic
and spectacle prescription = hypermetrope
and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope
and astigmatic = yes then recommendation = none
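The rule set above can be sketched in code as an ordered list of condition/recommendation pairs, where the first matching rule wins. This is an illustrative sketch, not part of the original slides; the function name `recommend` and the lowercase value encoding are assumptions.

```python
# Sketch: the contact-lenses rule set as an ordered rule list.
# First matching rule wins; names and value encoding are illustrative.

def recommend(age, prescription, astigmatic, tear_rate):
    """Apply the slide's rules in order and return the recommendation."""
    rules = [
        (lambda: tear_rate == "reduced", "none"),
        (lambda: age == "young" and astigmatic == "no"
                 and tear_rate == "normal", "soft"),
        (lambda: age == "pre-presbyopic" and astigmatic == "no"
                 and tear_rate == "normal", "soft"),
        (lambda: age == "presbyopic" and prescription == "myope"
                 and astigmatic == "no", "none"),
        (lambda: prescription == "hypermetrope" and astigmatic == "no"
                 and tear_rate == "normal", "soft"),
        (lambda: prescription == "myope" and astigmatic == "yes"
                 and tear_rate == "normal", "hard"),
        (lambda: age == "young" and astigmatic == "yes"
                 and tear_rate == "normal", "hard"),
        (lambda: age == "pre-presbyopic" and prescription == "hypermetrope"
                 and astigmatic == "yes", "none"),
        (lambda: age == "presbyopic" and prescription == "hypermetrope"
                 and astigmatic == "yes", "none"),
    ]
    for condition, recommendation in rules:
        if condition():
            return recommendation
    return None  # no rule fired (does not happen on this dataset)
```

For example, `recommend("presbyopic", "myope", "yes", "normal")` returns `"hard"`, matching the corresponding row of the contact lenses data.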
CPU Performance Data 7
MYCT: cycle time (ns) / MMIN, MMAX: main memory (Kb) / CACH: cache (Kb) / CHMIN, CHMAX: channels / PRP: performance

Example   MYCT   MMIN   MMAX    CACH   CHMIN   CHMAX   PRP
209       480    1000   4000    0      0       0       45
208       480    512    8000    32     0       0       67
…
2         29     8000   32000   32     8       32      269
1         125    256    6000    256    16      128     198
PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX
+ 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX
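The regression model above can be evaluated directly on a row of the table. A minimal sketch, where the coefficients come from the slide and the function name `predict_prp` is illustrative:

```python
# Sketch: evaluating the slide's linear regression model on one machine.

def predict_prp(myct, mmin, mmax, cach, chmin, chmax):
    """Linear model for published relative performance (PRP)."""
    return (-55.9 + 0.0489 * myct + 0.0153 * mmin + 0.0056 * mmax
            + 0.6410 * cach - 0.2700 * chmin + 1.480 * chmax)

# Example 1 from the table: MYCT=125, MMIN=256, MMAX=6000,
# CACH=256, CHMIN=16, CHMAX=128
prediction = predict_prp(125, 256, 6000, 256, 16, 128)  # about 336.9
```

The table lists PRP = 198 for this machine, so the linear fit is only a rough approximation on individual examples even though it summarizes the overall trend.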
Classification vs. Prediction
•  Classification
§ Predicts categorical class labels (discrete or nominal)
§ Classifies data (constructs a model) based on the training set
and the values (class labels) in a classifying attribute and uses it
in classifying new data
•  Prediction
§ Models continuous-valued functions, i.e., predicts unknown or
missing values
•  Applications
§ Credit approval
§ Target marketing
§ Medical diagnosis
§ Fraud detection
8
classification = model building + model usage
What is classification?
•  Classification is a two-step process
•  Model construction
§ Given a set of data representing examples of 
a target concept, build a model to “explain” the concept
•  Model usage
§ The classification model is used for classifying 
future or unknown cases
§ Estimate accuracy of the model
10
Classification: Model Construction 11
Classification
Algorithm
IF rank = ‘professor’
OR years > 6
THEN tenured = ‘yes’
name rank years tenured
Mike Assistant Prof 3 no
Mary Assistant Prof 7 yes
Bill Professor 2 yes
Jim Associate Prof 7 yes
Dave Assistant Prof 6 no
Anne Associate Prof 3 no
Training
Data
Classifier
(Model)
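Model construction can be sketched by checking the learned rule against the training data above. The comparison operator is read here as "years > 6", which is consistent with the records (Mary with 7 years is tenured, Dave with 6 is not); the code itself is illustrative.

```python
# Sketch: the learned classifier checked on the training data.
# Records are from the slide; function names are illustrative.

training_data = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor",      2, "yes"),
    ("Jim",  "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]

def tenured(rank, years):
    """IF rank = 'Professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# The rule classifies every training record correctly
correct = sum(tenured(rank, years) == label
              for _, rank, years, label in training_data)  # 6 of 6
```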
Classification: Model Usage 12
tenured = yes
name rank years tenured
Tom Assistant Prof 2 no
Merlisa Associate Prof 7 no
George Professor 5 yes
Joseph Assistant Prof 7 yes
Test
Data
Classifier
(Model)
Unseen Data
Jeff, Professor, 4
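The usage step above, scoring the model on test data and then classifying an unseen case, can be sketched as follows; the rule and records come from the slides, the code is illustrative.

```python
# Sketch: using the learned model on the test data and an unseen case.

def tenured(rank, years):
    """Model from the construction step: Professor OR years > 6."""
    return "yes" if rank == "Professor" or years > 6 else "no"

test_data = [
    ("Tom",     "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George",  "Professor",      5, "yes"),
    ("Joseph",  "Assistant Prof", 7, "yes"),
]

# Estimate accuracy on the test set (Merlisa is misclassified)
accuracy = sum(tenured(r, y) == label
               for _, r, y, label in test_data) / len(test_data)  # 0.75

# Unseen case from the slide: Jeff, Professor, 4 years
jeff = tenured("Professor", 4)  # "yes"
```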
Evaluating Classification Methods
•  Accuracy
§ classifier accuracy: predicting class label
§ predictor accuracy: guessing value of predicted attributes
•  Speed
§ time to construct the model (training time)
§ time to use the model (classification/prediction time)
•  Other Criteria
§ Robustness: handling noise and missing values
§ Scalability: efficiency in disk-resident databases
§ Interpretability: understanding and insight provided
§ Other measures, e.g., goodness of rules, such as decision tree size
or compactness of classification rules
13
Example
The Weather Dataset:
Building the Model
Outlook Temp Humidity Windy Play
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Cool Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
15
•  Write one rule like “if A=v1 then X, else if A=v2 then Y, …” to
predict whether the player is going to play or not
•  A is an attribute; vi are attribute values; X and Y are class labels
The Weather Dataset: 
Testing the Model
Outlook Temp Humidity Windy Play
Sunny Hot High False No
Rainy Mild High False Yes
Sunny Mild High False No
Rainy Mild Normal False Yes
16
Examples of Models
•  if outlook = sunny then no (3 / 2) 
if outlook = overcast then yes (0 / 4) 
if outlook = rainy then yes (2 / 3) 

correct: 10 out of 14 training examples
•  if outlook = sunny then yes (1 / 2)
if outlook = overcast then yes (0 / 4) 
if outlook = rainy then no (2 / 1) 

correct: 8 out of 10 training examples
17
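The "10 out of 14" count for the first model can be reproduced by scoring the rule over all 14 weather examples (the 10 building rows plus the 4 testing rows). A sketch, with only the outlook and play columns kept since the rule ignores the others:

```python
# Sketch: scoring the first one-rule model on all 14 weather examples.
# (outlook, play) pairs transcribed from the two slides above.

weather = [
    # building rows
    ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "Yes"),
    ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "No"),
    # testing rows
    ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "No"), ("Rainy", "Yes"),
]

# Model 1: sunny -> no, overcast -> yes, rainy -> yes
model = {"Sunny": "No", "Overcast": "Yes", "Rainy": "Yes"}

correct = sum(model[outlook] == play for outlook, play in weather)  # 10
```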
The Machine Learning Perspective
The Machine Learning Perspective 
•  Classification algorithms are methods of supervised learning
•  The experience E consists of a set of examples of a target
concept that have been prepared by a supervisor
•  The task T consists of finding a hypothesis that accurately
explains the target concept
•  The performance P depends on how accurately the hypothesis h
explains the examples in E
19
The Machine Learning Perspective
•  Let us define the problem domain as the set of instances X
(for instance, X contains different fruits)
•  We define a concept over X as a function c which maps
elements of X into a range D, that is, c: X → D
•  The range D represents the type of concept analyzed
•  For instance, c: X → {isApple, notAnApple}
20
The Machine Learning Perspective
•  Experience E is a set of (x, d) pairs, with x∈X and d∈D
•  The task T consists of finding a hypothesis h to explain E:
•  ∀x∈X, h(x) = c(x)
•  The set H of all the possible hypotheses h that can be used to
explain c is called the hypothesis space
•  The goodness of a hypothesis h can be evaluated as the
percentage of examples that are correctly explained by h

P(h) = |{x ∈ X : h(x) = c(x)}| / |X|
21
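The measure P(h) can be computed directly once X, c, and h are concrete. A toy sketch, where the instance space, target concept, and hypothesis are all illustrative:

```python
# Sketch: computing P(h) = |{x in X : h(x) = c(x)}| / |X| on a toy domain.

X = list(range(10))                  # toy instance space
c = lambda x: x % 2 == 0             # target concept: "x is even"
h = lambda x: x in (0, 2, 4, 5, 6)   # a hypothesis, wrong on 5 and 8

p_h = len([x for x in X if h(x) == c(x)]) / len(X)  # 0.8
```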
Examples
•  Concept Learning
when D={0,1}
•  Supervised classification 
when D consists of a finite number of labels
•  Prediction
when D is a subset of R^n
22
The Machine Learning Perspective 
on Classification
•  Supervised learning algorithms, given the examples in E, search
the hypothesis space H for the hypothesis h that best explains
the examples in E
•  Learning is viewed as a search in the hypothesis space
23
Searching for Hypotheses
•  The type of hypothesis required influences the search algorithm
•  The more complex the representation 
the more complex the search algorithm
•  Many algorithms assume that it is possible to define a partial
ordering over the hypothesis space
•  The hypothesis space can be searched using either a general-to-
specific or a specific-to-general strategy
24
Exploring the Hypothesis Space
•  General to Specific
§ Start with the most general hypothesis and then go on
through specialization steps
•  Specific to General
§ Start with the set of the most specific hypotheses and
then go on through generalization steps
25
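One concrete specific-to-general strategy is Find-S-style generalization over attribute-value hypotheses: start from the most specific hypothesis (the first positive example itself) and minimally generalize it on every further positive example. A minimal sketch; the fruit data and the "?" wildcard convention are illustrative:

```python
# Sketch of a specific-to-general search (Find-S style): generalize the
# hypothesis just enough to cover each positive example.
# "?" means "any value"; the example data are illustrative.

positives = [
    ("red",   "round", "medium"),   # positive examples of "apple"
    ("green", "round", "medium"),
]

h = positives[0]  # most specific hypothesis covering the first positive
for example in positives[1:]:
    # replace every attribute that disagrees with the example by "?"
    h = tuple(a if a == e else "?" for a, e in zip(h, example))
# h is now ("?", "round", "medium")
```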
Inductive Bias
•  Set of assumptions that together with the training data deductively justify the
classification assigned by the learner to future instances
•  There can be a number of hypotheses consistent with training data
•  Each learning algorithm has an inductive bias that imposes a preference on the
space of all possible hypotheses
26
Types of Inductive Bias
•  Syntactic Bias
§ Depends on the language used to represent hypotheses
•  Semantic Bias
§ Depends on the heuristics used to filter hypotheses
•  Preference Bias
§ Depends on the ability to rank and compare hypotheses
•  Restriction Bias
§ Depends on the ability to restrict the search space
27
Why Are We Looking for h?
Inductive Learning Hypothesis
•  Any hypothesis (h) found to approximate the target function (c) over a
sufficiently large set of training examples will also approximate the
target function (c) well over other unobserved examples.
•  Training
§ The hypothesis h is developed to explain the examples in ETrain
•  Testing
§ The hypothesis h is evaluated (verified) with respect to the
previously unseen examples in ETest
•  The underlying hypothesis
§ If h explains ETrain then it can also be used to explain other unseen
examples in ETest (not previously used to develop h)
29
Generalization and Overfitting
•  Generalization
§ When h explains “well” both ETrain and ETest we say that h is
general and that the method used to develop h has
adequately generalized
•  Overfitting
§ When h explains ETrain but not ETest we say that the method
used to develop h has overfitted
§ We have overfitting when the hypothesis h explains ETrain too
accurately so that h is not general enough to be applied
outside ETrain
30
What are the general issues
for classification in Machine Learning?
•  Type of training experience
§ Direct or indirect?
§ Supervised or not?
•  Type of target function and performance
•  Type of search algorithm
•  Type of representation of the solution
•  Type of inductive bias
31
