Rule-Based Classifiers
Rule-Based Classifier
• Classify records by using a collection of “if…then…” rules
• Rule: (Condition) → y
– where
• Condition is a conjunction of attribute tests
• y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
– Examples of classification rules:
(Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
(Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
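To make the structure concrete, here is a minimal sketch of one way such a rule could be represented in Python. The Rule class and the dict-based records are illustrative assumptions, not part of any particular library:

```python
# A minimal sketch of a classification rule: a conjunction of attribute
# tests (the antecedent) paired with a class label (the consequent).
from dataclasses import dataclass

@dataclass
class Rule:
    conditions: dict          # antecedent: attribute -> required value
    label: str                # consequent: the predicted class

    def covers(self, record: dict) -> bool:
        # The antecedent is a conjunction: every test must hold.
        return all(record.get(a) == v for a, v in self.conditions.items())

r = Rule({"Blood Type": "Warm", "Lay Eggs": "Yes"}, "Birds")
print(r.covers({"Blood Type": "Warm", "Lay Eggs": "Yes", "Can Fly": "No"}))  # True
```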
Example
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
human | warm | yes | no | no | mammals
python | cold | no | no | no | reptiles
salmon | cold | no | no | yes | fishes
whale | warm | yes | no | yes | mammals
frog | cold | no | no | sometimes | amphibians
komodo | cold | no | no | no | reptiles
bat | warm | yes | yes | no | mammals
pigeon | warm | no | yes | no | birds
cat | warm | yes | no | no | mammals
leopard shark | cold | yes | no | yes | fishes
turtle | cold | no | no | sometimes | reptiles
penguin | warm | no | no | sometimes | birds
porcupine | warm | yes | no | no | mammals
eel | cold | no | no | yes | fishes
salamander | cold | no | no | sometimes | amphibians
gila monster | cold | no | no | no | reptiles
platypus | warm | no | no | no | mammals
owl | warm | no | yes | no | birds
dolphin | warm | yes | no | yes | mammals
eagle | warm | no | yes | no | birds
Application of Rule-Based Classifier
• A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
The rule R1 covers the hawk => Bird
The rule R3 covers the grizzly bear => Mammal
Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
hawk | warm | no | yes | no | ?
grizzly bear | warm | yes | no | no | ?
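The coverage check can be run mechanically. A small sketch, assuming records are dicts and using shortened lower-case attribute names purely for readability:

```python
# Checking which of rules R1-R5 cover a record.
rules = [
    ("R1", {"give_birth": "no",  "can_fly": "yes"},       "Birds"),
    ("R2", {"give_birth": "no",  "live_in_water": "yes"}, "Fishes"),
    ("R3", {"give_birth": "yes", "blood_type": "warm"},   "Mammals"),
    ("R4", {"give_birth": "no",  "can_fly": "no"},        "Reptiles"),
    ("R5", {"live_in_water": "sometimes"},                "Amphibians"),
]

def covers(conditions, record):
    # A rule covers a record if every condition in its antecedent holds.
    return all(record.get(a) == v for a, v in conditions.items())

hawk = {"blood_type": "warm", "give_birth": "no",
        "can_fly": "yes", "live_in_water": "no"}
print([name for name, cond, _ in rules if covers(cond, hawk)])  # ['R1'] -> Birds
```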
Rule Coverage and Accuracy
• Coverage of a rule:
– Fraction of records that satisfy the antecedent of the rule
• Accuracy of a rule:
– Fraction of records that satisfy both the antecedent and consequent of the rule (over those that satisfy the antecedent)
Tid | Refund | Marital Status | Taxable Income | Class
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes
(Status=Single) → No
Coverage = 40%, Accuracy = 50%
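A short sketch of how the two measures are computed on the ten records above (only the columns the rule touches are kept):

```python
# Coverage and accuracy of the rule (Status=Single) -> No on the table above.
records = [  # (Marital Status, Class) for Tid 1..10
    ("Single", "No"), ("Married", "No"), ("Single", "No"), ("Married", "No"),
    ("Divorced", "Yes"), ("Married", "No"), ("Divorced", "No"),
    ("Single", "Yes"), ("Married", "No"), ("Single", "Yes"),
]
covered = [cls for status, cls in records if status == "Single"]  # antecedent holds
coverage = len(covered) / len(records)
accuracy = covered.count("No") / len(covered)                     # consequent holds too
print(f"Coverage = {coverage:.0%}, Accuracy = {accuracy:.0%}")    # 40%, 50%
```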
Decision Trees vs. rules
From trees to rules.
• Easy: converting a tree into a set of rules
– One rule for each leaf:
– Antecedent contains a condition for every node on the path from the root to the leaf
– Consequent is the class assigned by the leaf
– Straightforward, but rule set might be overly complex
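A sketch of the leaf-per-rule conversion. The nested-tuple tree encoding is an illustrative assumption, not any particular library's format:

```python
# Convert a decision tree into rules: one rule per leaf, whose antecedent
# collects a condition for every node on the root-to-leaf path.
def tree_to_rules(node, path=()):
    if isinstance(node, str):                    # leaf: a class label
        yield list(path), node                   # (antecedent, consequent)
        return
    attribute, branches = node                   # internal node
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, path + ((attribute, value),))

tree = ("Give Birth", {"yes": "Mammals",
                       "no": ("Can Fly", {"yes": "Birds", "no": "Reptiles"})})
for antecedent, label in tree_to_rules(tree):
    print(" ∧ ".join(f"({a} = {v})" for a, v in antecedent), "→", label)
```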
Decision Trees vs. rules
From rules to trees
• More difficult: transforming a rule set into a tree
– Tree cannot easily express disjunction between rules
• Example:
If a and b then x
If c and d then x
– Corresponding tree contains identical subtrees (“replicated subtree problem”)
A tree for a simple disjunction
How Does a Rule-Based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3, so it is classified as a mammal.
A turtle triggers both R4 and R5.
A dogfish shark triggers none of the rules.
Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
lemur | warm | yes | no | no | ?
turtle | cold | no | no | sometimes | ?
dogfish shark | cold | yes | no | yes | ?
Desiderata for Rule-Based Classifier
• Mutually exclusive rules
– No two rules are triggered by the same record.
– This ensures that every record is covered by at most one rule.
• Exhaustive rules
– There exists a rule for each combination of attribute values.
– This ensures that every record is covered by at least one rule.
Together these properties ensure that every record is covered by exactly one rule.
Rules
• Non-mutually-exclusive rules
– A record may trigger more than one rule
– Solution?
• Ordered rule set
• Non-exhaustive rules
– A record may not trigger any rule
– Solution?
• Use a default class
Ordered Rule Set
• Rules are rank-ordered according to their priority (e.g., based on their quality)
– An ordered rule set is known as a decision list
• When a test record is presented to the classifier
– It is assigned the class label of the highest-ranked rule it triggers
– If none of the rules fire, it is assigned the default class
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
turtle | cold | no | no | sometimes | ?
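A sketch of decision-list classification, under the same dict-based record assumption as the earlier sketches:

```python
# Decision list: the first (highest-ranked) rule that fires decides the
# class; a default class catches records that no rule covers.
rules = [  # in priority order R1..R5
    ({"give_birth": "no",  "can_fly": "yes"},       "Birds"),
    ({"give_birth": "no",  "live_in_water": "yes"}, "Fishes"),
    ({"give_birth": "yes", "blood_type": "warm"},   "Mammals"),
    ({"give_birth": "no",  "can_fly": "no"},        "Reptiles"),
    ({"live_in_water": "sometimes"},                "Amphibians"),
]

def classify(record, rules, default="Unknown"):
    for conditions, label in rules:
        if all(record.get(a) == v for a, v in conditions.items()):
            return label              # highest-ranked triggered rule wins
    return default                    # no rule fired

turtle = {"blood_type": "cold", "give_birth": "no",
          "can_fly": "no", "live_in_water": "sometimes"}
print(classify(turtle, rules))        # R4 fires before R5 -> Reptiles
```

With this ordering the turtle is assigned Reptiles, because R4 outranks R5.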
Building Classification Rules: Sequential Covering
1. Start from an empty rule
2. Grow a rule using some Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Steps (2) and (3) until a stopping criterion is met
[Figure: sequential covering on a 2-D data set. (i) original data; (ii) Step 1; (iii) Step 2, with rule R1; (iv) Step 3, with rules R1 and R2]
• This approach is called a covering approach because at each stage a rule is identified that covers some of the instances
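A sketch of the loop itself. The Learn-One-Rule step is passed in as a function so that any rule-growing strategy (such as the greedy p/t search described later) can be plugged in; the record encoding and stopping criterion are assumptions for illustration:

```python
# Sequential covering: grow a rule, remove the records it covers, repeat.
def covers(conditions, attrs):
    return all(attrs.get(a) == v for a, v in conditions.items())

def sequential_covering(records, target_class, learn_one_rule):
    """records: list of (attributes_dict, label) pairs."""
    rules, remaining = [], list(records)
    # stop when the target class is exhausted (one possible criterion)
    while any(label == target_class for _, label in remaining):
        conditions = learn_one_rule(remaining, target_class)  # step 2: grow a rule
        if conditions is None:                                # no useful rule found
            break
        rules.append((conditions, target_class))
        # step 3: remove the training records covered by the rule
        remaining = [(a, l) for a, l in remaining if not covers(conditions, a)]
    return rules
```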
Example: generating a rule
• Possible rule set for class “b”:
• More rules could be added for a “perfect” rule set
If x ≤ 1.2 then class = b
If x > 1.2 and y ≤ 2.6 then class = b
[Figure: scatter plots of classes “a” and “b” in the (x, y) plane: the original data, then the same data with the splits at x = 1.2 and y = 2.6 drawn in]
A simple covering algorithm
• Generates a rule by adding tests that maximize the rule’s accuracy
• Similar to the situation in decision trees: the problem of selecting an attribute to split on
– But: a decision tree inducer maximizes overall purity
• Here, each new test (growing the rule) reduces the rule’s coverage
[Figure: the space of examples, the region covered by the rule so far, and the smaller region covered after adding a new term]
Selecting a test
• Goal: maximize accuracy
– t: total number of instances covered by the rule
– p: positive examples of the class covered by the rule
– t − p: number of errors made by the rule
• Select the test that maximizes the ratio p/t
• We are finished when p/t = 1 or the set of instances can’t be split any further
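A sketch of this greedy selection, assuming records are (attributes-dict, label) pairs:

```python
# Greedy test selection: among candidate conditions A = v not yet in the
# rule, pick the one maximizing p/t (tie-break: largest p).
def best_test(records, target_class, used_attributes=()):
    counts = {}                                  # (A, v) -> [t, p]
    for attrs, label in records:
        for a, v in attrs.items():
            if a in used_attributes:             # A must not already be in the rule
                continue
            t_p = counts.setdefault((a, v), [0, 0])
            t_p[0] += 1                          # t: instances covered
            t_p[1] += label == target_class      # p: positives covered
    if not counts:
        return None                              # no attributes left to test
    return max(counts, key=lambda c: (counts[c][1] / counts[c][0], counts[c][1]))
```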
Example: contact lenses data
Age | Spectacle prescription | Astigmatism | Tear production rate | Recommended lenses
young | myope | no | reduced | none
young | myope | no | normal | soft
young | myope | yes | reduced | none
young | myope | yes | normal | hard
young | hypermetrope | no | reduced | none
young | hypermetrope | no | normal | soft
young | hypermetrope | yes | reduced | none
young | hypermetrope | yes | normal | hard
pre-presbyopic | myope | no | reduced | none
pre-presbyopic | myope | no | normal | soft
pre-presbyopic | myope | yes | reduced | none
pre-presbyopic | myope | yes | normal | hard
pre-presbyopic | hypermetrope | no | reduced | none
pre-presbyopic | hypermetrope | no | normal | soft
pre-presbyopic | hypermetrope | yes | reduced | none
pre-presbyopic | hypermetrope | yes | normal | none
presbyopic | myope | no | reduced | none
presbyopic | myope | no | normal | none
presbyopic | myope | yes | reduced | none
presbyopic | myope | yes | normal | hard
presbyopic | hypermetrope | no | reduced | none
presbyopic | hypermetrope | no | normal | soft
presbyopic | hypermetrope | yes | reduced | none
presbyopic | hypermetrope | yes | normal | none
Example: contact lenses data
The numbers on the right show the fraction of “correct” instances in the set singled out by each choice of test. In this case, correct means that the recommendation is “hard.”
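The figure itself is not reproduced above, but those fractions can be recomputed from the table; a sketch (the table's rows enumerate every attribute combination, so they can be generated with itertools.product in the same order):

```python
# For each candidate test, compute the fraction of covered instances
# whose recommendation is "hard" (the numbers the figure showed).
from itertools import product

attributes = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacle prescription": ["myope", "hypermetrope"],
    "astigmatism": ["no", "yes"],
    "tear production rate": ["reduced", "normal"],
}
labels = (["none", "soft", "none", "hard"] * 3         # table rows 1-12
          + ["none", "soft", "none", "none"]           # rows 13-16
          + ["none", "none", "none", "hard"]           # rows 17-20
          + ["none", "soft", "none", "none"])          # rows 21-24
rows = [dict(zip(attributes, combo)) for combo in product(*attributes.values())]

for attr, values in attributes.items():
    for v in values:
        covered = [lab for row, lab in zip(rows, labels) if row[attr] == v]
        print(f"{attr} = {v}: {covered.count('hard')}/{len(covered)}")
# Best first test for "hard": astigmatism = yes, with 4/12
```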
Modified rule and resulting data
The rule isn’t very accurate: of the 12 instances it covers, it gets only 4 right. So it needs further refinement.
Further refinement
Modified rule and resulting data
Should we stop here? Perhaps. But suppose we are going for exact rules, no matter how complex they become. So, let’s refine further.
Further refinement
The result
Pseudo-code for PRISM
For each class C
  Initialize E to the instance set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        consider adding the condition A = v to the left-hand side of R
      Select A and v to maximize the accuracy p/t
      (break ties by choosing the condition with the largest p)
      Add A = v to R
    Remove the instances covered by R from E
The RIPPER algorithm is similar, but it uses information gain instead of p/t and, as a heuristic, processes the classes C in ascending order of occurrence.
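A compact Python sketch of PRISM under the same (attributes-dict, label) record assumption used above; it is a direct transcription of the pseudo-code, not a production implementation:

```python
# PRISM sketch: for each class, grow one (ideally perfect) rule at a time,
# removing the instances it covers until the class is exhausted.
def covers(conditions, attrs):
    return all(attrs.get(a) == v for a, v in conditions.items())

def prism(records):
    """records: list of (attributes_dict, class_label) pairs."""
    rules = []
    for target in sorted({label for _, label in records}):   # for each class C
        E = list(records)                                    # initialize E
        while any(label == target for _, label in E):
            conditions, covered = {}, E                      # rule with empty LHS
            # grow until the rule is perfect or attributes run out
            while any(label != target for _, label in covered):
                best = None
                for a in {a for attrs, _ in covered for a in attrs} - set(conditions):
                    for v in {attrs[a] for attrs, _ in covered if a in attrs}:
                        sub = [(x, l) for x, l in covered if x.get(a) == v]
                        p = sum(l == target for _, l in sub)
                        key = (p / len(sub), p)              # accuracy p/t, then p
                        if best is None or key > best[0]:
                            best = (key, (a, v), sub)
                if best is None:                             # no attributes left
                    break
                (a, v), covered = best[1], best[2]
                conditions[a] = v                            # add A = v to R
            rules.append((conditions, target))
            E = [(x, l) for x, l in E if not covers(conditions, x)]
    return rules
```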
Separate and conquer
• Methods like PRISM (for dealing with one class at a time) are separate-and-conquer algorithms:
– First, a rule is identified
– Then, all instances covered by the rule are separated out
– Finally, the remaining instances are “conquered”
• Difference from divide-and-conquer methods:
– The subset covered by a rule doesn’t need to be explored any further