Rule-Based Classifiers
Rule-Based Classifier
• Classify records by using a collection of “if…then…” rules
• Rule: (Condition) → y
– where
• Condition is a conjunction of attribute tests
• y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
– Examples of classification rules:
(Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
(Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
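To make the structure concrete, here is a minimal sketch of one way such a rule could be represented in Python. The Rule class and the dict-based records are illustrative assumptions, not part of any particular library:

```python
# A minimal sketch of a classification rule: a conjunction of attribute
# tests (the antecedent) paired with a class label (the consequent).
from dataclasses import dataclass

@dataclass
class Rule:
    conditions: dict          # antecedent: attribute -> required value
    label: str                # consequent: the predicted class

    def covers(self, record: dict) -> bool:
        # The antecedent is a conjunction: every test must hold.
        return all(record.get(a) == v for a, v in self.conditions.items())

r = Rule({"Blood Type": "Warm", "Lay Eggs": "Yes"}, "Birds")
print(r.covers({"Blood Type": "Warm", "Lay Eggs": "Yes", "Can Fly": "No"}))  # True
```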
Example
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
human | warm | yes | no | no | mammals
python | cold | no | no | no | reptiles
salmon | cold | no | no | yes | fishes
whale | warm | yes | no | yes | mammals
frog | cold | no | no | sometimes | amphibians
komodo | cold | no | no | no | reptiles
bat | warm | yes | yes | no | mammals
pigeon | warm | no | yes | no | birds
cat | warm | yes | no | no | mammals
leopard shark | cold | yes | no | yes | fishes
turtle | cold | no | no | sometimes | reptiles
penguin | warm | no | no | sometimes | birds
porcupine | warm | yes | no | no | mammals
eel | cold | no | no | yes | fishes
salamander | cold | no | no | sometimes | amphibians
gila monster | cold | no | no | no | reptiles
platypus | warm | no | no | no | mammals
owl | warm | no | yes | no | birds
dolphin | warm | yes | no | yes | mammals
eagle | warm | no | yes | no | birds
Application of Rule-Based Classifier
• A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
The rule R1 covers the hawk => Bird
The rule R3 covers the grizzly bear => Mammal
Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
hawk | warm | no | yes | no | ?
grizzly bear | warm | yes | no | no | ?
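The coverage check can be run mechanically. A small sketch, assuming records are dicts and using shortened lower-case attribute names purely for readability:

```python
# Checking which of rules R1-R5 cover a record.
rules = [
    ("R1", {"give_birth": "no",  "can_fly": "yes"},       "Birds"),
    ("R2", {"give_birth": "no",  "live_in_water": "yes"}, "Fishes"),
    ("R3", {"give_birth": "yes", "blood_type": "warm"},   "Mammals"),
    ("R4", {"give_birth": "no",  "can_fly": "no"},        "Reptiles"),
    ("R5", {"live_in_water": "sometimes"},                "Amphibians"),
]

def covers(conditions, record):
    # A rule covers a record if every condition in its antecedent holds.
    return all(record.get(a) == v for a, v in conditions.items())

hawk = {"blood_type": "warm", "give_birth": "no",
        "can_fly": "yes", "live_in_water": "no"}
print([name for name, cond, _ in rules if covers(cond, hawk)])  # ['R1'] -> Birds
```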
Rule Coverage and Accuracy
• Coverage of a rule:
– Fraction of records that satisfy the antecedent of the rule
• Accuracy of a rule:
– Fraction of records that satisfy both the antecedent and consequent of the rule (over those that satisfy the antecedent)
Tid | Refund | Marital Status | Taxable Income | Class
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes
(Status=Single) → No
Coverage = 40%, Accuracy = 50%
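A short sketch of how the two measures are computed on the ten records above (only the columns the rule touches are kept):

```python
# Coverage and accuracy of the rule (Status=Single) -> No on the table above.
records = [  # (Marital Status, Class) for Tid 1..10
    ("Single", "No"), ("Married", "No"), ("Single", "No"), ("Married", "No"),
    ("Divorced", "Yes"), ("Married", "No"), ("Divorced", "No"),
    ("Single", "Yes"), ("Married", "No"), ("Single", "Yes"),
]
covered = [cls for status, cls in records if status == "Single"]  # antecedent holds
coverage = len(covered) / len(records)
accuracy = covered.count("No") / len(covered)                     # consequent holds too
print(f"Coverage = {coverage:.0%}, Accuracy = {accuracy:.0%}")    # 40%, 50%
```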
Decision Trees vs. rules
From trees to rules.
• Easy: converting a tree into a set of rules
– One rule for each leaf:
– Antecedent contains a condition for every node on the path from the root to the leaf
– Consequent is the class assigned by the leaf
– Straightforward, but rule set might be overly complex
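A sketch of the leaf-per-rule conversion. The nested-tuple tree encoding is an illustrative assumption, not any particular library's format:

```python
# Convert a decision tree into rules: one rule per leaf, whose antecedent
# collects a condition for every node on the root-to-leaf path.
def tree_to_rules(node, path=()):
    if isinstance(node, str):                    # leaf: a class label
        yield list(path), node                   # (antecedent, consequent)
        return
    attribute, branches = node                   # internal node
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, path + ((attribute, value),))

tree = ("Give Birth", {"yes": "Mammals",
                       "no": ("Can Fly", {"yes": "Birds", "no": "Reptiles"})})
for antecedent, label in tree_to_rules(tree):
    print(" ∧ ".join(f"({a} = {v})" for a, v in antecedent), "→", label)
```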
Decision Trees vs. rules
From rules to trees
• More difficult: transforming a rule set into a tree
– Tree cannot easily express disjunction between rules
• Example:
If a and b then x
If c and d then x
– Corresponding tree contains identical subtrees (“replicated subtree problem”)
A tree for a simple disjunction
How Does a Rule-Based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3, so it is classified as a mammal.
A turtle triggers both R4 and R5.
A dogfish shark triggers none of the rules.
Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
lemur | warm | yes | no | no | ?
turtle | cold | no | no | sometimes | ?
dogfish shark | cold | yes | no | yes | ?
Desiderata for Rule-Based Classifier
• Mutually exclusive rules
– No two rules are triggered by the same record.
– This ensures that every record is covered by at most one rule.
• Exhaustive rules
– There exists a rule for each combination of attribute values.
– This ensures that every record is covered by at least one rule.
Together these properties ensure that every record is covered by exactly one rule.
Rules
• Non-mutually-exclusive rules
– A record may trigger more than one rule
– Solution?
• Ordered rule set
• Non-exhaustive rules
– A record may not trigger any rule
– Solution?
• Use a default class
Ordered Rule Set
• Rules are rank-ordered according to their priority (e.g., based on their quality)
– An ordered rule set is known as a decision list
• When a test record is presented to the classifier
– It is assigned the class label of the highest-ranked rule it triggers
– If none of the rules fire, it is assigned the default class
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
turtle | cold | no | no | sometimes | ?
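A sketch of decision-list classification, under the same dict-based record assumption as the earlier sketches:

```python
# Decision list: the first (highest-ranked) rule that fires decides the
# class; a default class catches records that no rule covers.
rules = [  # in priority order R1..R5
    ({"give_birth": "no",  "can_fly": "yes"},       "Birds"),
    ({"give_birth": "no",  "live_in_water": "yes"}, "Fishes"),
    ({"give_birth": "yes", "blood_type": "warm"},   "Mammals"),
    ({"give_birth": "no",  "can_fly": "no"},        "Reptiles"),
    ({"live_in_water": "sometimes"},                "Amphibians"),
]

def classify(record, rules, default="Unknown"):
    for conditions, label in rules:
        if all(record.get(a) == v for a, v in conditions.items()):
            return label              # highest-ranked triggered rule wins
    return default                    # no rule fired

turtle = {"blood_type": "cold", "give_birth": "no",
          "can_fly": "no", "live_in_water": "sometimes"}
print(classify(turtle, rules))        # R4 fires before R5 -> Reptiles
```

With this ordering the turtle is assigned Reptiles, because R4 outranks R5.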
Building Classification Rules: Sequential Covering
1. Start from an empty rule
2. Grow a rule using some Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Steps (2) and (3) until a stopping criterion is met
[Figure: sequential covering on a 2-D data set. (i) original data; (ii) Step 1; (iii) Step 2, with rule R1; (iv) Step 3, with rules R1 and R2]
• This approach is called a covering approach because at each stage a rule is identified that covers some of the instances
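A sketch of the loop itself. The Learn-One-Rule step is passed in as a function so that any rule-growing strategy (such as the greedy p/t search described later) can be plugged in; the record encoding and stopping criterion are assumptions for illustration:

```python
# Sequential covering: grow a rule, remove the records it covers, repeat.
def covers(conditions, attrs):
    return all(attrs.get(a) == v for a, v in conditions.items())

def sequential_covering(records, target_class, learn_one_rule):
    """records: list of (attributes_dict, label) pairs."""
    rules, remaining = [], list(records)
    # stop when the target class is exhausted (one possible criterion)
    while any(label == target_class for _, label in remaining):
        conditions = learn_one_rule(remaining, target_class)  # step 2: grow a rule
        if conditions is None:                                # no useful rule found
            break
        rules.append((conditions, target_class))
        # step 3: remove the training records covered by the rule
        remaining = [(a, l) for a, l in remaining if not covers(conditions, a)]
    return rules
```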
Example: generating a rule
• Possible rule set for class “b”:
• More rules could be added for a “perfect” rule set
If x ≤ 1.2 then class = b
If x > 1.2 and y ≤ 2.6 then class = b
[Figure: scatter plots of classes “a” and “b” in the (x, y) plane: the original data, then the same data with the splits at x = 1.2 and y = 2.6 drawn in]
A simple covering algorithm
• Generates a rule by adding tests that maximize the rule’s accuracy
• Similar to the situation in decision trees: the problem of selecting an attribute to split on
– But: a decision tree inducer maximizes overall purity
• Here, each new test (growing the rule) reduces the rule’s coverage
[Figure: the space of examples, the region covered by the rule so far, and the smaller region covered after adding a new term]
Selecting a test
• Goal: maximize accuracy
– t: total number of instances covered by the rule
– p: positive examples of the class covered by the rule
– t − p: number of errors made by the rule
• Select the test that maximizes the ratio p/t
• We are finished when p/t = 1 or the set of instances can’t be split any further
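A sketch of this greedy selection, assuming records are (attributes-dict, label) pairs:

```python
# Greedy test selection: among candidate conditions A = v not yet in the
# rule, pick the one maximizing p/t (tie-break: largest p).
def best_test(records, target_class, used_attributes=()):
    counts = {}                                  # (A, v) -> [t, p]
    for attrs, label in records:
        for a, v in attrs.items():
            if a in used_attributes:             # A must not already be in the rule
                continue
            t_p = counts.setdefault((a, v), [0, 0])
            t_p[0] += 1                          # t: instances covered
            t_p[1] += label == target_class      # p: positives covered
    if not counts:
        return None                              # no attributes left to test
    return max(counts, key=lambda c: (counts[c][1] / counts[c][0], counts[c][1]))
```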
Example: contact lenses data
Age | Spectacle prescription | Astigmatism | Tear production rate | Recommended lenses
young | myope | no | reduced | none
young | myope | no | normal | soft
young | myope | yes | reduced | none
young | myope | yes | normal | hard
young | hypermetrope | no | reduced | none
young | hypermetrope | no | normal | soft
young | hypermetrope | yes | reduced | none
young | hypermetrope | yes | normal | hard
pre-presbyopic | myope | no | reduced | none
pre-presbyopic | myope | no | normal | soft
pre-presbyopic | myope | yes | reduced | none
pre-presbyopic | myope | yes | normal | hard
pre-presbyopic | hypermetrope | no | reduced | none
pre-presbyopic | hypermetrope | no | normal | soft
pre-presbyopic | hypermetrope | yes | reduced | none
pre-presbyopic | hypermetrope | yes | normal | none
presbyopic | myope | no | reduced | none
presbyopic | myope | no | normal | none
presbyopic | myope | yes | reduced | none
presbyopic | myope | yes | normal | hard
presbyopic | hypermetrope | no | reduced | none
presbyopic | hypermetrope | no | normal | soft
presbyopic | hypermetrope | yes | reduced | none
presbyopic | hypermetrope | yes | normal | none
Example: contact lenses data
The numbers on the right show the fraction of “correct” instances in the set singled out by each choice of test. In this case, correct means that the recommendation is “hard.”
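The figure itself is not reproduced above, but those fractions can be recomputed from the table; a sketch (the table's rows enumerate every attribute combination, so they can be generated with itertools.product in the same order):

```python
# For each candidate test, compute the fraction of covered instances
# whose recommendation is "hard" (the numbers the figure showed).
from itertools import product

attributes = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacle prescription": ["myope", "hypermetrope"],
    "astigmatism": ["no", "yes"],
    "tear production rate": ["reduced", "normal"],
}
labels = (["none", "soft", "none", "hard"] * 3         # table rows 1-12
          + ["none", "soft", "none", "none"]           # rows 13-16
          + ["none", "none", "none", "hard"]           # rows 17-20
          + ["none", "soft", "none", "none"])          # rows 21-24
rows = [dict(zip(attributes, combo)) for combo in product(*attributes.values())]

for attr, values in attributes.items():
    for v in values:
        covered = [lab for row, lab in zip(rows, labels) if row[attr] == v]
        print(f"{attr} = {v}: {covered.count('hard')}/{len(covered)}")
# Best first test for "hard": astigmatism = yes, with 4/12
```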
Modified rule and resulting data
The rule isn’t very accurate: of the 12 instances it covers, it gets only 4 right. So it needs further refinement.
Further refinement
Modified rule and resulting data
Should we stop here? Perhaps. But suppose we are going for exact rules, no matter how complex they become. So, let’s refine further.
Further refinement
The result
Pseudo-code for PRISM
For each class C
  Initialize E to the instance set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        consider adding the condition A = v to the left-hand side of R
      Select A and v to maximize the accuracy p/t
      (break ties by choosing the condition with the largest p)
      Add A = v to R
    Remove the instances covered by R from E
The RIPPER algorithm is similar, but it uses information gain instead of p/t and, as a heuristic, processes the classes C in ascending order of occurrence.
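A compact Python sketch of PRISM under the same (attributes-dict, label) record assumption used above; it is a direct transcription of the pseudo-code, not a production implementation:

```python
# PRISM sketch: for each class, grow one (ideally perfect) rule at a time,
# removing the instances it covers until the class is exhausted.
def covers(conditions, attrs):
    return all(attrs.get(a) == v for a, v in conditions.items())

def prism(records):
    """records: list of (attributes_dict, class_label) pairs."""
    rules = []
    for target in sorted({label for _, label in records}):   # for each class C
        E = list(records)                                    # initialize E
        while any(label == target for _, label in E):
            conditions, covered = {}, E                      # rule with empty LHS
            # grow until the rule is perfect or attributes run out
            while any(label != target for _, label in covered):
                best = None
                for a in {a for attrs, _ in covered for a in attrs} - set(conditions):
                    for v in {attrs[a] for attrs, _ in covered if a in attrs}:
                        sub = [(x, l) for x, l in covered if x.get(a) == v]
                        p = sum(l == target for _, l in sub)
                        key = (p / len(sub), p)              # accuracy p/t, then p
                        if best is None or key > best[0]:
                            best = (key, (a, v), sub)
                if best is None:                             # no attributes left
                    break
                (a, v), covered = best[1], best[2]
                conditions[a] = v                            # add A = v to R
            rules.append((conditions, target))
            E = [(x, l) for x, l in E if not covers(conditions, x)]
    return rules
```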
Separate and conquer
• Methods like PRISM (for dealing with one class at a time) are separate-and-conquer algorithms:
– First, a rule is identified
– Then, all instances covered by the rule are separated out
– Finally, the remaining instances are “conquered”
• Difference from divide-and-conquer methods:
– The subset covered by a rule doesn’t need to be explored any further