Classification: Alternative Techniques
Kurdistan Regional Government – Iraq
Ministry of Higher Education and Scientific Research,
Akre University For Applied Sciences
Technical College of Informatics-Akre
Information Technology
MSc in Computer Sciences
Prepared by
Aqeel H. Younus, 2023-2024
Supervised by
Prof. Dr. Eng. Adnan Mohsin Abdulazeez
Rule-based Classifier
Rule-Based Classifier
• Classify records by using a collection of “if…then…” rules
• Rule: (Condition) → y
• where
• Condition is a conjunction of tests on attributes
• y is the class label
• Examples of classification rules:
• (Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
• (Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
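To make the representation concrete, here is a minimal sketch (not from the slides; the rule and attribute names mirror the examples above) of how such if-then rules can be stored and applied in Python:

```python
# A rule is (condition, class); a condition is a dict of
# attribute -> required value, read as a conjunction of tests.
rules = [
    ({"Blood Type": "Warm", "Lay Eggs": "Yes"}, "Birds"),
]

def covers(condition, record):
    # A rule covers a record if every test in its condition is satisfied.
    return all(record.get(attr) == value for attr, value in condition.items())

def classify(record, rules, default=None):
    # Return the class of the first rule whose condition the record satisfies.
    for condition, label in rules:
        if covers(condition, record):
            return label
    return default

print(classify({"Blood Type": "Warm", "Lay Eggs": "Yes"}, rules))  # Birds
```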
Rule-based Classifier (Example)
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Application of Rule-Based Classifier
• A rule r covers an instance x if the attributes of the
instance satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name Blood Type Give Birth Can Fly Live in Water Class
hawk warm no yes no ?
grizzly bear warm yes no no ?
The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal
Rule Coverage and Accuracy
• Coverage of a rule:
• Fraction of records that satisfy
the antecedent of a rule
• Coverage(R) = n_covers / |D|
• where
• n_covers: number of tuples covered by R
• |D|: number of tuples in the data set
• Accuracy of a rule:
• Fraction of records that satisfy
the antecedent that also satisfy
the consequent of a rule
• Accuracy(R) = n_correct / n_covers
• n_correct: number of tuples correctly classified by R
Tid Refund Marital Status Taxable Income Class
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
(Status = Single) → No
Coverage = 40%, Accuracy = 50%
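As a check on these numbers, here is a small sketch (assuming the ten-record tax table above) that computes the coverage and accuracy of (Status = Single) → No:

```python
# (Refund, Marital Status, Taxable Income in K, Class) for Tids 1-10.
records = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]

# Rule: (Status = Single) -> No
covered = [r for r in records if r[1] == "Single"]   # n_covers = 4
correct = [r for r in covered if r[3] == "No"]       # n_correct = 2

print(len(covered) / len(records))   # coverage = 4/10 = 0.4
print(len(correct) / len(covered))   # accuracy = 2/4  = 0.5
```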
How does a Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?
Characteristics of Rule Sets: Strategy 1
• Mutually exclusive rules
• Classifier contains mutually exclusive rules if the rules are independent of
each other
• Every record is covered by at most one rule
• Exhaustive rules
• Classifier has exhaustive coverage if it accounts for every possible
combination of attribute values
• Each record is covered by at least one rule
Characteristics of Rule Sets: Strategy 2
• Rules are not mutually exclusive
• A record may trigger more than one rule
• Solution?
• Ordered rule set
• Unordered rule set – use voting schemes
• Rules are not exhaustive
• A record may not trigger any rules
• Solution?
• Use a default class
Ordered Rule Set
• Rules are rank ordered according to their priority
• An ordered rule set is known as a decision list
• When a test record is presented to the classifier
• It is assigned to the class label of the highest ranked rule it has triggered
• If none of the rules fired, it is assigned to the default class
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name Blood Type Give Birth Can Fly Live in Water Class
turtle cold no no sometimes ?
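A minimal decision-list sketch (an illustration, not the lecture's code): rules are tried in rank order, the first rule triggered assigns the class, and a default class handles everything else. Under the ordering R1-R5, the turtle fires R4 before R5 and is classified as a reptile:

```python
ordered_rules = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "Birds"),         # R1
    ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),  # R2
    ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),  # R3
    ({"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),       # R4
    ({"Live in Water": "sometimes"}, "Amphibians"),            # R5
]

def decision_list_classify(record, rules, default="Unknown"):
    # Assign the class of the highest-ranked rule that fires.
    for condition, label in rules:
        if all(record.get(a) == v for a, v in condition.items()):
            return label
    return default  # no rule fired

turtle = {"Blood Type": "cold", "Give Birth": "no",
          "Can Fly": "no", "Live in Water": "sometimes"}
print(decision_list_classify(turtle, ordered_rules))  # Reptiles (R4 fires first)
```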
Rule Ordering Schemes
• Rule-based ordering
• Individual rules are ranked based on their quality
• Class-based ordering
• Rules that belong to the same class appear together
Rule-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No
Class-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income<80K) ==> No
(Refund=No, Marital Status={Married}) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income>80K) ==> Yes
Building Classification Rules
Direct Method:
Extract rules directly from data.
Examples: RIPPER, CN2, Holte’s 1R, 0R, Sequential Covering.
Indirect Method:
Extract rules from other classification models (e.g., decision trees, neural networks).
Examples: C4.5rules
Direct Method: Sequential Covering
1. Start from an empty rule
2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Steps (2) and (3) until the stopping criterion is met
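A compact sketch of this loop (an assumption about the interfaces: learn_one_rule stands in for the Learn-One-Rule function, returning a grown rule plus the records it covers, or None when no acceptable rule remains):

```python
def sequential_covering(records, target_class, learn_one_rule):
    rules = []                      # 1. start from an empty rule list
    remaining = list(records)
    while remaining:
        grown = learn_one_rule(remaining, target_class)  # 2. grow a rule
        if grown is None:           # 4. stopping criterion met
            break
        rule, covered = grown
        rules.append(rule)
        # 3. remove the training records covered by the new rule
        remaining = [r for r in remaining if r not in covered]
    return rules
```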
Example of Sequential Covering
(Figure: (i) Original Data; (ii) Step 1; (iii) Step 2 — rule R1 is learned; (iv) Step 3 — the records R1 covers are removed and rule R2 is learned.)
Rule Growing
• Two common strategies
(a) General-to-specific: start from the empty rule { } (Yes: 3, No: 4) and evaluate candidate conjuncts such as Refund = No, Status = Single, Status = Divorced, Status = Married, and Income > 80K by the class counts of the records each one covers.
Tid Refund Marital Status Taxable Income Class
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
Rule Growing (continued)
(b) Specific-to-general: two maximally specific rules,
(Refund = No, Status = Single, Income = 85K) → (Class = Yes) and
(Refund = No, Status = Single, Income = 90K) → (Class = Yes),
are generalized to (Refund = No, Status = Single) → (Class = Yes).
Tid Refund Marital Status Taxable Income Class
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
Rule Evaluation
• FOIL’s Information Gain
• R0: { } => class (initial rule)
• R1: {A} => class (rule after adding conjunct)
• p0: number of positive instances covered by R0
• n0: number of negative instances covered by R0
• p1: number of positive instances covered by R1
• n1: number of negative instances covered by R1
• FOIL (First Order Inductive Learner) is an early rule-based learning algorithm.

Gain(R0, R1) = p1 × [ log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ]
Example
Gain(R0, R1) = p1 × [ log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ]
Coverage(R) = n_covers / |D|    Accuracy(R) = n_correct / n_covers

R0: { } → Mammals   (p0 = 5, n0 = 10)
Coverage(R0) = 15 / 15 = 1
Accuracy(R0) = 5 / 15 = 0.333

Candidate rule                      p1   n1   Accuracy          Info Gain
{Skin Cover = hair} → Mammals        3    0   3/3 = 1 (100%)     4.755
{Body Temp = warm} → Mammals         5    2   5/7 = 0.714        5.498
{Has Legs = no} → Mammals            1    4   1/5 = 0.2         −0.737
Gain(R0, R1) = p1 × [ log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ]

Gain(R0, R1) = 3 × [ log2( 3 / (3 + 0) ) − log2( 5 / (5 + 10) ) ] = 4.755
Gain(R0, R1) = 5 × [ log2( 5 / (5 + 2) ) − log2( 5 / (5 + 10) ) ] = 5.498
Gain(R0, R1) = 1 × [ log2( 1 / (1 + 4) ) − log2( 5 / (5 + 10) ) ] = −0.737
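The same computations as a small Python function, reproducing the three gains above (a sketch; the counts come from the worked example):

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    # FOIL's information gain for extending rule R0 (p0, n0) to R1 (p1, n1).
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

print(round(foil_gain(5, 10, 3, 0), 3))  #  4.755  {Skin Cover = hair}
print(round(foil_gain(5, 10, 5, 2), 3))  #  5.498  {Body Temp = warm}
print(round(foil_gain(5, 10, 1, 4), 3))  # -0.737  {Has Legs = no}
```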
Example of Direct Method: oneR (1R) algorithm
Outlook Temp Humidity Windy Play
Overcast 3 81 False Long
Sunny 12 80 False Long
Sunny 15 70 True Long
Overcast 1 85 True Medium
Overcast 2 96 False Medium
Rainy 12 95 False Medium
Overcast 0 96 True Short
Rainy 5 95 False Short
Rainy 9 92 True Short
1. Apply the 1R algorithm to determine a single rule that predicts the value of
the attribute (Play), assuming a minimum bucket size of 2.
2. Determine the classification of the following instance based on the results of
the previous step:
Outlook Temp Humidity Windy Play
Sunny 5 75 true ?
Converting numeric data into nominal data
Temp (sorted), with the class of each instance below it (s = Short, m = Medium, L = Long), split into bins A, B, C:
 0  1  2 |  3  5  9 | 12 12 15
 s  m  m |  L  s  s |  m  L  L
    A    |    B     |     C
Humidity (sorted), split into bins D, E, F:
70 80 81 | 85 92 95 | 95 96 96
 L  L  L |  m  s  s |  m  m  s
    D    |    E     |     F
Akre
University
for
Applied
Sciences
Attribute   Rules                 Errors   Total errors
Outlook     Overcast → Medium     2/4
            Sunny → Long          0/2
            Rainy → Short         1/3      (2 + 0 + 1)/9 = 3/9
Temp        A → Medium            1/3
            B → Short             1/3
            C → Long              1/3      3/9
Humidity    D → Long              0/3
            E → Short             1/3
            F → Medium            1/3      2/9
Windy       True → Short          2/4
            False* → Long         3/5      5/9
We take the attribute with the smallest total error, so the accepted rule is Humidity’s (total error 2/9): D → Long, E → Short, F → Medium.
Outlook Temp Humidity Windy Play
Sunny 5 75 true ?
For the second requirement: Humidity = 75 falls in bin D, so the instance is
classified as Play = Long according to the rule.
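A minimal 1R sketch (an assumption, not the lecture's code; it expects nominal attributes, so the binned Temp and Humidity values would be passed in as the letters A-F):

```python
from collections import Counter

def one_r(records, target):
    # records: list of dicts; target: name of the class attribute.
    # For each attribute, build a value -> majority-class rule and count
    # its errors; return the attribute whose rule makes the fewest errors.
    best = None
    for attr in (a for a in records[0] if a != target):
        rule, errors = {}, 0
        for value in {r[attr] for r in records}:
            classes = Counter(r[target] for r in records if r[attr] == value)
            majority, count = classes.most_common(1)[0]
            rule[value] = majority
            errors += sum(classes.values()) - count
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best  # (attribute, rule dict, total errors)
```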
Direct Method: RIPPER
• It stands for Repeated Incremental Pruning to Produce Error Reduction
• For a 2-class problem, choose one of the classes as the positive class and the
other as the negative class
• Learn rules for positive class
• Negative class will be default class
• For a multi-class problem
• Order the classes according to increasing class prevalence (fraction of instances
that belong to a particular class)
• Learn the rule set for smallest class first, treat the rest as negative class
• Repeat with next smallest class as positive class
Direct Method: RIPPER
• Growing a rule:
• Start from empty rule
• Add conjuncts as long as they improve FOIL’s information gain
• Stop when rule no longer covers negative examples
• Prune the rule immediately using incremental reduced error pruning
• Measure for pruning: v = (p-n)/(p+n)
• p: number of positive examples covered by the rule in
the validation set
• n: number of negative examples covered by the rule in
the validation set
• Pruning method: delete any final sequence of conditions that maximizes v
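A sketch of this pruning step under stated assumptions (covers and is_positive are hypothetical helpers): every way of deleting a final sequence of conditions leaves a prefix, each prefix is scored with v on the validation set, and the best one is kept.

```python
def prune_rule(conditions, validation, covers, is_positive):
    # Score a condition list by v = (p - n) / (p + n) on the validation set.
    def v(conds):
        covered = [r for r in validation if covers(conds, r)]
        p = sum(1 for r in covered if is_positive(r))
        n = len(covered) - p
        return (p - n) / (p + n) if covered else float("-inf")

    # Deleting a final sequence of conditions leaves a prefix; keep the
    # prefix that maximizes v.
    prefixes = [conditions[:k] for k in range(1, len(conditions) + 1)]
    return max(prefixes, key=v)
```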
Direct Method: RIPPER
• Building a Rule Set:
• Use sequential covering algorithm
• Finds the best rule that covers the current set of positive examples
• Eliminate both positive and negative examples covered by the rule
• Each time a rule is added to the rule set, compute the new description length
• Stop adding new rules when the new description length is d bits longer than the
smallest description length obtained so far
Example of Indirect Method: Rule Set
r1: (P=No,Q=No) ==> -
r2: (P=No,Q=Yes) ==> +
r3: (P=Yes,R=No) ==> +
r4: (P=Yes,R=Yes,Q=No) ==> -
r5: (P=Yes,R=Yes,Q=Yes) ==> +
(Decision tree: root P. P = No → Q: Q = No → −, Q = Yes → +. P = Yes → R: R = No → +, R = Yes → Q: Q = No → −, Q = Yes → +.)
Indirect Method: C4.5rules
• Extract rules from an unpruned decision tree
• For each rule, r: A → y,
• consider an alternative rule r′: A′ → y, where A′ is obtained by removing one
of the conjuncts in A
• Compare the pessimistic error rate of r against all alternatives r′
• Prune if one of the alternative rules has a lower pessimistic error rate
• Repeat until we can no longer improve generalization error
Indirect Method: C4.5rules
• Instead of ordering the rules, order subsets of rules (class ordering)
• Each subset is a collection of rules with the same rule consequent (class)
• Compute description length of each subset
• Description length = L(error) + g × L(model)
• g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5)
Example
Name Give Birth Lay Eggs Can Fly Live in Water Have Legs Class
human yes no no no yes mammals
python no yes no no no reptiles
salmon no yes no yes no fishes
whale yes no no yes no mammals
frog no yes no sometimes yes amphibians
komodo no yes no no yes reptiles
bat yes no yes no yes mammals
pigeon no yes yes no yes birds
cat yes no no no yes mammals
leopard shark yes no no yes no fishes
turtle no yes no sometimes yes reptiles
penguin no yes no sometimes yes birds
porcupine yes no no no yes mammals
eel no yes no yes no fishes
salamander no yes no sometimes yes amphibians
gila monster no yes no no yes reptiles
platypus no yes no no yes mammals
owl no yes yes no yes birds
dolphin yes no no yes no mammals
eagle no yes yes no yes birds
C4.5 versus C4.5rules versus RIPPER
C4.5rules:
(Give Birth = No, Can Fly = Yes) → Birds
(Give Birth = No, Live in Water = Yes) → Fishes
(Give Birth = Yes) → Mammals
(Give Birth = No, Can Fly = No, Live in Water = No) → Reptiles
( ) → Amphibians
(C4.5 decision tree: Give Birth? Yes → Mammals; No → Live in Water? Yes → Fishes, Sometimes → Amphibians, No → Can Fly? Yes → Birds, No → Reptiles.)
RIPPER:
(Live in Water = Yes) → Fishes
(Have Legs = No) → Reptiles
(Give Birth = No, Can Fly = No, Live in Water = No) → Reptiles
(Can Fly = Yes, Give Birth = No) → Birds
( ) → Mammals
C4.5 versus C4.5rules versus RIPPER
RIPPER:
                 PREDICTED CLASS
ACTUAL CLASS     Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians           0         0        0       0       2
Fishes               0         3        0       0       0
Reptiles             0         0        3       0       1
Birds                0         0        1       2       1
Mammals              0         2        1       0       4

C4.5 and C4.5rules:
                 PREDICTED CLASS
ACTUAL CLASS     Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians           2         0        0       0       0
Fishes               0         2        0       0       1
Reptiles             1         0        3       0       0
Birds                1         0        0       3       0
Mammals              0         0        1       0       6
Advantages of Rule-Based Data Mining Classifiers
1. Highly expressive.
2. Easy to interpret.
3. Easy to generate.
4. Can classify new records rapidly.
5. Performance is comparable to other classifiers.
Nearest Neighbor Classifiers
Nearest Neighbor Classifiers
• Basic idea:
• If it walks like a duck, quacks like a duck, then it’s probably a
duck
(Figure: compute the distance between the test record and the training records, then choose the k “nearest” records.)
Nearest-Neighbor Classifiers
• Requires the following:
– A set of labeled records
– A proximity metric to compute the distance/similarity between a pair of records (e.g., Euclidean distance)
– The value of k, the number of nearest neighbors to retrieve
– A method for using the class labels of the k nearest neighbors to determine the class label of an unknown record (e.g., by taking a majority vote)
How to Determine the class label of a Test Sample?
• Take the majority vote of class labels among the k nearest neighbors
• Weight the vote according to distance
• weight factor, w = 1/d²
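A small sketch (an illustration) of distance-weighted voting, where each of the k nearest neighbors votes with weight w = 1/d²:

```python
from collections import defaultdict

def weighted_vote(neighbors):
    # neighbors: list of (distance, class_label) for the k nearest records.
    votes = defaultdict(float)
    for d, label in neighbors:
        votes[label] += 1.0 / (d * d)   # w = 1/d^2 (assumes d > 0)
    return max(votes, key=votes.get)

# One close "A" outvotes two farther "B"s: 1/0.25 = 4 vs 1/1 + 1/4 = 1.25.
print(weighted_vote([(0.5, "A"), (1.0, "B"), (2.0, "B")]))  # A
```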
Nearest Neighbor Classification…
• Data preprocessing is often required
• Attributes may have to be scaled to prevent distance measures from being
dominated by one of the attributes
• Example:
• height of a person may vary from 1.5m to 1.8m
• weight of a person may vary from 90lb to 300lb
• income of a person may vary from $10K to $1M
• Time series are often standardized to have mean 0 and standard deviation 1
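A minimal sketch of z-score standardization, one common way to put attributes like these on comparable scales before computing distances (an illustration; the values are made up):

```python
import statistics

def standardize(column):
    # Rescale a numeric column to mean 0 and standard deviation 1.
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    return [(x - mu) / sigma for x in column]

heights = [1.5, 1.6, 1.7, 1.8]          # metres: tiny range
incomes = [10_000, 80_000, 1_000_000]   # dollars: would dominate raw distances
print(standardize(heights))
print(standardize(incomes))
```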
Nearest Neighbor Classification…
• Choosing the value of k:
• If k is too small, sensitive to noise points
• If k is too large, neighborhood may include points from
other classes
Nearest-neighbor classifiers
• The 1-NN decision boundary is a Voronoi diagram
• Nearest neighbor classifiers are local classifiers
• They can produce decision boundaries of arbitrary shapes.
Nearest Neighbor Classification…
• How to handle missing values in training and test sets?
• Proximity computations normally require the presence of all attributes
• Some approaches use the subset of attributes present in two instances
• This may not produce good results since it effectively uses different proximity measures
for each pair of instances
• Thus, proximities are not comparable
K-NN Classifiers…
Handling Irrelevant and Redundant Attributes
• Irrelevant attributes add noise to the proximity measure
• Redundant attributes bias the proximity measure towards certain attributes
K-NN Classifiers: Handling attributes that are interacting
How does K-NN work?
KNN has the following basic steps:
1. Select a value of k.
2. Determine which distance function is to be used (e.g., Euclidean).
3. Sort the distances obtained and take the k nearest data samples.
4. Assign the test instance to the class with the majority vote among
its k neighbors.
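An end-to-end sketch of these four steps (an illustration; the two-feature training points are made up to resemble the Iris example that follows):

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_classify(train, query, k):
    # train: list of ((sepal_length, sepal_width), species); query: a point.
    # Steps 2-3: compute distances, sort, keep the k nearest samples.
    nearest = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    # Step 4: majority vote among the k neighbors.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((5.3, 3.7), "Setosa"), ((5.1, 3.8), "Setosa"),
         ((7.2, 3.0), "Virginica"), ((5.4, 3.4), "Setosa"),
         ((5.1, 3.3), "Setosa"), ((6.4, 2.8), "Virginica")]
print(knn_classify(train, (5.2, 3.1), k=5))  # Setosa
```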
Example:
This dataset is about the Iris flower. It has three attributes: sepal length,
sepal width, and species. The species attribute has three target values
(Setosa, Virginica, and Versicolor), and our goal is to find which of the
three species the new flower belongs to using k-Nearest Neighbors.
Target: a new flower has been found and needs to be classified (“Unlabeled”).
Features of the new unlabeled flower:
Solution: Step 1: Find the Distance
Our first step is to find the Euclidean distance between the Actual and
Observed sepal length and sepal width. For the first instance of the dataset:
X = Observed sepal length = 5.2
Y = Observed sepal width = 3.1
The Actual values given in the dataset are:
A = Actual sepal length = 5.3
B = Actual sepal width = 3.7
Distance formula:
Euclidean distance = √((X − A)² + (Y − B)²)
For the first instance: √((5.2 − 5.3)² + (3.1 − 3.7)²) = √(0.01 + 0.36) ≈ 0.61
This is the distance for the first instance; find the distances for all
remaining instances similarly, as shown in the table below.
Step 2: Find the Rank:
In this step, we find the rank after finding the distance. The rank numbers
the instances in ascending order of distance, as shown in the table below.
Instance number 5 has the minimum distance, 0.22, so it is given rank 1, as
in the table below.
Similarly, find the rank for all other instances, as shown in the table below.
Step 3: Find the Nearest Neighbor:
In the last step, we find the nearest neighbors on the basis of distance and
rank, and classify the unknown flower by the species of those neighbors.
According to the rank, find the k nearest neighbors:
For k = 1:
The nearest neighbor’s species is Setosa, so the classification for k = 1 is Setosa.
For k = 2:
Both of the two nearest neighbors are Setosa (no other species appears among
them), so the classification for k = 2 is Setosa.
For k = 5:
The majority vote is Setosa = 3, Virginica = 1, and Versicolor = 1, so on the
basis of the highest vote, the KNN classification for k = 5 is Setosa.
Improving KNN Efficiency
• Avoid having to compute distance to all objects in the training set
• Multi-dimensional access methods (k-d trees)
• Fast approximate similarity search
• Locality Sensitive Hashing (LSH)
• Condensing
• Determine a smaller set of objects that give the same performance
• Editing
• Remove objects to improve efficiency
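For example, a k-d tree (here via SciPy, assuming it is installed) answers nearest-neighbor queries without computing the distance to every training object:

```python
import numpy as np
from scipy.spatial import KDTree

points = np.random.default_rng(0).random((10_000, 2))  # training objects
tree = KDTree(points)                                  # build once

# Distances and indices of the 5 nearest neighbors, without a full scan.
distances, indices = tree.query([0.5, 0.5], k=5)
print(indices)
```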