LEC-03
INTRODUCTION TO CLASSIFICATION
K-NEAREST NEIGHBOUR
Nearest Neighbour
• Mainly used when all attribute values are continuous
• It can be modified to deal with categorical attributes
• The idea is to estimate the classification of an unseen instance from the
classification of the instance or instances closest to it, in some sense that
we need to define; in other words, it classifies new cases based on a
similarity measure
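To make the idea concrete, here is a minimal 1-nearest-neighbour sketch in Python. The two-instance training set and the choice of Euclidean distance as the similarity measure are illustrative assumptions (distance measures are defined later in the lecture):

```python
import math

# Illustrative two-instance training set: (attribute values, class label).
# The values are invented; any continuous attributes would do.
training_set = [
    ((2.0, 3.0), "positive"),
    ((8.0, 1.0), "negative"),
]

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbour(unseen):
    """Assign the unseen instance the label of its closest training instance."""
    _, label = min(training_set, key=lambda inst: euclidean(inst[0], unseen))
    return label

print(nearest_neighbour((2.5, 2.8)))  # closest to (2.0, 3.0) -> "positive"
```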
Nearest Neighbour
• Consider a small training set of just two classified instances, each
described by six attribute values, together with an unseen instance. What
should its classification be?
• Even without knowing what the six attributes represent, it seems
intuitively obvious that the unseen instance is nearer to the first instance
than to the second.
K - Nearest Neighbour (KNN)
• In practice there are likely to be many more instances in the training set
but the same principle applies.
• It is usual to base the classification on the classifications of the k
nearest neighbours, not just the nearest one.
• The method is then known as k-Nearest Neighbour, or just k-NN,
classification.
KNN
• We can illustrate k-NN classification diagrammatically when the
dimension (i.e. the number of attributes) is small.
• Next we will see an example which illustrates the case where the
dimension is just 2.
• In real-world data mining applications the dimension can of course be
considerably larger.
KNN
• A training set with 20 instances, each giving the
values of two attributes and an associated
classification
• How can we estimate the classification for an
‘unseen’ instance where the first and second
attributes are 9.1 and 11.0, respectively?
[Figure-only slides: the 20-instance training set and its two-dimensional plot]
KNN
• For this small number of attributes we can represent
the training set as 20 points on a two-dimensional
graph with values of the first and second attributes
measured along the horizontal and vertical axes,
respectively.
• Each point is labelled with a + or − symbol to
indicate that the classification is positive or
negative, respectively.
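A plot of this kind can be reproduced in a few lines of matplotlib. The coordinates below are invented stand-ins, since the slide's 20-instance table is not reproduced in the text; only the plotting idea matters:

```python
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the 20 training points; the real values
# come from the training-set table on the earlier slide.
positive = [(0.8, 6.3), (9.2, 11.6), (12.8, 1.1), (7.0, 14.0)]
negative = [(1.4, 8.1), (4.8, 1.1), (14.0, 19.0), (16.4, 8.4)]
unseen = (9.1, 11.0)

# Mark positives with '+', negatives with '_' (rendered as a dash),
# and the unseen instance with a small open circle.
plt.scatter(*zip(*positive), marker="+", label="positive")
plt.scatter(*zip(*negative), marker="_", label="negative")
plt.scatter(*unseen, marker="o", facecolors="none", edgecolors="k",
            label="unseen instance")
plt.xlabel("first attribute")
plt.ylabel("second attribute")
plt.legend()
plt.show()
```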
KNN
• A circle has been added to enclose the five nearest
neighbours of the unseen instance, which is shown
as a small circle close to the centre of the larger
one.
KNN
• The five nearest neighbours are labelled with three
+ signs and two − signs
• So a basic 5-NN classifier would classify the unseen
instance as ‘positive’ by a form of majority voting.
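Putting the pieces together, basic k-NN classification is "sort by distance, keep the k closest, take a majority vote". A minimal sketch, with an invented training set standing in for the slide's 20 instances:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training_set, unseen, k=5):
    """Majority vote among the k training instances nearest to `unseen`."""
    neighbours = sorted(training_set,
                        key=lambda inst: euclidean(inst[0], unseen))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Invented stand-in data: three positives and two negatives close to
# (9.1, 11.0), mirroring the 3-to-2 vote described on the slide.
training_set = [
    ((8.8, 9.8), "+"), ((9.2, 11.6), "+"), ((10.8, 10.4), "+"),
    ((7.8, 12.3), "-"), ((8.1, 9.0), "-"),
    ((1.0, 2.0), "-"), ((15.0, 16.0), "+"),
]
print(knn_classify(training_set, (9.1, 11.0), k=5))  # -> '+'
```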
KNN
• We can represent two points in two dimensions (‘in two-dimensional
space’ is the usual term) as (a1, a2) and (b1, b2)
• When there are three attributes we can represent the points by (a1, a2, a3)
and (b1, b2, b3)
• When there are n attributes, we can represent the instances by the points
(a1, a2, . . . , an) and (b1, b2, . . . , bn) in ‘n-dimensional space’
Distance Measures: Euclidean Distance
• If we denote an instance in the training set by (a1, a2) and the unseen
instance by (b1, b2), the length of the straight line joining the points is
$\sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$
• If there are two points (a1, a2, a3) and (b1, b2, b3) in three-dimensional
space, the corresponding formula is
$\sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2}$
• The formula for the Euclidean distance between points (a1, a2, . . . , an)
and (b1, b2, . . . , bn) in n-dimensional space is a generalisation of these
two results. The Euclidean distance is given by
$\sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$
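A direct transcription of the n-dimensional formula (a sketch using numpy; the pure-Python version shown earlier is equivalent):

```python
import numpy as np

def euclidean_distance(a, b):
    """sqrt of the sum over i of (a_i - b_i)^2, for points in n dimensions."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

# 2-D check: distance between (1, 2) and (4, 6) is sqrt(9 + 16) = 5.
print(euclidean_distance((1, 2), (4, 6)))  # 5.0
```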
Distance Measures: Manhattan Distance
• The Manhattan (City Block) distance is the sum of the absolute differences
of the attribute values: $\sum_{i=1}^{n} |a_i - b_i|$
• For example, the City Block distance between the points (4, 2) and (12, 9)
is (12 − 4) + (9 − 2) = 8 + 7 = 15
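The same worked example in code:

```python
def manhattan_distance(a, b):
    """City Block distance: the sum of absolute per-attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan_distance((4, 2), (12, 9)))  # (12 - 4) + (9 - 2) = 15
```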
