Intro to k-NN
USED IN CLASSIFICATION PROBLEMS
We all love watching movies
Let's Talk about the Movies
Identify the Genre and Group them
Cluster – Action vs Comedy
Action | Comedy | Comedy & Action
Hotstar Page – a Good Example of k-NN
Thriller | Action | Drama | Romance | Comedy
Nearest Neighbor Classification
 Nearest neighbors are labeled points whose class is already known; k-NN uses them to classify an unlabeled set of data
 Suitable for classification tasks where the relationships between features and the target class are numerous, complex, and difficult to understand
 Computer Vision Applications
 Optical Character Recognition
 Predicting whether a person will enjoy a song or a movie (recommendation systems)
 Patterns in Genetic Data
 Detecting Diseases
Identify the Class of Red Star
Identify the Class of Red Star
What k-NN Does
Should a new point be classified as a red point or a green point? This is where k-NN comes in.
Euclidean Distance
The default value of k is 5 (for example, scikit-learn's KNeighborsClassifier uses n_neighbors=5)
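To make this concrete, here is a minimal sketch (not part of the original deck) that fits a k-NN classifier using the Euclidean metric and the default k of 5; the toy data points are invented purely for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D training data (made-up values): labels 0 = "green", 1 = "red".
X_train = np.array([[1.0, 1.2], [1.5, 0.8], [1.1, 1.0],   # class 0
                    [4.0, 4.2], [4.5, 3.8], [4.1, 4.0]])  # class 1
y_train = np.array([0, 0, 0, 1, 1, 1])

# n_neighbors defaults to 5, and the default metric (minkowski with p=2)
# is exactly the Euclidean distance.
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)

print(clf.predict([[1.2, 1.1]]))  # -> [0]: 3 of its 5 nearest neighbours are class 0
```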
Brute Force
How do we choose neighbours?
Ans. Brute Force
Consider a simple case with a two-dimensional plot. Mathematically, the simplest intuition is to
compute the Euclidean distance from the point of interest (whose class we need to determine)
to every point in the training set, and then assign the class held by the majority of the nearest
points. This is called the brute-force method.
Remember that brute force performs worst when the dimensionality is high and the training set
is large: with more dimensions and more points, it takes longer to compute all the distances.
The difficulty of working in high-dimensional spaces is known as the “curse of dimensionality”.
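The brute-force idea fits in a few lines of NumPy. The sketch below (our own illustration; the helper name knn_predict is chosen here) computes the Euclidean distance from a query point to every training point, keeps the k closest, and returns the majority class.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Brute-force k-NN: label one query point by majority vote of its k nearest neighbours."""
    # Euclidean distance from the query point to every training point.
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k nearest training points.
    nearest = np.argsort(dists)[:k]
    # Majority class among those neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny made-up example: two clusters in two dimensions.
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([1.5, 1.5]), k=3))  # -> 0
```

Every query needs a distance to all n training points, which is why this approach scales poorly with dataset size and dimensionality.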
k-NN Ground Realities
Blind Taste Experience case study
 The Blind Taste Experience involves people going into a restaurant and tasting food in the dark.
 In the Mystery Meal, people are asked to rate the food on two parameters – crunchiness and sweetness
 Scale used – 1 to 10 (10 being the highest and 1 the lowest)
 The food products are labeled as follows:
Tomato Family
 Notice the pattern of Veggies, Fruits and Proteins
 Locating the tomato’s nearest neighbor requires a distance formula
 k-NN uses EUCLIDEAN DISTANCE to find the answer
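The slide's food ratings appear as an image that is not reproduced here, so the numbers below are hypothetical; the sketch only illustrates how Euclidean distance on the (sweetness, crunchiness) scale would pick the tomato's nearest neighbor.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two (sweetness, crunchiness) ratings."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

# Hypothetical ratings on the 1-10 scale (not taken from the original slide).
foods = {
    "grape":      (8, 5),
    "green bean": (3, 7),
    "orange":     (7, 3),
    "bacon":      (1, 4),
}
tomato = (6, 4)

nearest = min(foods, key=lambda name: euclidean(tomato, foods[name]))
print(nearest)  # the food with the smallest distance to the tomato
```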
How Does It Do That?
Interview Questions
In the given image, which would be the best value for k, assuming that the algorithm you are
using is k-Nearest Neighbour?
A) 3
B) 10
C) 20
D) 50
Interview Questions
In the given image, which would be the best value for k, assuming that the algorithm you are
using is k-Nearest Neighbour?
A) 3
B) 10
C) 20
D) 50
Solution: B
The validation error is lowest when k is 10, so it is best to use this value of k.
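The validation-error curve from the image is not reproduced here, but such a curve is typically produced by a sweep like the one below, which cross-validates a k-NN classifier over several values of k; the dataset is only a stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset

# Sweep candidate values of k and record the cross-validated error;
# the best k is the one with the lowest validation error.
for k in (1, 3, 5, 10, 20, 50):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:2d}  validation error={1 - acc:.3f}")
```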
Interview Question
Which of the following options is true about the k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
Interview Question
Which of the following options is true about the k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
Solution: C
We can also use k-NN for regression problems. In this case, the prediction can be based on
the mean or the median of the k most similar instances.
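A minimal sketch of k-NN regression, where the prediction is the mean of the targets of the k most similar instances (scikit-learn's KNeighborsRegressor averages its neighbours' targets by default); the data are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Invented 1-D data: noisy samples of roughly y = 2x.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# The prediction is the mean of the targets of the k nearest neighbours.
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, y)
print(reg.predict([[3.6]]))  # mean of the targets at x = 3, 4, 5 -> ~8.03
```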
Interview Question
Which of the following statements are true about the k-NN algorithm?
1) k-NN performs much better if all of the data have the same scale
2) k-NN works well with a small number of input variables (p), but struggles when the number of inputs is very large
3) k-NN makes no assumptions about the functional form of the problem being solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above
Interview Question
Which of the following statements are true about the k-NN algorithm?
1) k-NN performs much better if all of the data have the same scale
2) k-NN works well with a small number of input variables (p), but struggles when the number of inputs is very large
3) k-NN makes no assumptions about the functional form of the problem being solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above
Solution: D
All three statements correctly describe the k-NN algorithm.
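Statement 1 is the most actionable in practice: k-NN works on raw distances, so features with large numeric ranges dominate the vote unless the data are rescaled. A minimal sketch of the usual fix, putting a standard-scaling step in front of the classifier (the dataset is just a stand-in):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # features with very different ranges
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw = KNeighborsClassifier().fit(X_tr, y_tr)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_tr, y_tr)

print("without scaling:", raw.score(X_te, y_te))
print("with scaling:   ", scaled.score(X_te, y_te))
# Rescaling the features typically gives a clear accuracy boost here.
```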
Interview Question
Which of the following machine learning algorithms can be used for imputing
missing values of both categorical and continuous variables?
A) k-NN
B) Linear Regression
C) Logistic Regression
Interview Question
Which of the following machine learning algorithms can be used for imputing missing
values of both categorical and continuous variables?
A) k-NN
B) Linear Regression
C) Logistic Regression
Solution: A
The k-NN algorithm can be used to impute missing values of both categorical and continuous
variables.
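For numeric features, scikit-learn ships a ready-made k-NN imputer; categorical columns would have to be encoded as numbers first, so treat this as a sketch of the continuous case only, with made-up values.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up data with missing entries marked as np.nan.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each missing value is filled with the mean of that feature over the
# k nearest rows (distances are computed on the observed features).
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X))
```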
Interview Question
Which of the following distance measures do we use for categorical variables in k-NN?
A) Hamming Distance
B) Euclidean Distance
C) Manhattan Distance
Interview Question
Which of the following distance measures do we use for categorical variables in k-NN?
A) Hamming Distance
B) Euclidean Distance
C) Manhattan Distance
Solution: A
Both Euclidean and Manhattan distances are used for continuous variables, whereas the
Hamming distance is used for categorical variables.
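The Hamming distance simply counts the positions at which two equal-length categorical vectors disagree; a tiny sketch:

```python
def hamming(a, b):
    """Number of positions where two equal-length category vectors differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming(["red", "round", "sweet"], ["red", "long", "sour"]))  # -> 2
```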
Interview Question
A company has built a k-NN classifier that gets 100% accuracy on training data. When they
deployed this model on the client side, they found that the model is not at all accurate. Which
of the following might have gone wrong?
Note: the model was deployed successfully and no technical issues were found on the client side except the
model performance
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can’t say
D) None of these
Interview Question
A company has built a k-NN classifier that gets 100% accuracy on training data. When they
deployed this model on the client side, they found that the model is not at all accurate. Which
of the following might have gone wrong?
Note: the model was deployed successfully and no technical issues were found on the client side except the
model performance
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can’t say
D) None of these
Solution: A
An overfitted model appears to perform well on the training data, but it does not generalize well
enough to give the same results on new data.
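With k-NN, the textbook way to get exactly this symptom is k = 1: every training point is its own nearest neighbour, so training accuracy is 100% while test accuracy can be much worse. A small sketch on a stand-in dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 15):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"k={k:2d}  train={clf.score(X_tr, y_tr):.3f}  test={clf.score(X_te, y_te):.3f}")
# k=1 memorises the training set (train accuracy 1.000) but scores lower on the test set.
```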
