Intro to k-NN
USED IN CLASSIFICATION PROBLEMS
We all love watching movies
Let's Talk about the Movies
Identify the Genre and Group them
Cluster – Action vs Comedy
Action | Comedy | Comedy & Action
Hotstar Page – a Good Example of k-NN
Thriller | Action | Drama | Romance | Comedy
Nearest Neighbor Classification
 Nearest neighbors are labeled points whose class is already known; k-NN uses them to classify an unlabeled set of data
 Suitable for classification tasks where the relationships between features and the target class are numerous, complex, and difficult to understand
 Computer Vision Applications
 Optical Character Recognition
 Predicting whether a person will enjoy a song or a movie (recommendation systems)
 Patterns in Genetic Data
 Detecting Diseases
Identify the Class of Red Star
Identify the Class of Red Star
What k-NN Does
Should a new point be classified as a red point or a green point? This is where k-NN comes in.
Euclidean Distance
The default value of k is 5 (for example, scikit-learn's KNeighborsClassifier uses n_neighbors=5)
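To make this concrete, here is a minimal sketch (not part of the original deck) that fits a k-NN classifier using the Euclidean metric and the default k of 5; the toy data points are invented purely for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D training data (made-up values): labels 0 = "green", 1 = "red".
X_train = np.array([[1.0, 1.2], [1.5, 0.8], [1.1, 1.0],   # class 0
                    [4.0, 4.2], [4.5, 3.8], [4.1, 4.0]])  # class 1
y_train = np.array([0, 0, 0, 1, 1, 1])

# n_neighbors defaults to 5, and the default metric (minkowski with p=2)
# is exactly the Euclidean distance.
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)

print(clf.predict([[1.2, 1.1]]))  # -> [0]: 3 of its 5 nearest neighbours are class 0
```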
Brute Force
How do we choose neighbours?
Ans. Brute Force
Consider a simple case with a two-dimensional plot. Mathematically, the simplest intuition is to
compute the Euclidean distance from the point of interest (whose class we need to determine)
to every point in the training set, and then assign the class held by the majority of the nearest
points. This is called the brute-force method.
Remember that brute force performs worst when the dimensionality is high and the training set
is large: with more dimensions and more points, it takes longer to compute all the distances.
The difficulty of working in high-dimensional spaces is known as the “curse of dimensionality”.
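The brute-force idea fits in a few lines of NumPy. The sketch below (our own illustration; the helper name knn_predict is chosen here) computes the Euclidean distance from a query point to every training point, keeps the k closest, and returns the majority class.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Brute-force k-NN: label one query point by majority vote of its k nearest neighbours."""
    # Euclidean distance from the query point to every training point.
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k nearest training points.
    nearest = np.argsort(dists)[:k]
    # Majority class among those neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny made-up example: two clusters in two dimensions.
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([1.5, 1.5]), k=3))  # -> 0
```

Every query needs a distance to all n training points, which is why this approach scales poorly with dataset size and dimensionality.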
k-NN Ground Realities
Blind Taste Experience case study
 The Blind Taste Experience involves people going into a restaurant and tasting food in the dark.
 In the Mystery Meal, people are asked to rate the food on two parameters – crunchiness and sweetness
 Scale used – 1 to 10 (10 being the highest and 1 the lowest)
 The food products are labeled as follows:
Tomato Family
 Notice the pattern of Veggies, Fruits and Proteins
 Locating the tomato’s nearest neighbor requires a distance formula
 k-NN uses EUCLIDEAN DISTANCE to find the answer
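The slide's food ratings appear as an image that is not reproduced here, so the numbers below are hypothetical; the sketch only illustrates how Euclidean distance on the (sweetness, crunchiness) scale would pick the tomato's nearest neighbor.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two (sweetness, crunchiness) ratings."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

# Hypothetical ratings on the 1-10 scale (not taken from the original slide).
foods = {
    "grape":      (8, 5),
    "green bean": (3, 7),
    "orange":     (7, 3),
    "bacon":      (1, 4),
}
tomato = (6, 4)

nearest = min(foods, key=lambda name: euclidean(tomato, foods[name]))
print(nearest)  # the food with the smallest distance to the tomato
```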
How Does It Do That?
Interview Questions
In the given image, which would be the best value for k, assuming that the algorithm you are
using is k-Nearest Neighbour?
A) 3
B) 10
C) 20
D) 50
Interview Questions
In the given image, which would be the best value for k, assuming that the algorithm you are
using is k-Nearest Neighbour?
A) 3
B) 10
C) 20
D) 50
Solution: B
The validation error is lowest when k is 10, so it is best to use this value of k.
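The validation-error curve from the image is not reproduced here, but such a curve is typically produced by a sweep like the one below, which cross-validates a k-NN classifier over several values of k; the dataset is only a stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset

# Sweep candidate values of k and record the cross-validated error;
# the best k is the one with the lowest validation error.
for k in (1, 3, 5, 10, 20, 50):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:2d}  validation error={1 - acc:.3f}")
```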
Interview Question
Which of the following options is true about the k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
Interview Question
Which of the following options is true about the k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
Solution: C
We can also use k-NN for regression problems. In this case, the prediction can be based on
the mean or the median of the k most similar instances.
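A minimal sketch of k-NN regression, where the prediction is the mean of the targets of the k most similar instances (scikit-learn's KNeighborsRegressor averages its neighbours' targets by default); the data are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Invented 1-D data: noisy samples of roughly y = 2x.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# The prediction is the mean of the targets of the k nearest neighbours.
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, y)
print(reg.predict([[3.6]]))  # mean of the targets at x = 3, 4, 5 -> ~8.03
```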
Interview Question
Which of the following statements are true about the k-NN algorithm?
1) k-NN performs much better if all of the data have the same scale
2) k-NN works well with a small number of input variables (p), but struggles when the number of inputs is very large
3) k-NN makes no assumptions about the functional form of the problem being solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above
Interview Question
Which of the following statements are true about the k-NN algorithm?
1) k-NN performs much better if all of the data have the same scale
2) k-NN works well with a small number of input variables (p), but struggles when the number of inputs is very large
3) k-NN makes no assumptions about the functional form of the problem being solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above
Solution: D
All three statements correctly describe the k-NN algorithm.
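Statement 1 is the most actionable in practice: k-NN works on raw distances, so features with large numeric ranges dominate the vote unless the data are rescaled. A minimal sketch of the usual fix, putting a standard-scaling step in front of the classifier (the dataset is just a stand-in):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # features with very different ranges
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw = KNeighborsClassifier().fit(X_tr, y_tr)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_tr, y_tr)

print("without scaling:", raw.score(X_te, y_te))
print("with scaling:   ", scaled.score(X_te, y_te))
# Rescaling the features typically gives a clear accuracy boost here.
```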
Interview Question
Which of the following machine learning algorithms can be used for imputing
missing values of both categorical and continuous variables?
A) k-NN
B) Linear Regression
C) Logistic Regression
Interview Question
Which of the following machine learning algorithms can be used for imputing missing
values of both categorical and continuous variables?
A) k-NN
B) Linear Regression
C) Logistic Regression
Solution: A
The k-NN algorithm can be used to impute missing values of both categorical and continuous
variables.
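For numeric features, scikit-learn ships a ready-made k-NN imputer; categorical columns would have to be encoded as numbers first, so treat this as a sketch of the continuous case only, with made-up values.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up data with missing entries marked as np.nan.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each missing value is filled with the mean of that feature over the
# k nearest rows (distances are computed on the observed features).
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X))
```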
Interview Question
Which of the following distance measures do we use for categorical variables in k-NN?
A) Hamming Distance
B) Euclidean Distance
C) Manhattan Distance
Interview Question
Which of the following distance measures do we use for categorical variables in k-NN?
A) Hamming Distance
B) Euclidean Distance
C) Manhattan Distance
Solution: A
Both Euclidean and Manhattan distances are used for continuous variables, whereas the
Hamming distance is used for categorical variables.
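The Hamming distance simply counts the positions at which two equal-length categorical vectors disagree; a tiny sketch:

```python
def hamming(a, b):
    """Number of positions where two equal-length category vectors differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming(["red", "round", "sweet"], ["red", "long", "sour"]))  # -> 2
```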
Interview Question
A company has built a k-NN classifier that gets 100% accuracy on training data. When they
deployed this model on the client side, they found that the model is not at all accurate. Which
of the following might have gone wrong?
Note: the model was deployed successfully and no technical issues were found on the client side except the
model performance
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can’t say
D) None of these
Interview Question
A company has built a k-NN classifier that gets 100% accuracy on training data. When they
deployed this model on the client side, they found that the model is not at all accurate. Which
of the following might have gone wrong?
Note: the model was deployed successfully and no technical issues were found on the client side except the
model performance
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can’t say
D) None of these
Solution: A
An overfitted model appears to perform well on the training data, but it does not generalize well
enough to give the same results on new data.
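With k-NN, the textbook way to get exactly this symptom is k = 1: every training point is its own nearest neighbour, so training accuracy is 100% while test accuracy can be much worse. A small sketch on a stand-in dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 15):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"k={k:2d}  train={clf.score(X_tr, y_tr):.3f}  test={clf.score(X_te, y_te):.3f}")
# k=1 memorises the training set (train accuracy 1.000) but scores lower on the test set.
```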
