Introduction to Classification
Amit Praseed
October 9, 2019
Introduction
The question of whether machines can learn from experience and emulate humans has always intrigued researchers.
The Turing Test (1950): a “test of a machine’s ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human”.
This, in turn, led to the development of several mechanisms for blocking automated access, such as CAPTCHAs.
In 2014, Google demonstrated that its algorithms could defeat CAPTCHAs with 99.8% accuracy.
Types of Learning
Supervised Learning
Labelled data (data + class labels) is provided as input to the system.
When a new unlabelled example is provided to the system, it maps it to a class based on the examples it has encountered.
Eg: Classification
Types of Learning
Unsupervised Learning
Unlabelled data is provided as input to the system.
The system identifies patterns in the data and creates internal groupings.
Eg: Clustering
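To make the unsupervised setting concrete, here is a brief sketch (an addition to these notes, not part of the original slides) in which k-means is given only unlabelled points and forms its own groupings; the toy data is made up for illustration.

```python
# Unsupervised learning: only the data X is provided, no class labels.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 1], [2, 2],    # one natural group of points
              [8, 9], [9, 8], [9, 9]])   # another natural group

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # internal grouping discovered by the algorithm, e.g. [0 0 0 1 1 1]
print(km.cluster_centers_)  # centres of the two discovered clusters
```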
Basic Idea of Classification
Input: Data set X = {x_1, x_2, ..., x_n} and associated label set Y = {y_1, y_2, ..., y_n}
Learning: Identify a function/procedure f, based on X and Y, such that f(x_i) = y_i.
Testing: Given a new input x', predict its class label y' = f(x').
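As a minimal sketch of this contract (illustrative data only; any classifier with a fit/predict interface would do, here scikit-learn's nearest-neighbour classifier):

```python
# Learning f from (X, Y), then predicting the label of a new input x'.
from sklearn.neighbors import KNeighborsClassifier

X = [[1.0, 2.0], [1.5, 1.8], [8.0, 8.5], [9.0, 9.2]]  # data set X = {x_1, ..., x_n}
Y = [0, 0, 1, 1]                                      # label set Y = {y_1, ..., y_n}

f = KNeighborsClassifier(n_neighbors=1).fit(X, Y)  # Learning: identify f with f(x_i) = y_i
print(f.predict([[1.2, 2.1]]))                     # Testing: y' = f(x') -> [0]
```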
Features and Feature Vectors
Each data item used in classification is represented by its features.
The array holding all of a data item's features and their corresponding values is called a feature vector.
Eg: The popular open-source Iris data set contains 150 samples of flowers belonging to three classes (50 per class). For each sample, four features are measured: the length and width of the sepals and petals. A particular sample may look like [5.1, 3.5, 1.4, 0.2], so the sample is said to have 4 dimensions.
This means that every data item can be represented as a point in an n-dimensional data space.
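The Iris feature vectors can be inspected directly; a short sketch assuming scikit-learn, which bundles the data set:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)     # (150, 4): 150 samples, each a 4-dimensional feature vector
print(iris.feature_names)  # sepal length/width and petal length/width (in cm)
print(iris.data[0])        # one feature vector: [5.1 3.5 1.4 0.2]
print(iris.target_names)   # the three classes: setosa, versicolor, virginica
```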
Geometric View of Classification
[Figure: data points plotted in a two-dimensional feature space (x and y axes, scale 1 to 10)]
The Nearest Neighbour Approach - Example 1
[Figure: nearest-neighbour example in the same two-dimensional feature space]
The Nearest Neighbour Approach - Example 2
[Figures: a second nearest-neighbour example in the same feature space, shown across two slides]
k Nearest Neighbours (kNN) Approach
[Figure: classification of a query point using its k = 3 nearest neighbours]
k Nearest Neighbours Approach
[Figure: k-nearest-neighbour classification in the same feature space]
Advantages and Disadvantages of kNN
Advantages
Simple and easy to implement
Lazy learner: no training phase required
Only two parameters: k and the distance measure (illustrated in the sketch below)
Disadvantages
High complexity for large datasets and high-dimensional data
Does not work well with categorical data
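The following from-scratch sketch (an illustration added here, not code from the lecture) shows the kNN decision rule, with k and the distance measure as the only choices to make:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, distance=None):
    """Classify x_query by majority vote among its k nearest training points."""
    if distance is None:
        distance = lambda a, b: np.linalg.norm(a - b)  # Euclidean distance by default
    dists = [distance(x, x_query) for x in X_train]
    nearest = np.argsort(dists)[:k]                    # indices of the k closest training points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]                  # majority class among the neighbours

X_train = np.array([[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_predict(X_train, y_train, np.array([2, 2]), k=3))  # -> 'A'
```

Note that there is no training step at all: the "model" is simply the stored training data, which is what makes kNN a lazy learner.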
Scalability of kNN
The kNN classifier has a complexity of O(nd + nk), where n is the number of data points, d is the number of attributes or dimensions, and k is the number of neighbours.
Different mechanisms can be used to reduce the complexity of kNN, especially for large datasets (see the sketch after this list):
Parallelization
Exact Space Partitioning: kd-Trees, Ball Trees, Cover Trees
Approximate Neighbour Search: Space Partitioning Trees, Nearest Neighbour Graphs, Locality Sensitive Hashing
Dimensionality Reduction: Feature Extraction, Feature Selection
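In practice, libraries expose several of these mechanisms behind one interface. For example, scikit-learn's KNeighborsClassifier can switch between brute-force search, kd-trees and ball trees; the sketch below uses synthetic data purely for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((10_000, 3))                  # 10,000 points in 3 dimensions
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)    # synthetic class labels

for algorithm in ("brute", "kd_tree", "ball_tree"):
    clf = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm).fit(X, y)
    print(algorithm, clf.predict([[0.2, 0.9, 0.5]]))  # same answer, different search strategy
```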
Space Partitioning using kd-Trees
Similar to binary search trees.
Each node in a kd-tree stores a multidimensional point (vector).
Each level of the tree is aligned along one particular dimension and splits the search space along it (see the construction sketch below).
Each node is therefore confined to a particular bounding box in the search space.
Approximate neighbour search restricts itself to a single bounding box, thus reducing complexity.
Exact neighbour search may have to back-track into neighbouring boxes and is therefore more expensive.
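A minimal construction sketch (a simplified illustration; the original slides may have built their tree differently): split at the median point, cycling through the dimensions level by level.

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree; each level splits along one dimension."""
    if not points:
        return None
    axis = depth % len(points[0])                   # cycle through x, y, x, y, ... for 2-D data
    points = sorted(points, key=lambda p: p[axis])  # sort along the current dimension
    mid = len(points) // 2                          # the median point becomes the splitting node
    return Node(point=points[mid], axis=axis,
                left=build_kdtree(points[:mid], depth + 1),
                right=build_kdtree(points[mid + 1:], depth + 1))

pts = [(51, 75), (25, 40), (10, 30), (1, 10), (50, 50), (70, 70), (55, 1), (60, 80)]
root = build_kdtree(pts)
print(root.point, root.axis)  # (51, 75) 0 -- the median x-split, matching the root in the figures
```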
kd-Tree Construction
[Figures: step-by-step construction of a kd-tree, starting from the root (51,75) and ending with the complete tree over the points (51,75), (25,40), (10,30), (1,10), (50,50), (70,70), (55,1), (60,80)]
NN-Search for Query (1,5)
[Figure: kd-tree traversal for the query point (1,5)]
Current NN distance = 5 (to (1,10))
NN-Search for Query (12,33)
[Figure: kd-tree traversal for the query point (12,33)]
Current NN distance = √13 ≈ 3.6055 (to (10,30))
NN-Search for Query (50,2)
[Figure: kd-tree traversal for the query point (50,2)]
Initial NN distance = √2384 ≈ 48.83 (to (10,30))
Final NN distance = √26 ≈ 5.10 (to (55,1))
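The worked answers above can be cross-checked with an off-the-shelf kd-tree; a quick sketch assuming scipy is available:

```python
import numpy as np
from scipy.spatial import KDTree

points = np.array([(51, 75), (25, 40), (10, 30), (1, 10),
                   (50, 50), (70, 70), (55, 1), (60, 80)])
tree = KDTree(points)

for query in [(1, 5), (12, 33), (50, 2)]:
    dist, idx = tree.query(query)  # exact nearest-neighbour search
    print(query, "->", points[idx].tolist(), "at distance", round(dist, 4))
# (1, 5)   -> [1, 10]  at distance 5.0
# (12, 33) -> [10, 30] at distance 3.6056
# (50, 2)  -> [55, 1]  at distance 5.099
```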