Machine Learning:
k-Nearest Neighbor and
Support Vector Machines
skim 20.4, 20.6-20.7
CMSC 471
Revised End-of-Semester Schedule
• Wed 11/21   Machine Learning IV
• Mon 11/26   Philosophy of AI (You must read the three articles!)
• Wed 11/28   Special Topics
• Mon 12/3    Special Topics
• Wed 12/5    Review / Tournament dry run #2 (HW6 due)
• Mon 12/10   Tournament
• Wed 12/19   FINAL EXAM (1:00pm - 3:00pm) (Project and final report due)
NO LATE SUBMISSIONS ALLOWED!

Special Topics
• Robotics
• AI in Games
• Natural language processing
• Multi-agent systems
k-Nearest Neighbor
Instance-Based Learning
Some material adapted from slides by Andrew Moore, CMU.
Visit http://www.autonlab.org/tutorials/ for Andrew’s repository of Data Mining tutorials.
1-Nearest Neighbor
• One of the simplest of all machine learning classifiers
• Simple idea: label a new point the same as the closest known point
Label it red.
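To make the idea concrete, here is a minimal 1-NN sketch in Python (NumPy assumed; the data and names are illustrative, not from the slides):

```python
import numpy as np

def nn1_predict(X_train, y_train, x_new):
    """Label a new point the same as its closest known (training) point."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance to each stored point
    return y_train[np.argmin(dists)]

# Toy data: two red points and one blue point
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
y_train = np.array(["red", "red", "blue"])
print(nn1_predict(X_train, y_train, np.array([0.4, 0.2])))   # -> red
```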
1-Nearest Neighbor
• A type of instance-based learning
• Also known as “memory-based” learning
• Forms a Voronoi tessellation of the instance space
Distance Metrics
• Different metrics can change the decision surface
• Standard Euclidean distance metric:
  • Two-dimensional: Dist(a,b) = sqrt((a1 – b1)^2 + (a2 – b2)^2)
  • Multivariate: Dist(a,b) = sqrt(∑ (ai – bi)^2)
Figure captions (two decision surfaces):
  Dist(a,b) = (a1 – b1)^2 + (a2 – b2)^2
  Dist(a,b) = (a1 – b1)^2 + (3a2 – 3b2)^2
Adapted from “Instance-Based Learning”
lecture slides by Andrew Moore, CMU.
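As a hedged illustration of how the metric changes the answer, the sketch below compares plain Euclidean distance with a variant that scales the second coordinate by 3, as in the second figure caption (all values are made up):

```python
import numpy as np

def dist(a, b, w=(1.0, 1.0)):
    """Euclidean distance with per-coordinate scaling w."""
    w = np.asarray(w)
    return float(np.sqrt(np.sum((w * (a - b)) ** 2)))

a = np.array([0.0, 0.0])
b = np.array([2.0, 0.5])   # far in x1, close in x2
c = np.array([0.5, 1.9])   # close in x1, far in x2

print(dist(a, b), dist(a, c))                    # c is the nearer point under plain Euclidean
print(dist(a, b, (1, 3)), dist(a, c, (1, 3)))    # b becomes nearer once x2 is scaled by 3
```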
Four Aspects of an Instance-Based Learner:
1. A distance metric
2. How many nearby neighbors to look at?
3. A weighting function (optional)
4. How to fit with the local points?
Adapted from “Instance-Based Learning”
lecture slides by Andrew Moore, CMU.
1-NN’s Four Aspects as an Instance-Based Learner:
1. A distance metric
   • Euclidean
2. How many nearby neighbors to look at?
   • One
3. A weighting function (optional)
   • Unused
4. How to fit with the local points?
   • Just predict the same output as the nearest neighbor.
Adapted from “Instance-Based Learning”
lecture slides by Andrew Moore, CMU.
Zen Gardens
Mystery of renowned zen garden revealed [CNN Article]
Thursday, September 26, 2002 Posted: 10:11 AM EDT (1411 GMT)
LONDON (Reuters) -- For centuries visitors to the renowned Ryoanji Temple garden in
Kyoto, Japan have been entranced and mystified by the simple arrangement of rocks.
The five sparse clusters on a rectangle of raked gravel are said to be pleasing to the eyes
of the hundreds of thousands of tourists who visit the garden each year.
Scientists in Japan said on Wednesday they now believe they have discovered its
mysterious appeal.
"We have uncovered the implicit structure of the Ryoanji garden's visual ground and
have shown that it includes an abstract, minimalist depiction of natural scenery," said
Gert Van Tonder of Kyoto University.
The researchers discovered that the empty space of the garden evokes a hidden image of a
branching tree that is sensed by the unconscious mind.
"We believe that the unconscious perception of this pattern contributes to the enigmatic
appeal of the garden," Van Tonder added.
He and his colleagues believe that whoever created the garden during the Muromachi era
between 1333-1573 knew exactly what they were doing and placed the rocks around the
tree image.
By using a concept called medial-axis transformation, the scientists showed that the
hidden branched tree converges on the main area from which the garden is viewed.
The trunk leads to the prime viewing site in the ancient temple that once overlooked the
garden. It is thought that abstract art may have a similar impact.
"There is a growing realisation that scientific analysis can reveal unexpected structural
features hidden in controversial abstract paintings," Van Tonder said.
Adapted from “Instance-Based Learning” lecture slides by Andrew Moore, CMU.
k-Nearest Neighbor
• Generalizes 1-NN to smooth away noise in the labels
• A new point is now assigned the most frequent label of its k nearest neighbors
Label it red, when k = 3
Label it blue, when k = 7
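A minimal k-NN sketch in the same style as the 1-NN code above (majority vote over the k closest training points; names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Assign the most frequent label among the k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to the 1-NN rule above; larger k smooths away label noise at the cost of fine detail.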
k-Nearest Neighbor (k = 9)
(Comments on three example fits shown in the figure:)
“A magnificent job of noise smoothing. Three cheers for 9-nearest-neighbor. But the lack of gradients and the jerkiness isn’t good.”
“Appalling behavior! Loses all the detail that 1-nearest-neighbor would give. The tails are horrible!”
“Fits much less of the noise, captures trends. But still, frankly, pathetic compared with linear regression.”
Adapted from “Instance-Based Learning”
lecture slides by Andrew Moore, CMU.
Support Vector Machines and Kernels
Adapted from slides by Tim Oates
Cognition, Robotics, and Learning (CORAL) Lab
University of Maryland Baltimore County
Doing Really Well with Linear Decision Surfaces
Outline
• Prediction
• Why might predictions be wrong?
• Support vector machines
• Doing really well with linear models
• Kernels
• Making the non-linear linear
Supervised ML = Prediction
• Given training instances (x,y)
• Learn a model f
• Such that f(x) = y
• Use f to predict y for new x
• Many variations on this basic theme
Why might predictions be wrong?
• True Non-Determinism
  • Flip a biased coin
  • p(heads) = θ
  • Estimate θ
  • If θ > 0.5 predict heads, else tails
• Lots of ML research on problems like this
  • Learn a model
  • Do the best you can in expectation
Why might predictions be wrong?
• Partial Observability
  • Something needed to predict y is missing from observation x
  • N-bit parity problem
    • x contains N-1 bits (hard PO)
    • x contains N bits but learner ignores some of them (soft PO)
Why might predictions be wrong?
• True non-determinism
• Partial observability
  • hard, soft
• Representational bias
• Algorithmic bias
• Bounded resources
Representational Bias
• Having the right features (x) is crucial
[Figure: the same X and O training points plotted under two different feature representations.]
Support Vector Machines
Doing Really Well with Linear Decision Surfaces
Strengths of SVMs
• Good generalization in theory
• Good generalization in practice
• Work well with few training instances
• Find globally best model
• Efficient algorithms
• Amenable to the kernel trick
Linear Separators
• Training instances
  • x ∈ ℝ^n
  • y ∈ {-1, 1}
• w ∈ ℝ^n
• b ∈ ℝ
• Hyperplane
  • <w, x> + b = 0
  • w1x1 + w2x2 + … + wnxn + b = 0
• Decision function
  • f(x) = sign(<w, x> + b)

Math Review
Inner (dot) product:
<a, b> = a · b = ∑ ai*bi = a1b1 + a2b2 + … + anbn
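A small sketch of the decision function in Python (w and b are assumed to be given, e.g. produced by an SVM trainer; the values are illustrative):

```python
import numpy as np

def f(w, b, x):
    """Decision function f(x) = sign(<w, x> + b), returning +1 or -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.array([1.0, -2.0])   # illustrative weight vector
b = 0.5
print(f(w, b, np.array([3.0, 1.0])))   # <w,x> + b = 3 - 2 + 0.5 = 1.5 -> +1
```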
Intuitions
[Four figures: the same X and O training data with different candidate linear separators drawn through it.]
A “Good” Separator
[Figure: X and O data with one candidate separator.]
Noise in the Observations
[Figure: the same X and O data with noise in the observations.]
Ruling Out Some Separators
[Figure: X and O data; some candidate separators are ruled out.]
Lots of Noise
[Figure: X and O data with lots of noise.]
Maximizing the Margin
[Figure: the separator chosen to maximize the margin between the two classes.]
“Fat” Separators
[Figure: a “fat” separator, i.e. a separating band with width rather than a thin line.]
Support Vectors
[Figure: the training points lying closest to the separator are the support vectors.]
The Math
• Training instances
  • x ∈ ℝ^n
  • y ∈ {-1, 1}
• Decision function
  • f(x) = sign(<w,x> + b)
  • w ∈ ℝ^n
  • b ∈ ℝ
• Find w and b that
  • Perfectly classify training instances
    • Assuming linear separability
  • Maximize margin
The Math
• For perfect classification, we want
  • yi (<w,xi> + b) ≥ 0 for all i
  • Why?
• To maximize the margin, we want
  • w that minimizes |w|^2
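A hedged sketch of both quantities: checking the classification condition yi(<w,xi> + b) ≥ 0 on a training set, and computing |w|^2, the quantity minimized to maximize the margin (NumPy assumed; names are mine):

```python
import numpy as np

def separates(w, b, X, y):
    """True iff y_i * (<w, x_i> + b) >= 0 for every training instance."""
    return bool(np.all(y * (X @ w + b) >= 0))

def squared_norm(w):
    """|w|^2 -- among separating (w, b), prefer the one with the smallest value."""
    return float(np.dot(w, w))
```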
Dual Optimization Problem
• Maximize over α
  • W(α) = ∑i αi - 1/2 ∑i,j αi αj yi yj <xi, xj>
• Subject to
  • αi ≥ 0
  • ∑i αi yi = 0
• Decision function
  • f(x) = sign(∑i αi yi <x, xi> + b)
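The sketch below evaluates the dual objective W(α) and the resulting decision function exactly as written above; it does not solve the maximization itself, which is a quadratic program handled by SVM libraries (variable names and shapes are assumptions):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """W(alpha) = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j <x_i, x_j>."""
    G = X @ X.T                      # Gram matrix of inner products <x_i, x_j>
    v = alpha * y
    return alpha.sum() - 0.5 * v @ G @ v

def dual_decision(alpha, X, y, b, x_new):
    """f(x) = sign(sum_i alpha_i y_i <x, x_i> + b)."""
    return int(np.sign((alpha * y) @ (X @ x_new) + b))
```

Only the training points with alpha_i > 0 (the support vectors) contribute to the decision function.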
Strengths of SVMs
• Good generalization in theory
• Good generalization in practice
• Work well with few training instances
• Find globally best model
• Efficient algorithms
• Amenable to the kernel trick …
What if Surface is Non-Linear?
[Figure: X points clustered inside a ring of O points; no straight line separates the classes.]
Image from http://www.atrandomresearch.com/iclass/
Kernel Methods
Making the Non-Linear Linear
When Linear Separators Fail
[Figure: the points X O O O O X X X plotted against x1 and x2 cannot be separated by a line; re-plotting them against x1 and x1^2 makes them linearly separable.]
Mapping into a New Feature Space
• Rather than run SVM on xi, run it on Φ(xi)
• Find non-linear separator in input space
• What if Φ(xi) is really big?
• Use kernels to compute it implicitly!
Φ: x → X = Φ(x)
Φ(x1,x2) = (x1, x2, x1^2, x2^2, x1x2)
Image from http://web.engr.oregonstate.edu/~afern/classes/cs534/
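A minimal sketch of the explicit map Φ from this slide; it is small here, but the whole point of kernels is to avoid computing Φ when it is large:

```python
import numpy as np

def phi(x):
    """Phi(x1, x2) = (x1, x2, x1^2, x2^2, x1*x2), the map shown above."""
    x1, x2 = x
    return np.array([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

# A linear separator in this 5-dimensional feature space corresponds to a
# quadratic (non-linear) boundary back in the original (x1, x2) plane.
print(phi(np.array([2.0, 3.0])))   # [2. 3. 4. 9. 6.]
```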
Kernels
• Find kernel K such that
  • K(x1,x2) = <Φ(x1), Φ(x2)>
• Computing K(x1,x2) should be efficient, much more so than computing Φ(x1) and Φ(x2)
• Use K(x1,x2) in the SVM algorithm rather than <x1,x2>
• Remarkably, this is possible
The Polynomial Kernel
• K(x1,x2) = <x1, x2>^2
• x1 = (x11, x12)
• x2 = (x21, x22)
• <x1, x2> = (x11x21 + x12x22)
• <x1, x2>^2 = (x11^2 x21^2 + x12^2 x22^2 + 2 x11x12x21x22)
• Φ(x1) = (x11^2, x12^2, √2 x11x12)
• Φ(x2) = (x21^2, x22^2, √2 x21x22)
• K(x1,x2) = <Φ(x1), Φ(x2)>
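A quick numerical check of the identity just derived, <x1, x2>^2 = <Φ(x1), Φ(x2)>, using illustrative vectors:

```python
import numpy as np

def k_poly2(a, b):
    """Degree-2 polynomial kernel: K(a, b) = <a, b>^2."""
    return np.dot(a, b) ** 2

def phi2(x):
    """Explicit feature map for that kernel: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 4.0])
print(k_poly2(x1, x2), np.dot(phi2(x1), phi2(x2)))   # both equal 121 (up to floating-point rounding)
```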
The Polynomial Kernel
• Φ(x) contains all monomials of degree d
• Useful in visual pattern recognition
• Number of monomials
  • 16x16 pixel image
  • 10^10 monomials of degree 5
• Never explicitly compute Φ(x)!
• Variation: K(x1,x2) = (<x1, x2> + 1)^2
A Few Good Kernels
• Dot product kernel
  • K(x1,x2) = <x1, x2>
• Polynomial kernel
  • K(x1,x2) = <x1, x2>^d   (monomials of degree d)
  • K(x1,x2) = (<x1, x2> + 1)^d   (all monomials of degree 1, 2, …, d)
• Gaussian kernel
  • K(x1,x2) = exp(-|x1 - x2|^2 / 2σ^2)
  • Radial basis functions
• Sigmoid kernel
  • K(x1,x2) = tanh(<x1, x2> + θ)
  • Neural networks
• Establishing “kernel-hood” from first principles is non-trivial
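Hedged one-line implementations of the kernels listed above (sigma and theta are illustrative hyperparameters; the function names are mine, not the slides'):

```python
import numpy as np

def k_dot(a, b):
    return float(np.dot(a, b))                                      # dot product kernel

def k_poly(a, b, d=2, c=0.0):
    return float((np.dot(a, b) + c) ** d)                           # c=0: degree-d monomials; c=1: all degrees up to d

def k_gauss(a, b, sigma=1.0):
    return float(np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2)))  # Gaussian / RBF kernel

def k_sigmoid(a, b, theta=0.0):
    return float(np.tanh(np.dot(a, b) + theta))                     # sigmoid kernel
```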
The Kernel Trick
“Given an algorithm which is
formulated in terms of a positive
definite kernel K1, one can construct
an alternative algorithm by replacing
K1 with another positive definite
kernel K2”
• SVMs can use the kernel trick
Using a Different Kernel in the Dual Optimization Problem
• For example, using the polynomial kernel with d = 4 (including lower-order terms).
• Maximize over α
  • W(α) = ∑i αi - 1/2 ∑i,j αi αj yi yj <xi, xj>
• Subject to
  • αi ≥ 0
  • ∑i αi yi = 0
• Decision function
  • f(x) = sign(∑i αi yi <x, xi> + b)
The inner products <xi, xj> and <x, xi> are kernels! So by the kernel trick, we just replace them with (<xi, xj> + 1)^4 and (<x, xi> + 1)^4.
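As a practical aside (not part of the slides), this is roughly how the degree-4 polynomial kernel would be used with scikit-learn, assuming it is installed; SVC's polynomial kernel is (gamma·<xi, xj> + coef0)^degree, so gamma=1, coef0=1, degree=4 matches the kernel above:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny XOR-style data set: not linearly separable in the original space
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# K(xi, xj) = (1 * <xi, xj> + 1)^4, the kernel used on this slide
clf = SVC(kernel="poly", degree=4, gamma=1.0, coef0=1.0)
clf.fit(X, y)
print(clf.predict(X))   # should reproduce y on this toy set
```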
Conclusion
• SVMs find the optimal linear separator
• The kernel trick makes SVMs non-linear learning algorithms