Instance based learning
SWAPNA.C
1.Introduction
 Instance-based learning methods such as nearest neighbour and locally weighted regression are conceptually straightforward approaches to approximating real-valued or discrete-valued target functions.
 This family of methods is also called lazy learning.
 Learning in these algorithms consists of simply storing the presented training data.
 When a new query instance is encountered, a set of similar related instances is retrieved from memory and used to classify the new query instance.
 It mainly relies on a memorization technique to learn.
 Instance-based learning can be achieved by three approaches:
 1. Lazy learning (nearest neighbourhood), k-NN
 2. Radial basis functions (based on weighted methods)
 3. Case-based reasoning
 Instance-based approaches can construct a different approximation to the target function for each distinct query instance that must be classified.
 In fact, they typically construct only a local approximation to the target function for each new instance, rather than one that must fit the entire instance space.
 Advantages:
 Instance-based methods can also use more complex, symbolic representations for instances.
 In case-based learning, instances are represented in this fashion and the process for identifying "neighbouring" instances is elaborated accordingly.
 Case-based reasoning has been applied to tasks such as storing and reusing past experience at a help desk, reasoning about legal cases by referring to previous cases, and solving complex scheduling problems by reusing relevant portions of previously solved problems.
 Disadvantages:
 The cost of classifying new instances can be high. Techniques for efficiently indexing training examples are therefore a significant practical issue in reducing the computation required at query time.
 A second disadvantage, especially for nearest-neighbour approaches, is that they typically consider all attributes of the instances when attempting to retrieve similar training examples from memory.
 If the target concept depends on only a few of the many available attributes, then the instances that are truly most "similar" may well be a large distance apart.
2.K-NEAREST NEIGHBOR LEARNING
 This algorithm assumes all instances correspond to points in the n-dimensional space $\mathbb{R}^n$. The nearest neighbours of an instance are defined in terms of the standard Euclidean distance.
 Let an arbitrary instance x be described by the feature vector
$\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$
where $a_r(x)$ denotes the value of the r-th attribute of instance x. The distance between two instances $x_i$ and $x_j$ is then defined to be
$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left( a_r(x_i) - a_r(x_j) \right)^2}$
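 A minimal Python sketch of the plain k-NEAREST NEIGHBOR classifier built on this distance; the data, function names, and k values are illustrative, not from the slides:

```python
import numpy as np
from collections import Counter

def euclidean_distance(xi, xj):
    # d(xi, xj) = sqrt(sum_r (a_r(xi) - a_r(xj))^2)
    return np.sqrt(np.sum((np.asarray(xi) - np.asarray(xj)) ** 2))

def knn_classify(X_train, y_train, xq, k=5):
    """Classify query xq by a majority vote of its k nearest training examples."""
    dists = [euclidean_distance(x, xq) for x in X_train]
    nearest = np.argsort(dists)[:k]                  # indices of the k closest instances
    votes = Counter(y_train[i] for i in nearest)     # count class labels among the neighbours
    return votes.most_common(1)[0][0]

# toy usage: a boolean-valued target over points in a 2-D space
X = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y = np.array(["+", "+", "-", "-", "-"])
print(knn_classify(X, y, xq=[2.0, 2.0], k=1))   # "+"  (nearest single neighbour is positive)
print(knn_classify(X, y, xq=[2.0, 2.0], k=5))   # "-"  (majority of the five neighbours is negative)
```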
 Consider the operation of the k-NEAREST NEIGHBOR algorithm for the case where the instances are points in a two-dimensional space and the target function is boolean-valued.
 The positive and negative training examples are shown by "+" and "-" respectively. The 1-NEAREST NEIGHBOR algorithm classifies the query point x_q as a positive example, whereas the 5-NEAREST NEIGHBOR algorithm classifies it as a negative example.
 We can still ask what the implicit general function is, or what classifications would be assigned if we were to hold the training examples constant and query the algorithm with every possible instance in X.
 Consider the shape of the decision surface induced by 1-NEAREST NEIGHBOR over the entire instance space.
 The decision surface is a combination of convex polyhedra surrounding each of the training examples.
 For every training example, the polyhedron indicates the set of query points whose classification will be completely determined by that training example. Query points outside the polyhedron are closer to some other training example. This kind of diagram is often called the Voronoi diagram of the set of training examples.
 The k-NEAREST NEIGHBOR algorithm is easily
adapted to approximating continuous-valued target
functions.
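 The standard adaptation replaces the majority vote with the mean of the k nearest neighbours' target values, $\hat{f}(x_q) = \frac{1}{k} \sum_{i=1}^{k} f(x_i)$. A minimal sketch (names are illustrative):

```python
import numpy as np

def knn_regress(X_train, y_train, xq, k=5):
    """Approximate a continuous-valued target: mean f-value of the k nearest neighbours."""
    dists = np.linalg.norm(X_train - np.asarray(xq), axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()
```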
2.1 Distance-Weighted NEAREST NEIGHBOR Algorithm
 One obvious refinement of the k-NEAREST NEIGHBOR algorithm is to weight the contribution of each of the k neighbours according to their distance to the query point x_q, giving greater weight to closer neighbours.
 For example, in the version that approximates discrete-valued target functions, we might weight the vote of each neighbour according to the inverse square of its distance from x_q, i.e. use weights $w_i = \frac{1}{d(x_q, x_i)^2}$ and classify by the weighted vote $\hat{f}(x_q) = \arg\max_{v} \sum_{i=1}^{k} w_i\, \delta\!\left(v, f(x_i)\right)$, where $\delta(a, b) = 1$ if $a = b$ and 0 otherwise.
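 A sketch of this distance-weighted variant using the inverse-square weights above; the exact-match shortcut and the names are assumptions of the sketch:

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, xq, k=5):
    """Distance-weighted k-NN: each neighbour votes with weight 1 / d(xq, xi)^2."""
    dists = np.linalg.norm(X_train - np.asarray(xq), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        if dists[i] == 0.0:
            return y_train[i]          # query coincides with a training instance: use its label
        votes[y_train[i]] += 1.0 / dists[i] ** 2
    return max(votes, key=votes.get)
```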
2.2 Remarks on K-NEAREST NEIGHBOUR
Algorithm
 The distance-weighted k-NEAREST NEIGHBOR Algorithm
is a highly effective inductive inference method for many
practical problems.
 It is robust to noisy training data and quite effective when it
is provided a sufficiently large set of training data.
 Note that by taking the weighted average of the k
neighbours nearest to the query point, it can smooth out
the impact of isolated noisy training examples.
 The inductive bias of k-NEAREST NEIGHBOR corresponds to an assumption that the classification of an instance x_q will be most similar to the classification of other instances that are nearby in Euclidean distance.
 In k-NEAREST NEIGHBOR, the distance between neighbours can be dominated by a large number of irrelevant attributes; this difficulty is sometimes referred to as the curse of dimensionality.
 One way to overcome this problem is to weight each attribute differently when calculating the distance between two instances. This corresponds to stretching the axes in the Euclidean space, shortening the axes that correspond to less relevant attributes and lengthening the axes that correspond to more relevant attributes.
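 A small sketch of such an attribute-weighted distance; the scale factors here are hypothetical values chosen only for illustration:

```python
import numpy as np

def scaled_distance(xi, xj, scales):
    """Weighted Euclidean distance: per-attribute scale factors stretch or shrink each axis.
    Setting a scale factor to zero is equivalent to dropping that attribute entirely."""
    diff = scales * (np.asarray(xi) - np.asarray(xj))
    return np.sqrt(np.sum(diff ** 2))

# e.g. keep the first attribute, downweight the second, and ignore the third
d = scaled_distance([1.0, 4.0, 7.0], [2.0, 0.0, 3.0], scales=np.array([1.0, 0.25, 0.0]))
```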
 The process of stretching the axes in order to optimize the performance of k-NEAREST NEIGHBOR algorithms provides a mechanism for suppressing the impact of irrelevant attributes.
 An alternative is to completely eliminate the least relevant attributes from the instance space. This is equivalent to setting some of the scaling factors to zero.
 Moore and Lee discuss efficient cross-validation methods for selecting relevant subsets of the attributes for k-NEAREST NEIGHBOR algorithms.
 They explore methods based on leave-one-out cross-validation, in which the set of m training instances is repeatedly divided into a training set of size m-1 and a test set of size 1.
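 The slides mention leave-one-out cross-validation for selecting attribute subsets; the same machinery can also be used to select k, as in this illustrative sketch:

```python
import numpy as np
from collections import Counter

def knn_classify(X, y, xq, k):
    nearest = np.argsort(np.linalg.norm(X - xq, axis=1))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def loo_error(X, y, k):
    """Leave-one-out cross-validation: train on m-1 instances, test on the held-out one."""
    m = len(X)
    mistakes = 0
    for i in range(m):
        keep = np.arange(m) != i                     # training set of size m-1
        pred = knn_classify(X[keep], y[keep], X[i], k)
        mistakes += (pred != y[i])
    return mistakes / m

# pick the k with the lowest leave-one-out error, e.g.
# best_k = min([1, 3, 5, 7], key=lambda k: loo_error(X, y, k))
```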
 There is, however, a risk of overfitting, and the approach of locally stretching the axes is much less common.
 A practical issue in applying k-NEAREST NEIGHBOR is efficient memory indexing.
 Because this algorithm delays all processing until a new query is received, significant computation can be required to process each new query.
 One indexing method is the kd-tree, in which instances are stored at the leaves of a tree, with nearby instances stored at the same or nearby nodes.
 The internal nodes of the tree sort the new query x_q to the relevant leaf by testing selected attributes of x_q.
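 One readily available kd-tree implementation in practice is SciPy's cKDTree; a small usage sketch (the data here is random and purely illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))      # training instances are stored once in the tree
tree = cKDTree(X)                     # nearby instances end up in the same or nearby leaves

xq = np.zeros(3)
dists, idx = tree.query(xq, k=5)      # retrieve the 5 nearest stored instances for the query
neighbours = X[idx]
```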
3.LOCALLY WEIGHTED REGRESSION
 This is a generalization of the k-NEAREST NEIGHBOR approach.
 The phrase "locally weighted regression" is called local because the function is approximated based only on data near the query point, weighted because the contribution of each training example is weighted by its distance from the query point, and regression because this is the term used widely in the statistical learning community for the problem of approximating real-valued functions.
 It constructs an explicit approximation to f over a local region surrounding the query point x_q.
3.1 Locally Weighted Linear Regression
 Consider the case of locally weighted regression in which the target function f is approximated near x_q using a linear function of the form
$\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)$
where, as before, $a_i(x)$ denotes the value of the i-th attribute of instance x.
 We choose the weights that minimize the squared error summed over the set D of training examples:
$E = \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2$
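 A minimal numpy sketch of locally weighted linear regression at a single query point: it weights each training example by a Gaussian kernel of its distance to x_q and, for brevity, solves the kernel-weighted least-squares problem in closed form rather than by the gradient descent rule discussed next; tau is an assumed kernel-width hyperparameter:

```python
import numpy as np

def lwr_predict(X, y, xq, tau=1.0):
    """Locally weighted linear regression evaluated at one query point xq."""
    d = np.linalg.norm(X - xq, axis=1)
    K = np.exp(-d ** 2 / (2 * tau ** 2))        # kernel weight of each training example
    A = np.hstack([np.ones((len(X), 1)), X])    # prepend a constant column for w0
    W = np.diag(K)
    w = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return np.append(1.0, xq) @ w               # f_hat(xq) = w0 + w1*a1(xq) + ... + wn*an(xq)
```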
 Gradient descent training rule (minimizing E with a linear unit, where η is a small learning-rate constant):
$\Delta w_j = \eta \sum_{x \in D} \left( f(x) - \hat{f}(x) \right) a_j(x)$
 Error criterion E: there are several ways to redefine E so that it is local to the query point x_q:
 1. Minimize the squared error over just the k nearest neighbours:
$E_1(x_q) = \frac{1}{2} \sum_{x \in\, k \text{ nearest nbrs of } x_q} \left( f(x) - \hat{f}(x) \right)^2$
 2. Minimize the squared error over the entire set D, weighting the error of each training example by a decreasing function K of its distance from x_q:
$E_2(x_q) = \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2 K\!\left(d(x_q, x)\right)$
 3. Combine 1 and 2:
$E_3(x_q) = \frac{1}{2} \sum_{x \in\, k \text{ nearest nbrs of } x_q} \left( f(x) - \hat{f}(x) \right)^2 K\!\left(d(x_q, x)\right)$
 Criterion two is perhaps the most esthetically pleasing because it allows every training example to have an impact on the classification of x_q.
 However, this approach requires computation that grows linearly with the number of training examples.
 Criterion three is a good approximation to criterion two and has the advantage that its computational cost is independent of the total number of training examples; its cost depends only on the number k of neighbours considered.
 If we choose criterion three and rederive the gradient descent rule, we obtain
$\Delta w_j = \eta \sum_{x \in\, k \text{ nearest nbrs of } x_q} K\!\left(d(x_q, x)\right) \left( f(x) - \hat{f}(x) \right) a_j(x)$
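 A sketch of this criterion-three update implemented literally as an iterative gradient procedure; the learning rate eta, the number of steps, and the kernel width tau are illustrative assumptions:

```python
import numpy as np

def lwr_gradient_fit(X, y, xq, k=5, tau=1.0, eta=0.001, n_steps=2000):
    """Fit the local linear model with the criterion-3 rule:
    delta w_j = eta * sum over the k nearest neighbours of K(d(xq, x)) (f(x) - f_hat(x)) a_j(x)."""
    d = np.linalg.norm(X - xq, axis=1)
    nn = np.argsort(d)[:k]                       # only the k nearest neighbours contribute
    A = np.hstack([np.ones((k, 1)), X[nn]])      # a_0(x) = 1 for the constant term w0
    K = np.exp(-d[nn] ** 2 / (2 * tau ** 2))     # kernel weight of each neighbour
    w = np.zeros(A.shape[1])
    for _ in range(n_steps):
        err = y[nn] - A @ w                      # f(x) - f_hat(x) for each neighbour
        w += eta * A.T @ (K * err)               # kernel-weighted gradient step
    return np.append(1.0, xq) @ w
```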
3.2 Remarks on Locally Weighted Regression
 The literature on locally weighted regression contains a broad
range of alternative methods for distance weighting the training
examples, and a range of methods for locally approximating the
target function.
 In most cases, the target function is approximated by a constant,
linear, or quadratic function.
 More complex functional forms are not often found because
 (1) the cost of fitting more complex functions for each query instance is prohibitively high, and
 (2) these simple approximations model the target function quite well over a sufficiently small subregion of the instance space.
4.RADIAL BASIS FUNCTIONS
 One approach to function approximation that is closely related to distance-weighted regression, and also to artificial neural networks, is learning with radial basis functions.
 Gaussian kernel function: in this approach the learned hypothesis has the form
$\hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u K_u\!\left(d(x_u, x)\right)$
where each $x_u$ is an instance from X and the kernel function $K_u$ decreases as the distance $d(x_u, x)$ increases. A common choice is the Gaussian kernel centred at $x_u$ with variance $\sigma_u^2$:
$K_u\!\left(d(x_u, x)\right) = e^{-\frac{1}{2\sigma_u^2} d^2(x_u, x)}$
 Given a set of training examples of the target function, RBF networks are typically trained in a two-stage process.
 First, the number k of hidden units is determined, and each hidden unit u is defined by choosing the values of x_u and σ_u that define its kernel function K_u(d(x_u, x)).
 Second, the weights w_u are trained to maximize the fit of the network to the training data, using the global error criterion.
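 A minimal numpy sketch of this two-stage procedure, fitting the output weights by least squares in the second stage; the choice of centers and the shared width sigma are assumptions of the sketch:

```python
import numpy as np

def train_rbf(X, y, centers, sigma=1.0):
    """Two-stage RBF training sketch:
    (1) the hidden units are fixed by the chosen centers x_u and width sigma;
    (2) the output weights w_0..w_k are then fit to the training data (here by least squares)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)   # (m, k) distances
    Phi = np.exp(-d ** 2 / (2 * sigma ** 2))                          # Gaussian kernel activations
    Phi = np.hstack([np.ones((len(X), 1)), Phi])                      # constant column for w_0
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def rbf_predict(w, centers, xq, sigma=1.0):
    d = np.linalg.norm(centers - xq, axis=1)
    phi = np.append(1.0, np.exp(-d ** 2 / (2 * sigma ** 2)))
    return phi @ w

# e.g. one Gaussian kernel per training example (centers = X) can fit the data exactly,
# or a smaller set of centers (say, from clustering) when the training set is large.
```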
 Several alternative methods have been proposed for choosing an appropriate number of hidden units or, equivalently, kernel functions.
 One approach is to allocate a Gaussian kernel function for each training example (x_i, f(x_i)), centering this Gaussian at the point x_i. Each of these kernels may be assigned the same width σ.
 One advantage of this choice of kernel functions is that it allows the RBF network to fit the training data exactly. That is, for any set of m training examples the weights $w_0, \ldots, w_m$ for combining the m Gaussian kernel functions can be set so that $\hat{f}(x_i) = f(x_i)$ for each training example <x_i, f(x_i)>.
 A second approach is to choose a set of kernel functions that is smaller than the number of training examples. This can be much more efficient, especially when the number of training examples is large.
 Alternatively, we may wish to distribute the centers non-uniformly, especially if the instances themselves are found to be distributed non-uniformly over X.
 Radial basis function networks provide a global approximation to the target function, represented by a linear combination of many local kernel functions.
 The value of any given kernel function is non-negligible only when the input x falls into the region defined by its particular center and width.
 Thus, the network can be viewed as a smooth linear combination of many local approximations to the target function.
 One key advantage of RBF networks is that they can be trained much more efficiently than feedforward networks trained with BACKPROPAGATION.
 This follows from the fact that the input layer and the output layer of an RBF network are trained separately.
5.CASE-BASED REASONING
 Instance-based methods such as k-NEAREST NEIGHBOR and locally weighted regression share three key properties.
 First, they are lazy learning methods in that they defer the
decision of how to generalize beyond the training data
until a new query instance is observed.
 Second, they classify new query instances by analyzing
similar instances while ignoring instances that are very
different from the query.
 Third, they represent instances as real-valued points in an
n-dimensional Euclidean space.
 In CBR, instances are typically represented using richer symbolic descriptions, and the methods used to retrieve similar instances are correspondingly more elaborate.
 CBR has been applied to problems such as conceptual design of mechanical devices based on a stored library of previous designs (Sycara et al. 1992), reasoning about new legal cases based on previous rulings (Ashley 1990), and solving planning and scheduling problems by reusing and combining portions of previous solutions to similar problems.
 For example, the CADET system supports the conceptual design of mechanical devices based on a library of previously stored designs.
 Several generic properties of case-based reasoning systems distinguish them from approaches such as k-NEAREST NEIGHBOR:
 Instances or cases may be represented by rich symbolic descriptions, such as the function graphs used in CADET. This may require a similarity metric different from Euclidean distance, such as the size of the largest shared subgraph between two function graphs.
 Multiple retrieved cases may be combined to form the solution to the new problem. This is similar to the k-NEAREST NEIGHBOUR approach in that multiple similar cases are used to construct a response for the new query. However, the process for combining these multiple retrieved cases can be very different, relying on knowledge-based reasoning rather than statistical methods.
 There may be a tight coupling between case retrieval, knowledge-based reasoning, and problem solving.
 One simple example of this is found in CADET, which uses generic knowledge about influences to rewrite function graphs during its attempt to find matching cases.
 Other systems have been developed that more fully integrate case-based reasoning into general search-based problem-solving systems. Two examples are ANAPRON and PRODIGY/ANALOGY.
6.REMARKS ON LAZY AND EAGER LEARNING
 We considered three lazy learning methods: the k-NEAREST NEIGHBOR algorithm, locally weighted regression, and case-based reasoning.
 These methods are lazy because they defer the decision of how to generalize beyond the training data until each new query instance is encountered.
 We also considered one eager learning method: the method for learning radial basis function networks.
 Lazy and eager methods differ in computation time and in the classifications they produce for new queries.
 Lazy methods will generally require less computation during training, but more computation when they must predict the target value for a new query.
 The key difference between lazy and eager methods in this regard is that lazy methods may consider the query instance x_q when deciding how to generalize beyond the training data D, whereas eager methods cannot: by the time they observe the query instance x_q, they have already chosen their (global) approximation to the target function.
 For each new query x_q, a lazy method such as locally weighted linear regression generalizes from the training data by choosing a new hypothesis based on the training examples near x_q.
 In contrast, an eager learner that uses the same hypothesis space of linear functions must choose its approximation before the queries are observed.
 The eager learner must therefore commit to a single linear function hypothesis that covers the entire instance space and all future queries.
 The lazy method effectively uses a richer hypothesis space because it uses many different local linear functions to form its implicit global approximation to the target function.
 A lazy learner has the option of (implicitly) representing the target function by a combination of many local approximations, whereas an eager learner must commit at training time to a single global approximation.
 The distinction between eager and lazy learning is thus related to the distinction between global and local approximations to the target function.
 The RBF learning methods we discussed are eager methods that commit to a global approximation to the target function at training time.
 RBF networks are built eagerly from local approximations
centred around the training examples, or around clusters
of training examples, but not around the unknown future
query points.
 Lazy methods have the option of selecting a different hypothesis or local approximation to the target function for each query instance.
 Eager methods using the same hypothesis space are more
restricted because they must commit to a single hypothesis
that covers the entire instance space.
 Eager methods can, of course, employ hypothesis spaces
that combine multiple local approximations, as in RBF
networks.