Artificial Intelligence & Computer Vision Lab School of Computer Science and Engineering Seoul National University Machine Learning Instance-based Learning
Overview Introduction k-Nearest Neighbor Learning Locally Weighted Regression Radial Basis Functions Case-based Reasoning Remarks on Lazy and Eager Learning Summary
Introduction Approaches in previous chapters: training examples are used to learn an explicit target function, which is then applied to classify new instances …
Introduction (cont.) Instance-based learning Lazy : processing is postponed until queries are encountered Stores all training examples When a new query is encountered, examples related to the query instance are retrieved and processed
Introduction (cont.) Can construct a different approx. for target function for each query Local approximation Can use more complex and symbolic representation for instances (Case-based reasoning) Cost of classifying new instances is high Indexing is a significant issue
k-Nearest Neighbor Learning Inductive bias The classification of a new instance will be most similar to the classification of other instances that are nearby in Euclidean distance. Assumption All instances correspond to points in the n-dimensional space R^n Distance from x_i to x_j is given by the Euclidean distance (reconstructed below) Target function Discrete or real-valued
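The distance measure referenced above appeared only as a formula image in the original slide; a standard reconstruction, assuming Mitchell's notation in which a_r(x) denotes the r-th attribute of instance x:

```latex
d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \bigl( a_r(x_i) - a_r(x_j) \bigr)^2}
```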
Illustrative example 5-nearest neighbor 2-dim. data Target class : boolean ( +  or -) k -Nearest Neighbor Learning (cont.) + and - : location and target value of training instances x q  : query instance
Discrete-valued target function x_1 ~ x_k : the k nearest instances V : set of target values v Real-valued target function (both estimators are reconstructed below) k-Nearest Neighbor Learning (cont.)
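The two estimators on this slide were also formula images; a standard reconstruction, with x_1 … x_k the k training instances nearest to the query x_q:

```latex
% Discrete-valued target: majority vote among the k nearest neighbors
\hat{f}(x_q) \leftarrow \operatorname*{arg\,max}_{v \in V} \sum_{i=1}^{k} \delta\bigl(v, f(x_i)\bigr),
\quad \text{where } \delta(a, b) = 1 \text{ if } a = b \text{ and } 0 \text{ otherwise}

% Real-valued target: mean of the k nearest neighbors' values
\hat{f}(x_q) \leftarrow \frac{1}{k} \sum_{i=1}^{k} f(x_i)
```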
k-NN never forms an explicit general hypothesis; the hypothesis remains implicit Voronoi diagram induced by 1-nearest neighbor k-Nearest Neighbor Learning (cont.) Figure: query point and its nearest neighbor
3 target classes  Red, Green, Blue 2-dim. data k -Nearest Neighbor Learning (cont.)
Large k Less sensitive to noise (particularly class noise) Better probability estimates Small k Captures fine structure of the problem space better Costs less k-Nearest Neighbor Learning (cont.)
Tradeoff between small and large k Want to use large k, but with more emphasis on nearer neighbors Weight nearer neighbors more heavily Use all training examples instead of just k (Shepard's method) k-Nearest Neighbor Learning (cont.) Distance-weighted NN for discrete-valued and real-valued targets (a code sketch follows)
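A minimal Python sketch of the distance-weighted k-NN described above, covering both the discrete- and real-valued cases; the inverse-squared-distance weight and the exact-match shortcut follow the usual textbook treatment, and all names here are illustrative assumptions rather than code from the original slides:

```python
import numpy as np
from collections import defaultdict

def distance_weighted_knn(X_train, y_train, x_q, k=5, classify=True):
    """Distance-weighted k-NN: weight each neighbor by 1 / d(x_q, x_i)^2."""
    d = np.linalg.norm(X_train - x_q, axis=1)   # Euclidean distances to x_q
    idx = np.argsort(d)[:k]                     # indices of the k nearest neighbors
    if d[idx[0]] == 0.0:                        # query coincides with a training point
        return y_train[idx[0]]
    w = 1.0 / d[idx] ** 2                       # distance weights
    if classify:                                # discrete-valued target: weighted vote
        votes = defaultdict(float)
        for i, wi in zip(idx, w):
            votes[y_train[i]] += wi
        return max(votes, key=votes.get)
    return np.dot(w, y_train[idx]) / w.sum()    # real-valued target: weighted mean
```

Setting k to the number of training examples turns this into Shepard's method, which weights all training examples.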
Consider ‘all’ attributes Decision tree : only a subset of attributes is considered What if only some of the attributes are relevant to the target value? ⇒ Curse of Dimensionality Solutions to the Curse of Dimensionality 1. Weight each attribute differently 2. Eliminate the least relevant attributes Cross validation To determine scaling factors for each attribute Leave-one-out cross-validation (for method 2 above) k-Nearest Neighbor Learning (cont.) Remarks
Indexing is important Significant computation is required at query time because learning is ‘lazy’ kd-tree (Bentley 1975, Friedman et al. 1977) k-Nearest Neighbor Learning (cont.)
Locally Weighted Regression Approximation to f over a local region surrounding x_q Produces a “piecewise approximation” to f k-NN : local approximation to f for each query point x_q Global regression : global approximation to f Approximated function f̂ Used to get the estimated target value f̂(x_q) Different local approximation for each distinct query Various forms Constant, linear function, quadratic function, …
“Locally Weighted Regression” “locally” The function is approximated based only on data near the query point “weighted” Contribution of each training example is weighted by its distance from the query point “regression” Approximating a real-valued function Locally Weighted Regression (cont.)
Approximated linear function (reconstructed below) Should choose weights that minimize the distance-weighted sum of squared errors Can apply a gradient descent rule Locally Weighted Regression (cont.)
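The linear form and training rule referenced on this slide were formula images; a hedged reconstruction in Mitchell-style notation, where K is the distance-weighting kernel, η the learning rate, and the sum runs over whichever training instances are included (all of them, or just the k nearest, as discussed on the next slide):

```latex
% Local linear approximation
\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)

% Distance-weighted squared error around the query x_q
E(x_q) = \frac{1}{2} \sum_{x} \bigl( f(x) - \hat{f}(x) \bigr)^2 \, K\bigl( d(x_q, x) \bigr)

% Gradient descent update for each weight w_j
\Delta w_j = \eta \sum_{x} K\bigl( d(x_q, x) \bigr) \bigl( f(x) - \hat{f}(x) \bigr) \, a_j(x)
```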
From global approximation to local approximation: three choices of error criterion Minimize squared error over just the k nearest neighbors Minimize distance-weighted squared error over all training instances Combine the above two Each can be minimized using the gradient descent rule (a code sketch follows) Locally Weighted Regression (cont.)
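A small Python sketch of locally weighted linear regression at a single query point; it uses a Gaussian kernel and a closed-form weighted least-squares solve in place of the gradient descent rule above, and the bandwidth tau and all names are assumptions for illustration:

```python
import numpy as np

def locally_weighted_regression(X_train, y_train, x_q, tau=1.0):
    """Fit a local linear model around x_q and return the prediction f_hat(x_q)."""
    d2 = np.sum((X_train - x_q) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * tau ** 2))                     # kernel weights K(d(x_q, x))
    A = np.hstack([np.ones((len(X_train), 1)), X_train])   # prepend bias column for w_0
    W = np.diag(k)
    # Solve the weighted normal equations  (A^T W A) w = A^T W y
    w = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_train)
    return np.r_[1.0, x_q] @ w                             # local prediction at x_q
```

Because the fit is redone for every query, the cost is paid at query time, which is the lazy-learning tradeoff noted earlier.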
A broad range of methods for approximating the target function Constant, linear, quadratic functions More complex functions are not common Fitting is costly Simple forms usually suffice over the small subregion of instance space around the query Locally Weighted Regression (cont.)
An approach to function approximation related to distance-weighted regression and also to artificial neural networks. Approximated function Linear combination of radial kernel functions f̂(x) is a global approximation to f(x) K_u(d(x_u, x)) is localized to the region near x_u Radial Basis Functions
K(d(x, y)) : kernel function Decreases as the distance d(x, y) increases E.g. the Gaussian function (reconstructed below) Radial Basis Functions (cont.) Figures: using Gaussian radial basis functions; using sigmoidal radial basis functions
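The approximation and the Gaussian kernel mentioned on this and the previous slide appeared only as images; a standard reconstruction, assuming k kernel centers x_u with widths σ_u:

```latex
% Global approximation as a linear combination of k local kernel functions
\hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u \, K_u\bigl( d(x_u, x) \bigr)

% Gaussian radial basis function centered at x_u with width \sigma_u
K_u\bigl( d(x_u, x) \bigr) = \exp\!\left( -\frac{d^2(x_u, x)}{2\sigma_u^2} \right)
```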
RBF Networks Two-layered network 1st layer : computes K_u 2nd layer : computes a weighted linear sum of the values from the 1st layer Uses (typically) Gaussian kernel functions Radial Basis Functions (cont.)
Training an RBFN : 2 phases 1st phase The number k of hidden units is determined For each hidden unit u, choose x_u and σ_u 2nd phase For each u, the weight w_u is trained Efficiently trained (the kernel functions are already determined) Radial Basis Functions (cont.)
Choosing the number k of hidden units (kernel functions) 1. Allocate one kernel function per training example (k = number of training examples) Each training example <x_i, f(x_i)> can influence the value of the approximated function only in the neighborhood of x_i Costly 2. Choose k < number of training examples Much more efficient than the above The centers x_u can be determined by spacing them uniformly throughout instance space, by randomly selecting a subset of training examples (thereby sampling the underlying distribution of instances), or by identifying clusters of instances and adding a kernel function centered at each cluster (EM algorithm applied) (a training sketch follows) Radial Basis Functions (cont.)
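A compact Python sketch of the two-phase training procedure above: phase 1 picks the centers by randomly sampling training examples (one of the options just listed) with a shared width, and phase 2 fits the output weights by linear least squares with the kernels held fixed. The value of k, the shared sigma, and all names are assumptions for illustration:

```python
import numpy as np

def train_rbfn(X_train, y_train, k=10, sigma=1.0, rng=None):
    """Two-phase RBF-network training sketch; returns a prediction function."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Phase 1: choose k centers x_u by sampling training examples; sigma is shared.
    centers = X_train[rng.choice(len(X_train), size=k, replace=False)]

    def design(X):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        Phi = np.exp(-d2 / (2.0 * sigma ** 2))            # Gaussian kernels K_u
        return np.hstack([np.ones((len(X), 1)), Phi])     # bias column for w_0

    # Phase 2: with the kernels fixed, fit the output weights by least squares.
    w, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
    return lambda X: design(np.atleast_2d(X)) @ w         # f_hat(x) = w_0 + sum_u w_u K_u
```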
Key advantage of RBFNs They can be trained much more efficiently than feedforward networks trained with Backpropagation, because the input layer and output layer of an RBFN are trained separately An RBFN provides a global approximation to the target function, represented by a linear combination of many local kernel functions The value of any given kernel function is non-negligible only when the input falls into the region defined by its particular center and width Thus, the network can be viewed as a smooth linear combination of many local approximations to the target function Radial Basis Functions (cont.)
Case-Based Reasoning A problem-solving paradigm that uses specific knowledge gained from concrete problem situations, or cases by remembering a previous similar situation and reusing information and knowledge from that situation based on the human information processing (HIP) model in some problem areas
Uses a much more complex representation for instances Can be applied to problems such as Conceptual design of mechanical devices Reasoning about new legal cases based on previous rulings Solving planning and scheduling problems by reusing and combining portions of previous solutions to similar problems Case-Based Reasoning (cont.)
CADET (Sycara et al. 1992): Conceptual design of simple mechanical devices Each training example < qualitative function, mechanical structure> New query : desired function Target value : mechanical structure for this function Process If an exact match is found, then this case can be returned If no exact match occurs, find cases that match various subgraphs of the desired function. By retrieving multiple cases that match different subgraphs, the entire design can be pieced together. In general, the process of producing a final solution from multiple retrieved cases can be very complex.  Case-Based Reasoning (cont.)
CADET example Design of water faucet Case-Based Reasoning (cont.)
Instances or cases may be represented by rich symbolic descriptions, such as the function graphs used in CADET. This may require a similarity metric such as the size of the largest shared subgraph between two function graphs. Multiple retrieved cases may be combined to form the solution to the new problem, which relies on knowledge-based reasoning rather than on statistical methods as in the k-Nearest Neighbor approach. Tight coupling exists between case retrieval, knowledge-based reasoning, and search-intensive problem solving. Case-Based Reasoning (cont.)
A current research issue is to develop improved methods for indexing cases: Syntactic similarity measures, such as subgraph isomorphism between function graphs, provide only an approximate indication of the relevance of a particular case to a particular problem. When the CBR system attempts to reuse the retrieved cases, it may uncover difficulties that were not captured by these syntactic similarity measures: for example, in CADET the multiple retrieved design fragments may turn out to be incompatible with one another, making it impossible to combine them into a consistent final design. When this occurs, the CBR system may backtrack and search for additional cases, adapt the existing cases, or resort to other problem-solving methods. In particular, if a case is retrieved based on the similarity metric but found to be irrelevant on further analysis, then the similarity metric should be refined to reject this case for similar subsequent queries. Case-Based Reasoning (cont.)
Remarks on Lazy and Eager Learning Lazy Learning Methods Generalization is delayed until each query is encountered Can consider the query when deciding how to generalize k-Nearest Neighbor, Locally Weighted Regression, Case-Based Reasoning, … Eager Learning Methods Generalize beyond the training data before the query is observed Radial Basis Function Networks, C4.5, Backpropagation, …
Computation time Lazy methods : less for training, but more for querying Eager methods : more for training, but less for querying Generalization accuracy Given the same hypothesis space H Eager methods commit to a single global approximation hypothesis Lazy methods can form many different local approximation hypotheses Radial basis function networks Eager, but use multiple local approximations Still not the same as lazy methods: the centers are pre-determined, not chosen from the query instance Remarks on Lazy and Eager Learning (cont.)
