Classification Techniques
and Regression
Lazy vs. Eager Learning
 Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple
 Eager learning (the methods discussed earlier): given a training set, constructs a classification model before receiving new (e.g., test) data to classify
 Time: lazy learners spend less time in training but more time in prediction
 Accuracy
 A lazy method effectively uses a richer hypothesis space
 An eager method must commit to a single hypothesis
Lazy Learner: Instance-Based Methods
 Instance-based learning:
 Store training examples and delay the processing (“lazy evaluation”) until a new instance must be classified
 Typical approaches
 k-nearest neighbor approach
 Instances represented as points in a Euclidean space
 Case-based reasoning
 Uses symbolic representations and knowledge-based inference
The k-Nearest Neighbor Algorithm
 Labor-intensive at classification time; widely used for pattern recognition
 Learning by analogy
 All instances correspond to points in n-D space
 Nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)
 For discrete-valued target functions, k-NN returns the most common value among the k training examples nearest to xq
 Assigns equal weight to every attribute, so attributes have to be normalized
 Can also be used for (numeric) prediction
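As a concrete illustration of the bullets above, here is a minimal k-NN classifier for the discrete-valued case; the toy data set and the choice of k = 3 are illustrative only:

```python
from collections import Counter
import math

def euclidean(x1, x2):
    """dist(X1, X2): Euclidean distance between two points in n-D space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(training, query, k=3):
    """Return the most common label among the k training points nearest
    to the query. `training` is a list of (point, label) pairs; every
    attribute gets equal weight, so normalize attributes beforehand."""
    neighbors = sorted(training, key=lambda pl: euclidean(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data set: two well-separated clusters in 2-D.
data = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
        ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
print(knn_classify(data, (1.1, 1.0), k=3))  # near cluster A -> "A"
```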
k-NN Algorithm
 k-NN for real-valued prediction of a given unknown tuple
 Returns the mean value of the k nearest neighbors
 Distance-weighted nearest neighbor algorithm
 Weight the contribution of each of the k neighbors according to its distance to the query xq
 Give greater weight to closer neighbors
 Robust to noisy data, since averaging over the k nearest neighbors smooths out noise
 Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes
 To overcome it, stretch the axes or eliminate the least relevant attributes
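The real-valued, distance-weighted variant described above can be sketched as follows; the inverse-square weighting is one common choice, not the only one:

```python
import math

def distance_weighted_predict(training, query, k=3):
    """Real-valued k-NN prediction: weight each of the k nearest neighbors
    by the inverse square of its distance to the query, so closer neighbors
    count more; an exact match returns its value directly."""
    def dist(x1, x2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    neighbors = sorted(training, key=lambda py: dist(py[0], query))[:k]
    weights, values = [], []
    for point, y in neighbors:
        d = dist(point, query)
        if d == 0:                 # query coincides with a training point
            return y
        weights.append(1.0 / d ** 2)
        values.append(y)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

points = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 2.0), ((3.0,), 3.0)]
print(distance_weighted_predict(points, (1.5,), k=2))  # midway -> 1.5
```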
k-NN Algorithm
 Categorical attributes
 If the two values are identical, the difference is 0; otherwise it is 1
 Missing values
 For categorical attributes, the difference is 1 if either one or both values are missing
 For numeric attributes (normalized to [0, 1])
 If both are missing, the difference is 1
 If one value is missing and the other is v′, the difference is the greater of |1 − v′| and v′
 Complexity
 Classifying a tuple requires O(|D|) comparisons, where |D| is the number of training tuples
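These per-attribute difference rules can be captured in a small helper (a sketch; `None` stands for a missing value, and numeric attributes are assumed normalized to [0, 1]):

```python
def attribute_difference(a, b, numeric=True):
    """Per-attribute difference for the k-NN distance, following the rules
    above; None represents a missing value."""
    if a is None and b is None:          # both missing -> difference 1
        return 1.0
    if not numeric:
        if a is None or b is None:       # either categorical value missing
            return 1.0
        return 0.0 if a == b else 1.0    # identical -> 0, otherwise 1
    if a is None or b is None:           # one numeric value missing: take
        v = b if a is None else a        # the greater of |1 - v'| and v'
        return max(abs(1.0 - v), abs(v))
    return abs(a - b)                    # both present: plain difference

print(attribute_difference("red", "red", numeric=False))  # 0.0
print(attribute_difference(None, 0.9))                    # 0.9
```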
Case-Based Reasoning (CBR)
 CBR: uses a database of problem solutions to solve new problems
 Stores symbolic descriptions (tuples or cases), not points in a Euclidean space
 Applications: customer service (product-related diagnosis), legal rulings
 Methodology
 Instances are represented by rich symbolic descriptions (e.g., function graphs)
 Search for similar cases; multiple retrieved cases may be combined
 Tight coupling between case retrieval, knowledge-based reasoning, and problem solving
 Challenges
 Finding a good similarity metric
 Indexing based on a syntactic similarity measure and, on failure, backtracking and adapting to additional cases
Genetic Algorithms (GA)
 Genetic algorithm: based on an analogy to biological evolution
 An initial population is created, consisting of randomly generated rules
 Each rule is represented by a string of bits
 E.g., if A1 and ¬A2 then C2 can be encoded as 100
 If an attribute has k > 2 values, k bits can be used
 Based on the notion of survival of the fittest, a new population is formed consisting of the fittest rules and their offspring
 The fitness of a rule is measured by its classification accuracy on a set of training examples
 Offspring are generated by crossover and mutation
 The process continues until a population P evolves in which each rule satisfies a prespecified fitness threshold
 Slow, but easily parallelizable
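A minimal sketch of the evolutionary loop, with bit-string rules, single-point crossover, and bit-flip mutation. Fitness here is simply the fraction of bits matching a fixed target string, a stand-in for the classification accuracy a real system would measure on training examples:

```python
import random

random.seed(42)

TARGET = "110100111010"  # stand-in for an ideal rule encoding

def fitness(rule):
    """Fraction of bits matching the target; in the classification setting
    this would be the rule's accuracy on the training examples."""
    return sum(a == b for a, b in zip(rule, TARGET)) / len(TARGET)

def crossover(p1, p2):
    cut = random.randrange(1, len(p1))   # single-point crossover
    return p1[:cut] + p2[cut:]

def mutate(rule, rate=0.05):
    return "".join(b if random.random() > rate else str(1 - int(b))
                   for b in rule)

# Initial population of randomly generated bit-string rules.
population = ["".join(random.choice("01") for _ in TARGET) for _ in range(30)]
for _ in range(100):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == 1.0:    # prespecified threshold reached
        break
    survivors = population[:15]          # survival of the fittest
    offspring = [mutate(crossover(*random.sample(survivors, 2)))
                 for _ in range(15)]
    population = survivors + offspring

best = max(population, key=fitness)
print(fitness(best))
```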
Rough Set Approach
 Rough sets are used to approximately or “roughly” define equivalence classes
 A rough set for a given class C is approximated by two sets: a lower approximation (tuples certain to be in C) and an upper approximation (tuples that cannot be described as not belonging to C)
 Rough sets can be used for feature reduction and relevance analysis
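The two approximations can be computed directly from a partition of the tuples into equivalence (indiscernibility) classes; the blocks and target class below are hypothetical:

```python
def rough_approximations(equivalence_classes, C):
    """Approximate class C by its lower approximation (equivalence classes
    entirely inside C, i.e. certain members) and its upper approximation
    (classes that overlap C, i.e. cannot be ruled out of C)."""
    lower, upper = set(), set()
    for block in equivalence_classes:
        if block <= C:    # block fully contained in C -> certain
            lower |= block
        if block & C:     # block intersects C -> possible
            upper |= block
    return lower, upper

# Hypothetical partition of tuples 1..6 into indiscernible blocks.
blocks = [{1, 2}, {3, 4}, {5, 6}]
C = {1, 2, 3}             # the target class
lower, upper = rough_approximations(blocks, C)
print(lower)  # {1, 2}
print(upper)  # {1, 2, 3, 4}
```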
Fuzzy Set Approaches
 Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (e.g., using a fuzzy membership graph)
 Attribute values are converted to fuzzy values
 E.g., income is mapped into the discrete categories {low, medium, high}, with fuzzy membership values calculated for each
 For a given new sample, more than one fuzzy value may apply
 Each applicable rule contributes a vote for membership in the categories
 Typically, the truth values for each predicted category are summed, and these sums are combined
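A sketch of how an income value could be fuzzified into {low, medium, high}; the triangular membership functions and the breakpoints (20, 40, 60, in thousands) are assumed for illustration and do not come from the slides:

```python
def fuzzy_income_memberships(income):
    """Map a numeric income (in thousands) to fuzzy degrees of membership
    in {low, medium, high}; breakpoints are illustrative only."""
    def low(x):
        return 1.0 if x <= 20 else max(0.0, (40 - x) / 20)
    def high(x):
        return 1.0 if x >= 60 else max(0.0, (x - 40) / 20)
    def medium(x):
        # Triangular: peak at 40, falling to 0 at 20 and 60.
        return max(0.0, 1.0 - abs(x - 40) / 20)
    return {"low": low(income), "medium": medium(income),
            "high": high(income)}

# For income 30, more than one fuzzy value applies (low and medium).
print(fuzzy_income_memberships(30))
```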
Fuzzy-Set Approaches
 Fuzzy measures
 AND operation
 m_(high_income AND senior_employee)(x) = min(m_high_income(x), m_senior_employee(x))
 OR operation
 m_(high_income OR senior_employee)(x) = max(m_high_income(x), m_senior_employee(x))
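These min/max combination rules are straightforward to express in code; the membership degrees 0.7 and 0.4 are made-up example values:

```python
def fuzzy_and(m1, m2):
    """m_(A AND B)(x) = min(m_A(x), m_B(x))"""
    return min(m1, m2)

def fuzzy_or(m1, m2):
    """m_(A OR B)(x) = max(m_A(x), m_B(x))"""
    return max(m1, m2)

# Suppose for some employee x: m_high_income(x) = 0.7,
# m_senior_employee(x) = 0.4.
print(fuzzy_and(0.7, 0.4))  # 0.4
print(fuzzy_or(0.7, 0.4))   # 0.7
```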
Prediction
 (Numerical) prediction is similar to classification
 Construct a model
 Use the model to predict a continuous or ordered value for a given input
 Prediction is different from classification
 Classification predicts categorical class labels
 Prediction models continuous-valued functions
 Major method for prediction: regression
 Models the relationship between one or more independent (predictor) variables and a dependent (response) variable
 Regression analysis
 Linear and multiple regression
 Non-linear regression
 Other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees
Linear Regression
 Linear regression: involves a response variable y and a single predictor variable x
y = w0 + w1 x
where w0 (y-intercept) and w1 (slope) are the regression coefficients
 Method of least squares: estimates the best-fitting straight line
 Multiple linear regression: involves more than one predictor variable
 E.g., for 2-D data we may have: y = w0 + w1 x1 + w2 x2
 Solvable by an extension of the least-squares method
 Least-squares estimates:
w1 = Σ_{i=1..|D|} (xi − x̄)(yi − ȳ) / Σ_{i=1..|D|} (xi − x̄)²
w0 = ȳ − w1 x̄
where x̄ and ȳ are the means of the x and y values in the training data D
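The closed-form estimates above translate directly into code; the sample points, which lie exactly on y = 2x + 1, are illustrative:

```python
def least_squares_fit(xs, ys):
    """Fit y = w0 + w1*x by the method of least squares:
    w1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),
    w0 = y_bar - w1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    w0 = y_bar - w1 * x_bar
    return w0, w1

# Points lying exactly on y = 2x + 1.
xs, ys = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
w0, w1 = least_squares_fit(xs, ys)
print(w0, w1)  # ~1.0 ~2.0
```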
Nonlinear Regression
 Some nonlinear models can be modeled by a polynomial function
 A polynomial regression model can be transformed into a linear regression model. For example,
y = w0 + w1 x + w2 x² + w3 x³
is convertible to linear form by introducing the new variables x1 = x, x2 = x², x3 = x³
 Other functions, such as the power function, can also be transformed to a linear model
 Some models are intractably nonlinear (e.g., a sum of exponential terms)
 It is still possible to obtain least-squares estimates through extensive calculation on more complex formulae
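The variable substitution that makes the cubic model linear can be sketched as a feature transform; the example rows are arbitrary:

```python
def polynomial_features(x, degree=3):
    """Transform a single predictor x into the new variables
    x1 = x, x2 = x^2, x3 = x^3, turning the polynomial model
    y = w0 + w1*x + w2*x^2 + w3*x^3 into an ordinary multiple
    linear regression in (x1, x2, x3)."""
    return [x ** d for d in range(1, degree + 1)]

# Each original (x, y) row becomes a multi-predictor row ([x, x^2, x^3], y)
# that a multiple-linear-regression solver can consume unchanged.
rows = [(2.0, 25.0), (3.0, 76.0)]
transformed = [(polynomial_features(x), y) for x, y in rows]
print(transformed[0][0])  # [2.0, 4.0, 8.0]
```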
Other Regression-Based Models
 Generalized linear model:
 Foundation on which linear regression can be applied to the modeling of categorical response variables
 Variance of y is a function of the mean value of y, not a constant
 Logistic regression: models the probability of some event occurring as a linear function of a set of predictor variables
 Poisson regression: models data that exhibit a Poisson distribution
 Log-linear models (for categorical data)
 Data cubes
 Also useful for data compression and smoothing
 Decision trees
 Regression trees
 Model trees
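As one example from the list above, logistic regression with a single predictor can be fit by gradient descent on the log-loss. This is a sketch only, with assumed hyperparameters (learning rate, epoch count); statistical packages typically fit GLMs with iteratively reweighted least squares instead:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Model the probability of the event as sigmoid(w0 + w1*x) and fit
    w0, w1 by batch gradient descent on the log-loss."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w0 + w1 * x) - y  # log-loss gradient term
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return w0, w1

# The event (y = 1) becomes likely as x grows past ~2.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
w0, w1 = fit_logistic(xs, ys)
print(sigmoid(w0 + w1 * 0.0), sigmoid(w0 + w1 * 5.0))
```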