Clustering
Machine Learning
Machine Learning Paradigm:
 Observe a set of examples: training data
 Infer something about the process that generated the data
 Use the inference to make predictions about previously unseen data: test data
 Supervised: given a set of feature/label pairs, find a rule that predicts the label associated with a previously unseen input
 Unsupervised: given a set of feature vectors (without labels), group them into “natural clusters”
Clustering: An Optimization Problem
 Why not divide variability by the size of the cluster?
◦ A big, bad cluster is worse than a small, bad one
 Is the optimization problem simply finding a C that minimizes dissimilarity(C)?
◦ No; otherwise we could put each example in its own cluster
 Need a constraint, e.g.,
◦ Minimum distance between clusters
◦ Number of clusters
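The objective on this slide can be sketched in a few lines, assuming the usual definitions (the variability of a cluster is the sum of squared distances from its members to its centroid, and the dissimilarity of a clustering is the sum of the cluster variabilities); the function names and data are illustrative:

```python
# A sketch of the objective, assuming:
# variability(c) = sum of squared distances from each example in c
# to c's centroid; dissimilarity(C) = sum of cluster variabilities.
# Because variability is NOT divided by cluster size, a big spread-out
# cluster hurts the objective more than a small one with the same spread.

def centroid(cluster):
    dims = len(cluster[0])
    return [sum(p[d] for p in cluster) / len(cluster) for d in range(dims)]

def variability(cluster):
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def dissimilarity(clustering):
    return sum(variability(cluster) for cluster in clustering)

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
sensible = [[points[0], points[1]], [points[2], points[3]]]
trivial = [[p] for p in points]  # each example in its own cluster
print(dissimilarity(sensible))  # 1.0
print(dissimilarity(trivial))   # 0.0 -- why an unconstrained minimum is useless
```

The trivial clustering scores a perfect 0, which is exactly why a constraint on cluster count or separation is needed.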
Hierarchical Clustering:
 Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item.
 Find the closest (most similar) pair of clusters and merge them into a single cluster, so that you now have one fewer cluster.
 Continue the process until all items are clustered into a single cluster of size N.
What does distance mean?
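The merge loop above can be sketched directly. This is an illustrative agglomerative loop over 1-D items, assuming single-linkage distance as one answer to the distance question (linkage metrics are defined on the next slide); all names are my own:

```python
def single_linkage(c1, c2):
    # distance between the closest pair of members of the two clusters
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerate(items, target=1):
    # Step 1: assign each of the N items to its own cluster
    clusters = [[x] for x in items]
    # Steps 2-3: repeatedly merge the closest pair of clusters
    while len(clusters) > target:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_linkage(clusters[ij[0]],
                                                        clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)  # one fewer cluster than before
    return clusters

# stopping early (target=2) instead of merging all the way to one cluster
print(agglomerate([1.0, 1.2, 5.0, 5.1, 9.0], target=2))
```

Running all the way to `target=1` reproduces the full hierarchy; stopping early yields a flat clustering at the chosen level.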
Linkage Metrics
 Single-linkage: the distance between two clusters is the shortest distance from any member of one cluster to any member of the other
 Complete-linkage: the distance between two clusters is the greatest distance from any member of one cluster to any member of the other
 Average-linkage: the distance between two clusters is the average distance from any member of one cluster to any member of the other
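The three metrics can be sketched for clusters of 1-D points (helper names are my own; libraries such as SciPy call the same ideas "single", "complete", and "average"):

```python
def member_distances(c1, c2):
    # all member-to-member distances between two clusters of 1-D points
    return [abs(a - b) for a in c1 for b in c2]

def single_linkage(c1, c2):
    return min(member_distances(c1, c2))   # shortest such distance

def complete_linkage(c1, c2):
    return max(member_distances(c1, c2))   # greatest such distance

def average_linkage(c1, c2):
    d = member_distances(c1, c2)
    return sum(d) / len(d)                 # mean of all such distances

a, b = [0.0, 1.0], [4.0, 6.0]
print(single_linkage(a, b))    # 3.0 (from 1.0 to 4.0)
print(complete_linkage(a, b))  # 6.0 (from 0.0 to 6.0)
print(average_linkage(a, b))   # 4.5 (mean of 4, 6, 3, 5)
```

Single-linkage tends to produce long "chained" clusters, while complete-linkage favors compact ones; average-linkage sits between the two.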
Example of Hierarchical Clustering:
Clustering Algorithms:
 Hierarchical clustering
◦ Can select the number of clusters using a dendrogram
◦ Deterministic
◦ Flexible with respect to linkage criteria
◦ Slow: the naïve algorithm is O(n³); O(n²) algorithms exist for some linkage criteria
 K-means: a much faster greedy algorithm
◦ Most useful when you know how many clusters you want
K-means Algorithm:
randomly choose k examples as initial centroids
while true:
    create k clusters by assigning each example
        to the closest centroid
    compute k new centroids by averaging the
        examples in each cluster
    if the centroids don't change:
        break
What is the complexity of one iteration?
O(k*n*d), where k is the number of clusters, n is the number of points, and d is the time required to compute the distance between a pair of points.
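The pseudocode translates almost directly into Python. A minimal sketch, assuming examples are numeric tuples and Euclidean distance; the empty-cluster case, which the pseudocode does not cover, simply keeps the old centroid:

```python
import math
import random

def distance(p, q):
    # Euclidean distance between two numeric tuples
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(examples, k, seed=0):
    rng = random.Random(seed)
    # randomly choose k examples as initial centroids
    centroids = rng.sample(examples, k)
    while True:
        # create k clusters by assigning each example to the closest
        # centroid (k distance computations per example: O(k*n*d) total)
        clusters = [[] for _ in range(k)]
        for e in examples:
            best = min(range(k), key=lambda i: distance(e, centroids[i]))
            clusters[best].append(e)
        # compute k new centroids by averaging the examples in each cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if not cluster:  # empty cluster: keep the old centroid
                new_centroids.append(centroids[i])
                continue
            dims = len(cluster[0])
            new_centroids.append(tuple(sum(p[d] for p in cluster) / len(cluster)
                                       for d in range(dims)))
        # if the centroids don't change, stop
        if new_centroids == centroids:
            return clusters
        centroids = new_centroids

points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
clusters = k_means(points, k=2)
print(sorted(map(sorted, clusters)))
```

On well-separated data like this, the loop converges in a handful of iterations regardless of which examples are drawn as initial centroids.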
An Example:
k = 4, initial centroids, followed by iterations 1–5 (scatter plots shown on the slides; figures omitted here)
Issues with k-means:
 Choosing the “wrong” k can lead to strange results
◦ Consider k = 3
 The result can depend upon the initial centroids
◦ Number of iterations
◦ Even the final result
◦ The greedy algorithm can find different local optima
How to Choose K:
 A priori knowledge about the application domain
◦ There are two kinds of people in the world: k = 2
◦ There are five different types of bacteria: k = 5
 Search for a good k
◦ Try different values of k and evaluate the quality of the results
◦ Run hierarchical clustering on a subset of the data
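One way to search for a good k is to run k-means for several candidate values, with a few random restarts each (since the result depends on the initial centroids), and watch how the objective drops. A self-contained 1-D sketch; the names, data, and the restart count are my own:

```python
import random

def k_means_1d(xs, k, rng):
    # compact 1-D k-means, following the pseudocode slide
    centroids = rng.sample(xs, k)
    while True:
        clusters = [[] for _ in range(k)]
        for x in xs:
            clusters[min(range(k), key=lambda i: abs(x - centroids[i]))].append(x)
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            return clusters
        centroids = new

def dissimilarity(clusters):
    # sum of squared distances of members to their cluster mean
    return sum(sum((x - sum(c) / len(c)) ** 2 for x in c)
               for c in clusters if c)

def objective(xs, k, trials=10, seed=0):
    # best of several random restarts, since greedy k-means
    # can land in different local optima
    rng = random.Random(seed)
    return min(dissimilarity(k_means_1d(xs, k, rng)) for _ in range(trials))

xs = [0.0, 0.1, 0.2, 8.0, 8.1, 8.2]
for k in (1, 2, 3):
    print(k, round(objective(xs, k), 3))
```

On this data the objective collapses from roughly 96 at k = 1 to about 0.04 at k = 2, then barely improves at k = 3: the "elbow" at k = 2 matches the two natural groups.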