K-means slides, K-means annotated, GMM slides, GMM annotated.pdf

CS229: Machine Learning
Clustering: Grouping Related Docs
©2022 Carlos Guestrin
Carlos Guestrin
Stanford University
Slides include content developed by and co-developed with
Emily Fox

Motivating clustering approaches

Goal: Structure documents by topic
Discover groups (clusters) of related articles
SPORTS WORLD NEWS

Why might clustering be useful?
[Figure: chart of one user's topic preferences, axis from 0 to 0.6, over the topics Sports, World News, Entertainment, Science. Callout: "I don't just like sports!"]

Learn user preferences
Set of clustered documents read by user (Cluster 1, Cluster 2, Cluster 3, Cluster 4)
Use feedback to learn user preferences over topics

Clustering: An unsupervised learning task

What if some of the labels are known?
Training set of labeled docs
SPORTS WORLD NEWS
ENTERTAINMENT SCIENCE

Clustering
No labels provided
…uncover cluster structure
from input alone
Input: docs as vectors $x_i$
Output: cluster labels $z_i$
An unsupervised
learning task
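One possible way to turn documents into vectors is a plain word-count (bag-of-words) representation. The slides do not say which representation is used, so the sketch below is an assumption for illustration only; the helper name `docs_to_vectors` and the two sample documents are made up.

```python
from collections import Counter

def docs_to_vectors(docs):
    """Map each document to a word-count vector over a shared vocabulary.

    Assumption: whitespace tokenization and raw counts; real systems often
    use other weightings, which the slides do not specify.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One entry per vocabulary word, 0 if the word is absent.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

docs = ["the team won the match", "the election results"]
vocab, X = docs_to_vectors(docs)
print(vocab)  # ['election', 'match', 'results', 'team', 'the', 'won']
print(X)      # [[0, 1, 0, 1, 2, 1], [1, 0, 1, 0, 1, 0]]
```

Each row of `X` plays the role of one input vector; a clustering algorithm then only sees these vectors, never any topic label.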

What defines a cluster?
Cluster defined by center & shape/spread
Assign observation $x_i$ (doc) to cluster k (topic label) if
- Score under cluster k is better than under others
- For simplicity, often define score as distance to cluster center (ignoring shape), so the closest center wins

Hope for unsupervised learning
Easy
Impossible
In between

Other (challenging!) clusters to discover…


k-means: A clustering algorithm

k-means
Assume
- Score = distance to cluster center (smaller is better)
[Figure: data to cluster]

k-means algorithm
0. Initialize cluster centers $\mu_1, \mu_2, \ldots, \mu_k$
1. Assign observations to closest cluster center
2. Revise cluster centers as mean of assigned observations
3. Repeat 1.+2. until convergence
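The four steps can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the initialization (k randomly chosen observations), the stopping test, the handling of empty clusters, and the toy data are all assumptions that the slides leave open.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Sketch of the k-means iterations on an (n, d) array X."""
    rng = np.random.default_rng(seed)
    # Step 0: initialize centers as k distinct observations (assumed choice).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Step 1: assign each observation to its closest center.
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        z = sq_dists.argmin(axis=1)
        # Step 2: revise each center as the mean of its assigned observations.
        new_centers = centers.copy()
        for j in range(k):
            members = X[z == j]
            if len(members) > 0:  # assumed policy: keep old center if empty
                new_centers[j] = members.mean(axis=0)
        # Step 3: repeat until the centers stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, z

# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, z = kmeans(X, k=2)
print(z)        # the first three points share one label, the last three the other
print(centers)  # one center near (0.33, 0.33), the other near (10.33, 10.33)
```

Which group receives label 0 depends on the random initialization, so only the grouping is meaningful, not the label values.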

k-means algorithm
1. Assign observations to closest cluster center:
$$z_i \leftarrow \arg\min_j \|\mu_j - x_i\|_2^2$$
Inferred label for obs i, whereas supervised learning has given label $y_i$
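The assignment step picks, for each observation, the center with the smallest squared Euclidean distance. A small worked example in NumPy, with made-up centers and points (the function name `assign` is illustrative):

```python
import numpy as np

def assign(X, mu):
    """Return z with z[i] = argmin_j ||mu_j - x_i||_2^2."""
    # sq_dists[i, j] holds the squared distance from observation i to center j.
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)

mu = np.array([[0.0, 0.0], [4.0, 4.0]])              # two centers
X = np.array([[1.0, 0.0], [3.0, 5.0], [2.0, 1.0]])   # three observations
z = assign(X, mu)
print(z)  # [0 1 0]
```

For the third point, the squared distances are 5 (to the first center) and 13 (to the second), so it is labeled 0. Ties are broken toward the lower index because `argmin` returns the first minimum.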

k-means algorithm
2. Revise cluster centers as mean of assigned observations:
$$\mu_j = \frac{1}{n_j} \sum_{i : z_i = j} x_i$$
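The center update averages the observations currently assigned to each cluster. A small worked example in NumPy, with made-up points and labels (the function name `update_centers` is illustrative, and every cluster is assumed to be non-empty):

```python
import numpy as np

def update_centers(X, z, k):
    """Return mu with mu[j] = mean of the observations whose label is j."""
    mu = np.zeros((k, X.shape[1]))
    for j in range(k):
        members = X[z == j]            # observations with z_i = j
        mu[j] = members.mean(axis=0)   # assumes n_j > 0
    return mu

X = np.array([[1.0, 0.0], [3.0, 5.0], [2.0, 1.0]])
z = np.array([0, 1, 0])
mu = update_centers(X, z, k=2)
print(mu)
# [[1.5 0.5]
#  [3.  5. ]]
```

Cluster 0 contains the first and third points, so its center is their average (1.5, 0.5); cluster 1 contains only the second point.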

k-means algorithm
3. Repeat 1.+2. until convergence

Why does K-means work???
• What’s k-means optimizing?
• Does it always converge?

What is k-means optimizing?
• Potential function F(µ,z) of centers µ and point allocations z:
$$F(\mu, z) = \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2$$
• Optimal k-means:
$$\min_{\mu} \min_{z} F(\mu, z)$$

Does K-means converge??? Part 1
• Optimize potential function:
$$\min_{\mu} \min_{z} F(\mu, z) = \min_{\mu} \min_{z} \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2$$
• Fix µ and minimize z:
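A sketch of the argument, writing N for the number of observations and k for the number of clusters (notation assumed here): with µ fixed, each term of the sum depends only on its own label, so the minimization over z splits into N independent choices, and each one is the closest-center assignment of step 1.

```latex
% Assumed notation: N observations, k clusters.
\begin{align*}
\min_{z} \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2
  &= \sum_{i=1}^{N} \min_{z_i \in \{1,\dots,k\}} \|\mu_{z_i} - x_i\|_2^2 \\
\Longrightarrow\quad z_i &\leftarrow \arg\min_{j} \|\mu_j - x_i\|_2^2
\end{align*}
```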

Does K-means converge??? Part 2
• Optimize potential function:
$$\min_{\mu} \min_{z} F(\mu, z) = \min_{\mu} \min_{z} \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2$$
• Fix z and minimize µ:
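A sketch of the argument, writing $n_j$ for the number of observations with label j and assuming every $n_j > 0$: with z fixed, the sum can be grouped by cluster, each group is a convex quadratic in its own center, and setting the gradient to zero gives the mean update of step 2.

```latex
% Assumed notation: n_j = number of observations with z_i = j (taken to be > 0).
\begin{align*}
F(\mu, z) &= \sum_{j=1}^{k} \sum_{i : z_i = j} \|\mu_j - x_i\|_2^2 \\
\nabla_{\mu_j} F &= 2 \sum_{i : z_i = j} (\mu_j - x_i) = 0 \\
\Longrightarrow\quad \mu_j &= \frac{1}{n_j} \sum_{i : z_i = j} x_i
\end{align*}
```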

Coordinate descent algorithms
• Want: $\min_a \min_b F(a,b)$
• Coordinate descent:
- fix a, minimize b
- fix b, minimize a
- repeat
• Converges!!!
- if F is bounded
- to a (often good) local optimum
• as we saw in applet (play with it!)
- (For LASSO it converged to the global
optimum, because of convexity)
• K-means is a coordinate descent algorithm!
$$\min_{\mu} \min_{z} F(\mu, z) = \min_{\mu} \min_{z} \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2$$
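The coordinate-descent view can be checked numerically: record the potential after every half-step and confirm that it never increases. The sketch below uses synthetic Gaussian data and initializes the centers with the first k observations; both are assumptions made only for this illustration.

```python
import numpy as np

def potential(X, mu, z):
    """F(mu, z) = sum_i ||mu_{z_i} - x_i||_2^2."""
    return float(((X - mu[z]) ** 2).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))   # synthetic data, for illustration only
k = 3
mu = X[:k].copy()               # assumed initialization: first k observations

history = []
for _ in range(20):
    # Fix mu, minimize over z (assignment step).
    z = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    history.append(potential(X, mu, z))
    # Fix z, minimize over mu (mean step); empty clusters keep their center.
    for j in range(k):
        if np.any(z == j):
            mu[j] = X[z == j].mean(axis=0)
    history.append(potential(X, mu, z))

# Each half-step can only lower F or leave it unchanged.
assert all(b <= a + 1e-9 for a, b in zip(history, history[1:]))
```

Because F is bounded below by 0 and never increases, the sequence of potential values converges; the clustering it settles on is a local optimum and can depend on the initialization.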

Summary for k-means

Clustering images
• For search, group as:
- Ocean
- Pink flower
- Dog
- Sunset
- Clouds
- …

Limitations of k-means
Assign observations to closest cluster center, then revise cluster centers as mean of assigned observations:
$$z_i \leftarrow \arg\min_j \|\mu_j - x_i\|_2^2$$
- Only center matters
- Equivalent to assuming spherically symmetric clusters
- Can use weighted Euclidean, but requires known weights
- Still assumes all clusters have the same axis-aligned ellipses
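To illustrate the weighted Euclidean variant, the sketch below rescales each coordinate by a known weight before comparing distances. The weights, centers, and point are made up for this example, and `assign_weighted` is an illustrative name.

```python
import numpy as np

def assign_weighted(X, mu, w):
    """z[i] = argmin_j sum_d w[d] * (mu[j, d] - X[i, d])**2, with known w[d] >= 0."""
    diffs = X[:, None, :] - mu[None, :, :]
    # The same per-coordinate weights are applied to every cluster.
    return (w * diffs ** 2).sum(axis=2).argmin(axis=1)

mu = np.array([[0.0, 0.0], [3.0, 4.0]])
x = np.array([[3.0, 0.0]])

plain = assign_weighted(x, mu, np.array([1.0, 1.0]))      # ordinary Euclidean
weighted = assign_weighted(x, mu, np.array([1.0, 0.25]))  # second axis counts less
print(plain, weighted)  # [0] [1]
```

With equal weights the squared distances are 9 and 16, so the point goes to the first center; down-weighting the second coordinate changes them to 9 and 4, so it goes to the second. Since one weight vector is shared by all clusters, every cluster is still treated as having the same axis-aligned elliptical shape.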

Failure modes of k-means
- Disparate cluster sizes
- Overlapping clusters
- Different shaped/oriented clusters

What you can do now…
• Describe the input (unlabeled observations) and output (labels)
of a clustering algorithm
• Determine whether a task is supervised or unsupervised
• Cluster documents using k-means
• Describe potential applications of clustering
