CS229: Machine Learning
Clustering: Grouping Related Docs
©2022 Carlos Guestrin
Carlos Guestrin
Stanford University
Slides include content developed by and co-developed with Emily Fox
Motivating clustering approaches
Goal: Structure documents by topic
Discover groups (clusters) of related articles
[Figure: articles grouped into clusters such as SPORTS and WORLD NEWS]
Why might clustering be useful?
"I don’t just like sports!"
[Figure: chart with values from 0 to 0.6 over the topics Sports, World News, Entertainment, Science]
Learn user preferences
[Figure: set of clustered documents read by user, grouped into Cluster 1, Cluster 2, Cluster 3, Cluster 4]
Use feedback to learn user preferences over topics
Clustering: An unsupervised learning task
What if some of the labels are known?
Training set of labeled docs
[Figure: labeled documents in the classes SPORTS, WORLD NEWS, ENTERTAINMENT, SCIENCE]
Clustering
No labels provided
…uncover cluster structure from input alone
Input: docs as vectors x_i
Output: cluster labels z_i
An unsupervised learning task
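The slides do not say how a document becomes a vector x_i. As one possible illustration, the sketch below assumes a plain word-count (bag-of-words) representation; the helper name `docs_to_vectors` and the two sample documents are hypothetical.

```python
from collections import Counter


def docs_to_vectors(docs):
    """Map each document to a word-count vector over a shared vocabulary.

    Assumption: whitespace tokenization and raw counts. The slides only say
    "docs as vectors" and do not fix a representation.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


# Hypothetical example documents.
docs = ["goal scored in the match", "election results in the capital"]
vocab, X = docs_to_vectors(docs)
for doc, x in zip(docs, X):
    print(doc, "->", x)
```

Every document ends up with the same length (the vocabulary size), which is what lets a distance between documents be computed later.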
What defines a cluster?
Assign observation x_i (doc) to cluster k (topic label) if
- Score under cluster k is higher than under others
- For simplicity, often define score as distance to cluster center (ignoring shape), in which case smaller is better
Cluster defined by center & shape/spread
Hope for unsupervised learning
[Figure: three example datasets, labeled Easy, Impossible, and In between]
Other (challenging!) clusters to discover…
k-means: A clustering algorithm
k-means
Assume
- Score = distance to cluster center (smaller better)
[Figure: DATA to CLUSTER]
k-means algorithm
0. Initialize cluster centers µ_1, µ_2, …, µ_k
1. Assign observations to closest cluster center
2. Revise cluster centers as mean of assigned observations
3. Repeat 1.+2. until convergence
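The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the course's reference implementation: the initial centers are passed in by the caller because the slide does not say how to choose them, "convergence" is taken to mean that the assignments stop changing, and an empty cluster simply keeps its old center. The names `kmeans` and `squared_distance` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(X, init_centers, max_iters=100):
    """Run k-means from the given initial centers; return (centers, z)."""
    centers = [list(c) for c in init_centers]  # step 0 (supplied by the caller)
    k = len(centers)
    z = None
    for _ in range(max_iters):
        # Step 1: assign each observation to its closest center.
        new_z = [
            min(range(k), key=lambda j: squared_distance(centers[j], x))
            for x in X
        ]
        # Step 3: stop once the assignments no longer change.
        if new_z == z:
            break
        z = new_z
        # Step 2: move each center to the mean of its assigned observations.
        for j in range(k):
            members = [x for x, zi in zip(X, z) if zi == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, z


# Two well-separated groups of three points each (made-up data).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, z = kmeans(X, init_centers=[X[0], X[3]])
print(z)
print(centers)
```

Passing the initial centers explicitly keeps the run deterministic, which makes it easy to try different starting points and compare the resulting clusterings.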
In step 1, each observation is assigned to its closest cluster center:
$$ z_i \leftarrow \arg\min_j \|\mu_j - x_i\|_2^2 $$
z_i is the inferred label for obs i, whereas supervised learning has given label y_i
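As a small sketch of the assignment rule alone, the function below returns, for each observation, the index j of the center with the smallest squared Euclidean distance. The function name `assign_clusters` and the sample numbers are made up for illustration; ties go to the lowest index, which is a choice of this sketch rather than something the slide specifies.

```python
def assign_clusters(X, centers):
    """Return z, where z[i] is the index j minimizing ||mu_j - x_i||_2^2."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [
        min(range(len(centers)), key=lambda j: squared_distance(centers[j], x))
        for x in X
    ]


# Hypothetical observations and centers.
X = [[1.0, 1.0], [4.0, 5.0], [9.0, 9.0]]
centers = [[0.0, 0.0], [10.0, 10.0]]
z = assign_clusters(X, centers)
print(z)  # [0, 0, 1]
```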
In step 2, each cluster center is revised as the mean of its assigned observations:
$$ \mu_j = \frac{1}{n_j} \sum_{i : z_i = j} x_i $$
where n_j is the number of observations assigned to cluster j
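The update rule can be sketched on its own as below. The name `update_centers` is hypothetical, and returning `None` for a cluster with no assigned observations is an assumption of this sketch, since the mean is undefined when n_j = 0 and the slide does not cover that case.

```python
def update_centers(X, z, k):
    """Return the k cluster means given observations X and labels z."""
    centers = []
    for j in range(k):
        members = [x for x, zi in zip(X, z) if zi == j]
        if not members:
            centers.append(None)  # assumption: no mean exists for an empty cluster
            continue
        n_j = len(members)
        centers.append([sum(col) / n_j for col in zip(*members)])
    return centers


# Hypothetical observations and labels.
X = [[1.0, 1.0], [3.0, 5.0], [9.0, 9.0]]
z = [0, 0, 1]
centers = update_centers(X, z, k=2)
print(centers)  # [[2.0, 3.0], [9.0, 9.0]]
```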
Why does K-means work???
• What’s k-means optimizing?
• Does it always converge?
What is k-means optimizing?
• Potential function F(µ,z) of centers µ and point allocations z:
$$ F(\mu, z) = \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2 $$
• Optimal k-means:
$$ \min_{\mu} \min_{z} F(\mu, z) $$
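To make the potential function concrete, here is a short sketch that evaluates F(µ,z) as the sum of squared distances from each observation to its assigned center. The name `potential` and the numbers are illustrative only.

```python
def potential(X, centers, z):
    """F(mu, z): sum over i of the squared distance from x_i to its center mu_{z_i}."""
    total = 0.0
    for x, zi in zip(X, z):
        total += sum((m - xi) ** 2 for m, xi in zip(centers[zi], x))
    return total


# Hypothetical 2-D data with two centers.
X = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
centers = [[1.0, 0.0], [10.0, 0.0]]

good = potential(X, centers, z=[0, 0, 1])   # each point sits with its nearest center
worse = potential(X, centers, z=[0, 1, 1])  # second point sent to the far center
print(good, worse)  # 2.0 65.0
```

With the centers held fixed, the nearest-center allocation gives the smaller value of F, which is the quantity k-means tries to drive down.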
Does K-means converge??? Part 1
• Optimize potential function:
$$ \min_{\mu} \min_{z} F(\mu, z) = \min_{\mu} \min_{z} \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2 $$
• Fix µ and minimize z:
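The transcript leaves this step blank. One way to fill it in, assuming N observations and k centers, is to note that with µ fixed each term of the sum depends on a single z_i, so the minimization splits into N independent problems:

```latex
\min_{z} \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2
  = \sum_{i=1}^{N} \min_{z_i \in \{1,\dots,k\}} \|\mu_{z_i} - x_i\|_2^2
\quad\Longrightarrow\quad
z_i \leftarrow \arg\min_{j} \|\mu_j - x_i\|_2^2
```

This is exactly step 1 of the algorithm (assign each observation to its closest cluster center), so step 1 can never increase F.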
Does K-means converge??? Part 2
• Optimize potential function:
$$ \min_{\mu} \min_{z} F(\mu, z) = \min_{\mu} \min_{z} \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2 $$
• Fix z and minimize µ:
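This step is also left blank in the transcript. A sketch of the argument, assuming every cluster has at least one assigned observation (n_j > 0): with z fixed, the sum regroups by cluster, each µ_j appears in only one group, and setting the gradient to zero gives the mean.

```latex
\min_{\mu} \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2
  = \sum_{j=1}^{k} \min_{\mu_j} \sum_{i : z_i = j} \|\mu_j - x_i\|_2^2

\nabla_{\mu_j} \sum_{i : z_i = j} \|\mu_j - x_i\|_2^2
  = 2 \sum_{i : z_i = j} (\mu_j - x_i) = 0
\quad\Longrightarrow\quad
\mu_j = \frac{1}{n_j} \sum_{i : z_i = j} x_i
```

This is exactly step 2 of the algorithm (revise each center as the mean of its assigned observations), so step 2 can never increase F either.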
Coordinate descent algorithms
• Want: min_a min_b F(a,b)
• Coordinate descent:
- fix a, minimize b
- fix b, minimize a
- repeat
• Converges!!!
- if F is bounded
- to a (often good) local optimum
• as we saw in applet (play with it!)
- (For LASSO it converged to the global optimum, because of convexity)
• K-means is a coordinate descent algorithm!
$$ \min_{\mu} \min_{z} F(\mu, z) = \min_{\mu} \min_{z} \sum_{i=1}^{N} \|\mu_{z_i} - x_i\|_2^2 $$
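The coordinate descent view can be checked numerically. The sketch below alternates the two half-steps (fix z and minimize over µ, then fix µ and minimize over z) and records F after each one, so the recorded values should never go up. All function names, the 1-D data, and the deliberately poor starting centers are made up for illustration; an empty cluster keeps its old center, which is an assumption of this sketch.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def potential(X, centers, z):
    """F(mu, z) = sum_i ||mu_{z_i} - x_i||_2^2."""
    return sum(squared_distance(centers[zi], x) for x, zi in zip(X, z))


def minimize_over_z(X, centers):
    """Fix mu, minimize over z: nearest-center assignment."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(centers[j], x))
        for x in X
    ]


def minimize_over_mu(X, z, centers):
    """Fix z, minimize over mu: each center becomes the mean of its members."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [x for x, zi in zip(X, z) if zi == j]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(old)  # assumption: empty cluster keeps its center
    return new_centers


def trace_potential(X, centers, iters=5):
    """Return the value of F after every half-step of the alternation."""
    z = minimize_over_z(X, centers)
    history = [potential(X, centers, z)]
    for _ in range(iters):
        centers = minimize_over_mu(X, z, centers)
        history.append(potential(X, centers, z))
        z = minimize_over_z(X, centers)
        history.append(potential(X, centers, z))
    return history


# Made-up 1-D data with both starting centers placed in the left group.
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
history = trace_potential(X, centers=[[0.0], [1.0]])
print(history)
```

The trace only shows that F is non-increasing along the way; it says nothing about whether the final value is the global minimum, in line with the slide's "local optimum" caveat.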
Summary for k-means
Clustering images
• For search, group as:
- Ocean
- Pink flower
- Dog
- Sunset
- Clouds
- …
Limitations of k-means
Assign observations to closest cluster center
$$ z_i \leftarrow \arg\min_j \|\mu_j - x_i\|_2^2 $$
- Only center matters
- Equivalent to assuming spherically symmetric clusters
- Can use weighted Euclidean, but requires known weights
- Still assumes all clusters have the same axis-aligned ellipses
Revise cluster centers as mean of assigned observations
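To illustrate the weighted Euclidean remark, the sketch below uses one fixed, known weight per dimension, shared by all clusters. The function names, the weights, and the data are all hypothetical; the point is only that changing the weights can change which center counts as closest.

```python
def weighted_squared_distance(a, b, weights):
    """Sum over dimensions d of w_d * (a_d - b_d)^2, with known weights w_d."""
    return sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b))


def closest_center(x, centers, weights):
    """Index of the center nearest to x under the weighted distance."""
    return min(
        range(len(centers)),
        key=lambda j: weighted_squared_distance(centers[j], x, weights),
    )


# Hypothetical point and centers.
x = [3.0, 0.0]
centers = [[0.0, 0.0], [3.0, 4.0]]

plain = closest_center(x, centers, weights=[1.0, 1.0])      # ordinary Euclidean
weighted = closest_center(x, centers, weights=[10.0, 0.1])  # first dimension dominates
print(plain, weighted)  # 0 1
```

Because the same weights apply to every cluster, each cluster is still treated as having the same axis-aligned elliptical shape, as the slide notes.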
Failure modes of k-means
- disparate cluster sizes
- overlapping clusters
- different shaped/oriented clusters
What you can do now…
• Describe the input (unlabeled observations) and output (labels) of a clustering algorithm
• Determine whether a task is supervised or unsupervised
• Cluster documents using k-means
• Describe potential applications of clustering
K-means slides, K-means annotated, GMM slides, GMM annotated.pdf
