Eric Xing © Eric Xing @ CMU, 2006-2010 1
Machine Learning
Spectral Clustering
Eric Xing
Lecture 8, August 13, 2010
Reading:
Eric Xing © Eric Xing @ CMU, 2006-2010 2
Data Clustering
Eric Xing © Eric Xing @ CMU, 2006-2010 3
Eric Xing © Eric Xing @ CMU, 2006-2010 4
Data Clustering
[Figure: two toy clustering examples; panel titles: Compactness and Connectivity]
 Two different criteria
 Compactness, e.g., k-means, mixture models
 Connectivity, e.g., spectral clustering
Eric Xing © Eric Xing @ CMU, 2006-2010 5
Spectral Clustering
Data Similarities
Eric Xing © Eric Xing @ CMU, 2006-2010 6
 Some graph terminology
 Objects (e.g., pixels, data points)
i ∈ I = vertices of graph G
 Edges (i, j) = pixel pairs with Wij > 0
 Similarity matrix W = [ Wij ]
 Degree
di = Σj∈G Wij
dA = Σi∈A di = degree of A ⊆ G
 Assoc(A,B) = Σi∈A Σj∈B Wij
[Figure: a weighted graph with nodes i, j, edge weight Wij, and node subsets A, B ⊆ G]
Weighted Graph Partitioning
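These quantities are straightforward to compute directly. A minimal NumPy sketch, using a made-up 4-node similarity matrix W, of the degree and association definitions above:

```python
import numpy as np

# Hypothetical 4-node similarity matrix (symmetric, nonnegative, zero diagonal).
W = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])

d = W.sum(axis=1)                  # node degrees: d_i = sum_j W_ij

def degree(A):
    """d_A = sum of the degrees of the nodes in subset A."""
    return d[sorted(A)].sum()

def assoc(A, B):
    """Assoc(A, B) = sum_{i in A, j in B} W_ij."""
    return W[np.ix_(sorted(A), sorted(B))].sum()

A, B = {0, 1}, {2, 3}
print(np.round(d, 3))                               # [0.9 0.9 1.  1. ]
print(round(degree(A), 3), round(assoc(A, B), 3))   # 1.8  0.2
```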
Eric Xing © Eric Xing @ CMU, 2006-2010 7
 (edge) cut = set of edges whose removal makes a graph
disconnected
 weight of a cut:
cut(A, B) = Σi∈A Σj∈B Wij = Assoc(A, B)
 Normalized Cut criterion: find the partition minimizing
Ncut(A, Ā) = cut(A, Ā)/dA + cut(A, Ā)/dĀ
 More generally, for a k-way partition (A1, A2, …, Ak):
Ncut(A1, A2, …, Ak) = Σ_{r=1}^{k} ( Σ_{i∈Ar, j∈V\Ar} Wij / Σ_{i∈Ar, j∈V} Wij ) = Σ_{r=1}^{k} cut(Ar, Ār)/dAr
Cuts in a Graph
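A small numerical sketch of the cut and Ncut definitions above (the similarity matrix and the partition are illustrative only):

```python
import numpy as np

W = np.array([[0.0, 0.8, 0.1, 0.0],     # made-up 4-node similarity matrix
              [0.8, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
d = W.sum(axis=1)
V = set(range(len(W)))

def cut(A, B):
    # cut(A, B) = sum_{i in A, j in B} W_ij
    return W[np.ix_(sorted(A), sorted(B))].sum()

def ncut(*parts):
    # Ncut(A_1, ..., A_k) = sum_r cut(A_r, complement of A_r) / d_{A_r}
    return sum(cut(A, V - A) / d[sorted(A)].sum() for A in parts)

print(round(cut({0, 1}, {2, 3}), 3))    # 0.2
print(round(ncut({0, 1}, {2, 3}), 3))   # 0.2/1.8 + 0.2/2.0 ~= 0.211
```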
Eric Xing © Eric Xing @ CMU, 2006-2010 8
Graph-based Clustering
 Data Grouping
 Image segmentation
Graph G = {V, E} with edge weight Wij between nodes i and j
 Affinity matrix: W = [wi,j], Wij = f(d(xi, xj))
 Degree matrix: D = diag(di)
 Laplacian matrix: L = D − W
 (bipartite) partition vector: x = [x1, …, xN] = [1, 1, …, 1, −1, −1, …, −1]
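A minimal sketch of how these matrices might be assembled, assuming a Gaussian affinity for f and an arbitrary σ (the toy data are made up):

```python
import numpy as np

def graph_matrices(X, sigma=1.0):
    """Affinity W, degree D, and Laplacian L = D - W for points X (N x p)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))   # W_ij = f(d(x_i, x_j))
    np.fill_diagonal(W, 0.0)                     # no self-loops (a common convention)
    D = np.diag(W.sum(axis=1))                   # D = diag(d_i)
    L = D - W                                    # L = D - W
    return W, D, L

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])
W, D, L = graph_matrices(X, sigma=1.0)
print(L.shape, np.allclose(L @ np.ones(len(L)), 0.0))   # (20, 20) True: L 1 = 0
```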
Eric Xing © Eric Xing @ CMU, 2006-2010 9
Affinity Function
Wi,j = exp( −‖Xi − Xj‖² / 2σ² )
 Affinities grow as σ grows
 How does the choice of σ affect the results?
 What would be the optimal choice for σ?
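To make the first bullet concrete, this tiny sketch evaluates the affinity for one pair of points at a fixed squared distance, for a few arbitrary values of σ:

```python
import numpy as np

sq_dist = 4.0                           # ||X_i - X_j||^2 for an illustrative pair
for sigma in (0.5, 1.0, 2.0, 5.0):
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))
    print(f"sigma = {sigma:3}:  W_ij = {w:.4f}")
# Small sigma: affinities fall off sharply with distance (W close to block-diagonal).
# Large sigma: all pairs look similar (W tends toward all ones).
```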
Eric Xing © Eric Xing @ CMU, 2006-2010 10
Clustering via Optimizing Normalized Cut
 The normalized cut:
Ncut(A, B) = cut(A, B)/dA + cut(A, B)/dB
 Computing an optimal normalized cut over all possible partitions y is NP-hard
 Transform the Ncut objective into matrix form (Shi & Malik, 2000). With the indicator x ∈ {1, −1}^N (xi = 1 if i ∈ A, −1 if i ∈ B) and k = Σ_{xi>0} D(i,i) / Σi D(i,i):
Ncut(A, B) = cut(A, B)/deg(A) + cut(A, B)/deg(B)
           = (1+x)ᵀ(D − W)(1+x) / (k · 1ᵀD1) + (1−x)ᵀ(D − W)(1−x) / ((1−k) · 1ᵀD1)
           = …
which leads to
min_x Ncut(x) = min_y yᵀ(D − W)y / (yᵀDy)    (a Rayleigh quotient)
subject to:  y ∈ {1, −b}^n,  yᵀD1 = 0
 Still an NP-hard problem
Eric Xing © Eric Xing @ CMU, 2006-2010 11
Relaxation
 Instead, relax into the continuous domain by solving the generalized eigenvalue system:
(D − W) y = λ D y
 Equivalently (by the Rayleigh quotient theorem):
min_y yᵀ(D − W)y   subject to  yᵀDy = 1
 Note that (D − W) 1 = 0, so the first eigenvector is y0 = 1 with eigenvalue 0.
 The second smallest eigenvector is the real-valued solution to this problem!
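A minimal sketch of this relaxation, assuming SciPy is available: it solves (D − W)y = λDy for a toy similarity matrix and checks the two observations above (smallest eigenvalue 0 with a constant eigenvector; the second-smallest eigenvector indicating the bipartition):

```python
import numpy as np
from scipy.linalg import eigh

# Toy similarity matrix: two tightly linked pairs, weakly linked to each other.
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
D = np.diag(W.sum(axis=1))

# Generalized symmetric eigenproblem (D - W) y = lambda D y, eigenvalues ascending.
eigvals, eigvecs = eigh(D - W, D)
print(np.round(eigvals, 4))        # first eigenvalue ~ 0 ...
print(np.round(eigvecs[:, 0], 4))  # ... with an (essentially) constant eigenvector
y = eigvecs[:, 1]                  # second-smallest eigenvector: relaxed Ncut solution
print(np.sign(y))                  # opposite signs on {0, 1} vs. {2, 3}
```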
Eric Xing © Eric Xing @ CMU, 2006-2010 12
Algorithm
1. Define a similarity function between two nodes, e.g.:
   wi,j = exp( −‖X(i) − X(j)‖² / 2σX² )
2. Compute the affinity matrix W and the degree matrix D.
3. Solve (D − W) y = λ D y
    i.e., do a singular value/eigen decomposition of the graph Laplacian L = D − W:  L = VΛVᵀ
4. Use the eigenvector y* with the second smallest eigenvalue to bipartition the graph.
    For each threshold k:  Ak = {i | yi among the k largest elements of y*},  Bk = {i | yi among the n − k smallest elements of y*}
    Compute Ncut(Ak, Bk)
    Output Ak* and Bk* for k* = arg mink Ncut(Ak, Bk)
(See the code sketch below.)
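A rough end-to-end sketch of the four steps (dense matrices, SciPy's generalized eigensolver, and a brute-force threshold sweep; the data and σ are illustrative, and a practical implementation would use sparse solvers):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_value(W, mask):
    """Ncut(A, B) for the bipartition given by a boolean mask (True = A)."""
    d = W.sum(axis=1)
    c = W[np.ix_(mask, ~mask)].sum()
    return c / d[mask].sum() + c / d[~mask].sum()

def spectral_bipartition(X, sigma=1.0):
    # Steps 1-2: affinity and degree matrices.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    # Step 3: solve (D - W) y = lambda D y.
    _, vecs = eigh(D - W, D)
    y_star = vecs[:, 1]                       # second-smallest eigenvector
    # Step 4: sweep thresholds over y* and keep the split with the smallest Ncut.
    order = np.argsort(y_star)
    best_mask, best_val = None, np.inf
    for k in range(1, len(y_star)):
        mask = np.zeros(len(y_star), dtype=bool)
        mask[order[:k]] = True
        val = ncut_value(W, mask)
        if val < best_val:
            best_mask, best_val = mask, val
    return best_mask, best_val

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(6, 1, (15, 2))])
labels, score = spectral_bipartition(X, sigma=1.0)
print(labels.astype(int), round(score, 4))
```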
Eric Xing © Eric Xing @ CMU, 2006-2010 13
Ideally …
Ncut(A, B) = yᵀ(D − S)y / (yᵀDy),   with yi ∈ {1, −b} and yᵀD1 = 0,
relaxed to the generalized eigensystem  (D − S) y = λ D y.
[Figure: plots of the second eigenvector entries y2,i against point index i; ideally they take only two distinct values, directly indicating each point's membership in A or B]
Eric Xing © Eric Xing @ CMU, 2006-2010 14
Example (Xing et al, 2001)
Eric Xing © Eric Xing @ CMU, 2006-2010 15
Poor features can lead to poor
outcome (Xing et al 2001)
Eric Xing © Eric Xing @ CMU, 2006-2010 16
Cluster vs. Block matrix
Ncut(A, B) = cut(A, B)/dA + cut(A, B)/dB
Degree(A) = Σi∈A, j∈V Wi,j
[Figure: the clustered data (groups A and B) and the corresponding similarity matrix, which is near block-diagonal when rows and columns are ordered by cluster]
Eric Xing © Eric Xing @ CMU, 2006-2010 17
Compare to Minimum cut
 Criterion for partition:
min cut(A, B) = min_{A,B} Σi∈A, j∈B Wi,j
First proposed by Wu and Leahy.
[Figure: a graph with groups A and B; the ideal cut separates A from B, but there exist cuts with lesser weight than the ideal cut that merely slice off a few outlying nodes]
Problem! The weight of a cut is directly proportional to the number of edges in the cut, so minimum cut favors cutting off small, isolated sets of nodes.
Eric Xing © Eric Xing @ CMU, 2006-2010 18
Superior Performance?
 K-means and Gaussian mixture methods are biased toward
convex clusters
Eric Xing © Eric Xing @ CMU, 2006-2010 19
Ncut is superior in certain cases
Eric Xing © Eric Xing @ CMU, 2006-2010 20
Why?
Eric Xing © Eric Xing @ CMU, 2006-2010 21
General Spectral Clustering
Data Similarities
Eric Xing © Eric Xing @ CMU, 2006-2010 22
Representation
 Partition matrix X (pixels × segments):  X = [X1, …, XK]
 Pair-wise similarity matrix W:  W(i, j) = aff(i, j)
 Degree matrix D:  D(i, i) = Σj wi,j
 Laplacian matrix L:  L = D − W
Eric Xing © Eric Xing @ CMU, 2006-2010 23
A Spectral Clustering Algorithm (Ng, Jordan, and Weiss, 2001)
 Given a set of points S = {s1, …, sn}
 Form the affinity matrix:  wi,j = exp( −‖si − sj‖² / 2σ² ) for i ≠ j,  wi,i = 0
 Define the diagonal degree matrix Dii = Σk wi,k
 Form the matrix L = D^{−1/2} W D^{−1/2}
 Stack the k largest eigenvectors of L to form the columns of a new matrix X = [x1 x2 … xk]
 Renormalize each row of X to unit length to get a new matrix Y, and cluster the rows of Y as points in R^k
(A code sketch of this procedure follows below.)
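A sketch of this procedure, assuming scikit-learn's KMeans for the final step (any k-means implementation would do); the data, σ, and k below are illustrative:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def njw_spectral_clustering(S, k, sigma=1.0):
    # Affinity matrix with zero diagonal.
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Normalized matrix L = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Stack the k largest eigenvectors of L as the columns of X.
    _, vecs = eigh(L)                        # eigenvalues in ascending order
    X = vecs[:, -k:]
    # Renormalize rows of X to unit length, then run k-means in R^k.
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)

rng = np.random.default_rng(0)
S = np.vstack([rng.normal(c, 0.7, (20, 2)) for c in ([0, 0], [6, 0], [0, 6])])
print(njw_spectral_clustering(S, k=3, sigma=1.0))
```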
Eric Xing © Eric Xing @ CMU, 2006-2010 24
SC vs. K-means
Eric Xing © Eric Xing @ CMU, 2006-2010 25
Why does it work?
 K-means in the spectral space!
Eric Xing © Eric Xing @ CMU, 2006-2010 26
Eigenvectors and blocks
 Block matrices have block eigenvectors:
W = [ 1 1 0 0 ; 1 1 0 0 ; 0 0 1 1 ; 0 0 1 1 ]
eigenvalues:  λ1 = 2, λ2 = 2, λ3 = 0, λ4 = 0
leading eigenvectors:  v1 = [.71, .71, 0, 0],  v2 = [0, 0, .71, .71]
 Near-block matrices have near-block eigenvectors:
W = [ 1 1 .2 0 ; 1 1 0 −.2 ; .2 0 1 1 ; 0 −.2 1 1 ]
eigenvalues:  λ1 = 2.02, λ2 = 2.02, λ3 = −0.02, λ4 = −0.02
leading eigenvectors:  v1 = [.71, .69, .14, 0],  v2 = [0, −.14, .69, .71]
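These numbers are easy to check; the sketch below eigendecomposes both matrices from the slide. Since the leading eigenvalue is (nearly) repeated, the solver may return any rotation of the two leading eigenvectors, so it is the eigenvalues and the block structure of the spanned subspace that should match:

```python
import numpy as np

block = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
near_block = np.array([[1.0, 1.0, 0.2, 0.0],
                       [1.0, 1.0, 0.0, -0.2],
                       [0.2, 0.0, 1.0, 1.0],
                       [0.0, -0.2, 1.0, 1.0]])

for M in (block, near_block):
    vals, vecs = np.linalg.eigh(M)            # ascending eigenvalues
    print("eigenvalues:", np.round(vals[::-1], 2))
    print(np.round(vecs[:, -2:], 2))          # two leading eigenvectors, up to rotation and sign
```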
Eric Xing © Eric Xing @ CMU, 2006-2010 27
Spectral Space
 Can put items into blocks by their eigenvector entries:
W = [ 1 1 .2 0 ; 1 1 0 −.2 ; .2 0 1 1 ; 0 −.2 1 1 ]
e1 = [.71, .69, .14, 0],  e2 = [0, −.14, .69, .71]
 Clusters are clear regardless of row ordering:
W′ = [ 1 .2 1 0 ; .2 1 0 1 ; 1 0 1 −.2 ; 0 1 −.2 1 ]   (the same graph with rows/columns interleaved)
e1 = [.71, .14, .69, 0],  e2 = [0, .69, −.14, .71]
[Figure: plotting each item at its (e1, e2) coordinates separates the two blocks in both cases]
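The row-ordering claim can be verified numerically: permuting the rows and columns of W permutes the items' coordinates in the (e1, e2) spectral space, and any remaining difference is a rotation within the near-degenerate leading eigenspace, which preserves distances. A small sketch using the matrices above:

```python
import numpy as np

M = np.array([[1.0, 1.0, 0.2, 0.0],
              [1.0, 1.0, 0.0, -0.2],
              [0.2, 0.0, 1.0, 1.0],
              [0.0, -0.2, 1.0, 1.0]])

def top2_embedding(A):
    _, vecs = np.linalg.eigh(A)
    return vecs[:, -2:]                 # row i = item i's (e1, e2) coordinates

perm = np.array([0, 2, 1, 3])           # interleave the rows/columns of the two blocks
M_perm = M[np.ix_(perm, perm)]          # this is the second matrix on the slide

E, E_perm = top2_embedding(M), top2_embedding(M_perm)
print(np.round(E, 2))
print(np.round(E_perm, 2))
# Rows belonging to the same block stay close together in both embeddings;
# only the listing order (and possibly a rotation of the plane) changes.
```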
Eric Xing © Eric Xing @ CMU, 2006-2010 28
More formally …
 Recall the generalized Ncut:
Ncut(A1, A2, …, Ak) = Σ_{r=1}^{k} ( Σ_{i∈Ar, j∈V\Ar} Wij / Σ_{i∈Ar, j∈V} Wij ) = Σ_{r=1}^{k} cut(Ar, Ār)/dAr
 Minimizing it,
min Ncut(A1, A2, …, Ak) = min Σ_{r=1}^{k} cut(Ar, Ār)/dAr ,
is equivalent to the relaxed spectral problem
min_Y  Tr( Yᵀ D^{−1/2} (D − W) D^{−1/2} Y )   subject to  YᵀY = I
(equivalently, max_Y Tr( Yᵀ D^{−1/2} W D^{−1/2} Y )), where Y is the relaxed (pixels × segments) partition matrix.
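This equivalence can be checked numerically. With the standard indicator-based choice Yr = D^{1/2} 1Ar / ‖D^{1/2} 1Ar‖ (an assumption; the slide does not spell Y out), YᵀY = I and Tr(Yᵀ D^{−1/2} W D^{−1/2} Y) = k − Ncut(A1, …, Ak), so maximizing the trace is the same as minimizing Ncut. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, (5, 2)) for c in ([0, 0], [4, 0], [0, 4])])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / 2.0)
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)

parts = [list(range(0, 5)), list(range(5, 10)), list(range(10, 15))]   # a candidate 3-way partition
k = len(parts)

# k-way Ncut directly from the definition: sum_r cut(A_r, complement) / d_{A_r}.
ncut = sum((d[A].sum() - W[np.ix_(A, A)].sum()) / d[A].sum() for A in parts)

# Indicator-based relaxed partition matrix Y (pixels x segments).
Y = np.zeros((len(W), k))
for r, A in enumerate(parts):
    Y[A, r] = np.sqrt(d[A]) / np.sqrt(d[A].sum())

D_inv_half = np.diag(1.0 / np.sqrt(d))
trace = np.trace(Y.T @ D_inv_half @ W @ D_inv_half @ Y)

print(np.allclose(Y.T @ Y, np.eye(k)))        # True: Y^T Y = I
print(round(ncut, 6), round(k - trace, 6))    # identical: Ncut = k - trace
```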
Eric Xing © Eric Xing @ CMU, 2006-2010 29
Spectral Clustering
 Algorithms that cluster points using eigenvectors of matrices
derived from the data
 Obtain data representation in the low-dimensional space that
can be easily clustered
 Variety of methods that use the eigenvectors differently (we
have seen an example)
 Empirically very successful
 Authors disagree:
 Which eigenvectors to use
 How to derive clusters from these eigenvectors
 Two general methods
Eric Xing © Eric Xing @ CMU, 2006-2010 30
Method #1
 Partition using only one eigenvector at a time
 Use procedure recursively
 Example: Image Segmentation
 Uses the eigenvector with the 2nd smallest eigenvalue to define the optimal cut
 Recursively generates two clusters with each cut
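A rough sketch of this recursive scheme, with two simplifications that are mine rather than the slides': each cut thresholds the second eigenvector at zero, and the recursion stops at a fixed depth instead of an Ncut-based stopping rule:

```python
import numpy as np
from scipy.linalg import eigh

def fiedler_split(W):
    """Bipartition a weighted graph by the sign of the 2nd generalized eigenvector."""
    D = np.diag(W.sum(axis=1))
    _, vecs = eigh(D - W, D)          # (D - W) y = lambda D y
    return vecs[:, 1] > 0

def recursive_bipartition(W, idx=None, depth=2):
    """Recursively split until `depth` levels, yielding up to 2**depth clusters."""
    if idx is None:
        idx = np.arange(len(W))
    if depth == 0 or len(idx) < 2:
        return [idx]
    mask = fiedler_split(W[np.ix_(idx, idx)])
    if mask.all() or (~mask).all():   # degenerate split: stop here
        return [idx]
    return (recursive_bipartition(W, idx[mask], depth - 1) +
            recursive_bipartition(W, idx[~mask], depth - 1))

# Toy data: four well-separated groups, Gaussian affinity with sigma = 1 (assumed).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, (6, 2)) for c in ([0, 0], [5, 0], [0, 5], [5, 5])])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / 2.0)
np.fill_diagonal(W, 0.0)
print([sorted(leaf.tolist()) for leaf in recursive_bipartition(W, depth=2)])
```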
Eric Xing © Eric Xing @ CMU, 2006-2010 31
Method #2
 Use k eigenvectors (k chosen by user)
 Directly compute k-way partitioning
 Experimentally has been seen to be “better”
Eric Xing © Eric Xing @ CMU, 2006-2010 32
Toy examples
Images from Matthew Brand (TR-2002-42)
Eric Xing © Eric Xing @ CMU, 2006-2010 33
User’s Prerogative
 Choice of k, the number of clusters
 Choice of the scaling factor σ²
 Realistically, search over σ² and pick the value that gives the tightest clusters
 Choice of clustering method: k-way or recursive bipartition
 Kernel affinity matrix:  wi,j = K(si, sj)
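One way to make the σ² search concrete (a sketch only, mirroring the "tightest clusters" heuristic with an NJW-style embedding, an arbitrary candidate grid, and k-means inertia as the tightness measure; scikit-learn is assumed available):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_embedding(S, k, sigma):
    """Rows of the top-k eigenvectors of D^{-1/2} W D^{-1/2}, renormalized."""
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    s = 1.0 / np.sqrt(W.sum(axis=1))
    _, vecs = eigh(W * s[:, None] * s[None, :])
    Y = vecs[:, -k:]
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

rng = np.random.default_rng(0)
S = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])
k = 2
scores = {}
for sigma in (0.3, 0.5, 1.0, 2.0, 5.0):                  # arbitrary candidate grid
    km = KMeans(n_clusters=k, n_init=10).fit(spectral_embedding(S, k, sigma))
    scores[sigma] = km.inertia_                          # within-cluster "tightness"
best = min(scores, key=scores.get)
print({s: round(v, 4) for s, v in scores.items()}, "-> pick sigma =", best)
```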
Eric Xing © Eric Xing @ CMU, 2006-2010 34
Conclusions
 Good news:
 Simple and powerful methods to segment images.
 Flexible and easy to apply to other clustering problems.
 Bad news:
 High memory requirements (use sparse matrices).
 Very dependent on the scale factor σX for a specific problem:
W(i, j) = exp( −‖X(i) − X(j)‖² / 2σX² )