Spectral Clustering
Royi Itzhak
Spectral Clustering
• Algorithms that cluster points using
eigenvectors of matrices derived from the data
• Obtain data representation in the low-
dimensional space that can be easily clustered
• Variety of methods that use the eigenvectors
differently
• Difficult to understand….
Elements of Graph Theory
• A graph G = (V,E) consists of a vertex set V and an edge
set E.
• If G is a directed graph, each edge is an ordered pair of
vertices
• A bipartite graph is one in which the vertices can be
divided into two groups, so that all edges join vertices
in different groups.
Similarity Graph
• As distance decreases, similarity increases
• Represent the dataset as a weighted graph G(V,E)
• V = {x1, x2, ..., x6}: set of n vertices representing the data points
• E = {wij}: set of weighted edges indicating pair-wise similarity between points
[Figure: six vertices (1-6) joined by edges with weights between 0.1 and 0.8]
Similarity Graph
• Wij represents the similarity between vertices xi and xj
• Wij = 0 if there is no similarity between them
• Wii = 0
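As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of building such a similarity graph from data points; the Gaussian (RBF) kernel and its width parameter sigma are assumptions, since the slides do not fix a particular similarity function.

import numpy as np

def similarity_graph(X, sigma=1.0):
    """Build a weighted similarity graph W from data points X (n x d).

    Assumed Gaussian kernel: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)),
    with w_ii = 0 as on the slide.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)   # W_ii = 0
    return W

# Example: six 2-D points forming two loose groups
X = np.array([[0, 0], [0.1, 0.2], [0.2, 0.1], [3, 3], [3.1, 3.2], [3.2, 3.0]])
W = similarity_graph(X, sigma=1.0)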
Graph Partitioning
• Clustering can be viewed as
partitioning a similarity graph
• Bi-partitioning task:
– Divide vertices into two disjoint
groups (A,B)
[Figure: the six-vertex example graph divided into two groups A and B]
V = A ∪ B
Graph partitioning is NP-hard.
Clustering Objectives
• Traditional definition of a “good” clustering:
1. Points assigned to same cluster should be highly similar.
2. Points assigned to different clusters should be highly dissimilar.
Minimize weight of between-group connections
[Figure: the example weighted graph; edge weights within each group are high (0.6-0.8), edges between groups are low (0.1-0.2)]
• Apply these objectives to our graph representation
Graph Cuts
• Express partitioning objectives as a function of the
“edge cut” of the partition.
• Cut: the set of edges with exactly one endpoint in each group. We want to find the minimal cut between the groups; the two groups achieving the minimal cut give the partition.
cut(A,B) = Σ_{i∈A, j∈B} wij
[Figure: the example graph with A = {1,2,3} and B = {4,5,6}; the edges crossing the cut have weights 0.1 and 0.2]
cut(A,B) = 0.3
Graph Cut Criteria
• Criterion: Minimum-cut
– Minimise weight of connections between groups
min cut(A,B)
[Figure: an optimal, balanced cut versus the minimum cut on an example graph]
• Problem:
– Only considers external cluster connections
– Does not consider internal cluster density
• Degenerate case:
Graph Cut Criteria (continued)
• Criterion: Normalised-cut (Shi & Malik,’97)
– Consider the connectivity between groups
relative to the density of each group.
min Ncut(A,B) = cut(A,B)/vol(A) + cut(A,B)/vol(B)
– Normalise the association between groups by volume.
• Vol(A): The total weight of the edges originating from
group A.
• Why use this criterion?
– Minimising the normalised cut is equivalent to
maximising normalised association.
– Produces more balanced partitions.
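To make the cut and normalized-cut definitions concrete, here is a hedged NumPy sketch that evaluates cut(A,B), vol(A), vol(B) and Ncut(A,B) on the example graph from the slides; the partition A = {1,2,3}, B = {4,5,6} is read off the figure.

import numpy as np

# Adjacency matrix of the 6-vertex example graph from the slides.
W = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])
A, B = [0, 1, 2], [3, 4, 5]            # assumed partition {1,2,3} vs {4,5,6}

cut = W[np.ix_(A, B)].sum()            # total weight of edges crossing the partition
vol_A = W[A, :].sum()                  # total weight of edges originating from A
vol_B = W[B, :].sum()
ncut = cut / vol_A + cut / vol_B

print(cut)   # 0.3, as on the slide
print(ncut)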
Second option
• The previous criterion was based on edge weights; the following criterion is based on the sizes of the groups.
• |A| = number of vertices in A, |B| = number of vertices in B, |A| + |B| = N
• cut(A,B)/|A| + cut(A,B)/|B|   (the ratio cut)
Example – 2 Spirals
[Figure: the two-spirals dataset in the original 2-D space]
Dataset exhibits complex cluster shapes.
 K-means performs very poorly in this space due to its bias toward dense, spherical clusters.
[Figure: the same points in the space spanned by the two leading eigenvectors]
In the embedded space
given by two leading
eigenvectors, clusters are
trivial to separate.
Spectral Graph Theory
• Possible approach
– Represent a similarity graph as a matrix
– Apply knowledge from Linear Algebra…
• Spectral Graph Theory
– Analyse the “spectrum” of matrix representing a graph.
– Spectrum : The eigenvectors of a graph, ordered by the
magnitude(strength) of their corresponding eigenvalues.
{λ1, λ2, ..., λn}
• The eigenvalues and eigenvectors
of a matrix provide global
information about its
structure.
[w11 ... w1n]   [x1]       [x1]
[ ...     ... ] [...]  = λ [...]
[wn1 ... wnn]   [xn]       [xn]
Matrix Representations
• Adjacency matrix (A)
– n x n matrix
– A = [wij]: wij is the edge weight between vertices xi and xj
x1 x2 x3 x4 x5 x6
x1 0 0.8 0.6 0 0.1 0
x2 0.8 0 0.8 0 0 0
x3 0.6 0.8 0 0.2 0 0
x4 0 0 0.2 0 0.8 0.7
x5 0.1 0 0 0.8 0 0.8
x6 0 0 0 0.7 0.8 0
[Figure: the example weighted graph]
• Important properties:
– Symmetric matrix: A = [wij]
– Eigenvalues are real
– Eigenvectors span an orthogonal basis
Matrix Representations (continued)
• Important application:
– Normalise adjacency matrix
• Degree matrix (D)
– n x n diagonal matrix
– D(i,i): total weight of edges incident to vertex xi
x1 x2 x3 x4 x5 x6
x1 1.5 0 0 0 0 0
x2 0 1.6 0 0 0 0
x3 0 0 1.6 0 0 0
x4 0 0 0 1.7 0 0
x5 0 0 0 0 1.7 0
x6 0 0 0 0 0 1.5
[Figure: the example weighted graph]
D(i,i) = Σj wij
Matrix Representations (continued)
• Laplacian matrix (L)
– n x n symmetric matrix
• Important properties:
– Eigenvalues are non-negative real numbers
– Eigenvectors are real and orthogonal
– Eigenvalues and eigenvectors provide an insight into the
connectivity of the graph…
[Figure: the example weighted graph]
L = D - A
x1 x2 x3 x4 x5 x6
x1 1.5 -0.8 -0.6 0 -0.1 0
x2 -0.8 1.6 -0.8 0 0 0
x3 -0.6 -0.8 1.6 -0.2 0 0
x4 0 0 -0.2 1.7 -0.8 -0.7
x5 -0.1 0 0 -0.8 1.7 -0.8
x6 0 0 0 -0.7 -0.8 1.5
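A minimal sketch (not from the slides) of forming the degree matrix D and the Laplacian L = D - A from the adjacency matrix above, and checking that the eigenvalues are non-negative:

import numpy as np

A = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])
D = np.diag(A.sum(axis=1))   # degree matrix: D(i,i) = sum_j w_ij
L = D - A                    # unnormalized graph Laplacian

# L is symmetric; its eigenvalues are real and non-negative, the smallest being 0.
eigvals = np.linalg.eigvalsh(L)
assert eigvals.min() > -1e-10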
Another option – the normalized Laplacian
• Laplacian matrix (L)
– n x n symmetric matrix
[Table: the 6 × 6 normalized Laplacian of the example graph; diagonal entries are 1.00, off-diagonal entries range from 0.00 down to about -0.52]
• Important properties:
– Eigenvectors are real and normalized
– L_norm = D^{-0.5} (D - A) D^{-0.5}
– Each off-diagonal entry (i ≠ j) equals -Aij / √(Dii · Djj)
[Figure: the example weighted graph]
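A short sketch of the normalized Laplacian described above, assuming an adjacency matrix A with strictly positive degrees:

import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian D^{-1/2} (D - A) D^{-1/2}.

    Diagonal entries equal 1; off-diagonal entries equal -w_ij / sqrt(d_i * d_j).
    """
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.diag(d) - A
    return D_inv_sqrt @ L @ D_inv_sqrt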
Find An Optimal Min-Cut (Hall’70,
Fiedler’73)
• Express a bi-partition (A,B) as a vector p:
pi = +1 if xi ∈ A
pi = -1 if xi ∈ B
• The Laplacian is positive semi-definite
• The Rayleigh Theorem shows:
– The minimum value of f(p) is given by the 2nd smallest eigenvalue of the Laplacian L.
– The optimal solution for p is given by the corresponding eigenvector of λ2, referred to as the Fiedler Vector.
f(p) = Σ_{i,j ∈ V} wij (pi - pj)^2
• Equivalently, we can minimize p^T L p, where L is the Laplacian matrix.
Proof
• Based on
• Consistency of Spectral Clustering
By Ulrike von Luxburg, Mikhail Belkin, Olivier Bousquet
Max Planck Institute for Biological Cybernetics
Pages 2-6
Proof
Some definitions:
deg(i) = Σj wij
vol(S) = Σ_{i∈S} deg(i),   vol(S) + vol(S') = vol(V)
cut(S, S') = Σ_{i∈S, j∈S'} wij
b_w(g) = min_S cut(S, S')
Define f as follows:
f_i = +1 if i ∈ S,  f_i = -1 if i ∉ S   (f ∈ {+1, -1}^N)
Then
f^T L f = (1/2) Σ_{i,j} wij (f_i - f_j)^2 = 4 · cut(S, S')
Only pairs of vertices joined by an edge and lying in different sets contribute to the sum.
f^T D f = Σ_i deg(i) = vol(V)
For each vertex the sum picks up the corresponding diagonal entry deg(i).
f^T D 1 = Σ_{i∈S} deg(i) − Σ_{i∉S} deg(i) = vol(S) − vol(S')
Hence f^T D 1 = 0 exactly when vol(S) = vol(S'); the balanced cut can therefore be written as
b_w(g) = min_{f ∈ {+1,-1}^N,  f^T D 1 = 0}  f^T L f
Continue …
b_w(g) = min_{f ∈ {+1,-1}^N,  f^T D 1 = 0}   vol(V) · f^T L f / (4 · f^T D f)
Relaxing f from {+1, -1}^N to R^n gives
b_w(g) ≈ min_{f ∈ R^n,  f^T D 1 = 0}   f^T L f / f^T D f
From simple algebra, each minimizer of the relaxed problem is a generalized eigenvector:
L f = λ D f
Because the smallest eigenvalue is 0 and carries no information, b_w(g) is given by the 2nd smallest eigenvalue.
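To make the last step concrete, here is a hedged sketch that solves the relaxed problem L f = λ D f with SciPy's generalized symmetric eigensolver and reads off the second smallest eigenvalue and its eigenvector (assuming D is positive definite, i.e. every vertex has positive degree):

import numpy as np
from scipy.linalg import eigh

def second_generalized_eigenpair(L, D):
    """Solve L f = lambda D f; return the 2nd smallest eigenvalue and its eigenvector."""
    eigvals, eigvecs = eigh(L, D)        # generalized symmetric problem, ascending order
    return eigvals[1], eigvecs[:, 1]     # index 0 is the trivial 0 eigenvalue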
Spectral Clustering Algorithms
• Three basic stages:
1. Pre-processing
• Construct a matrix representation of the dataset.
2. Decomposition
• Compute eigenvalues and eigenvectors of the matrix.
• Map each point to a lower-dimensional representation
based on one or more eigenvectors.
3. Grouping
• Assign points to two or more clusters, based on the new
representation.
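The three stages can be sketched in a few lines of NumPy (an illustrative implementation using the unnormalized Laplacian and a split at 0; other matrix representations and splitting rules from the following slides work equally well):

import numpy as np

def spectral_bipartition(W):
    """Bi-partition a weighted graph given its adjacency matrix W.

    1. Pre-processing: build the Laplacian L = D - W.
    2. Decomposition: take the eigenvector of the 2nd smallest eigenvalue (Fiedler vector).
    3. Grouping: split the vertices by the sign of their Fiedler-vector component.
    """
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of the 2nd smallest eigenvalue
    cluster_a = np.where(fiedler >= 0)[0]
    cluster_b = np.where(fiedler < 0)[0]
    return cluster_a, cluster_b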
Spectral Bi-partitioning Algorithm
1. Pre-processing
– Build the Laplacian matrix L of the graph:
x1 x2 x3 x4 x5 x6
x1 1.5 -0.8 -0.6 0 -0.1 0
x2 -0.8 1.6 -0.8 0 0 0
x3 -0.6 -0.8 1.6 -0.2 0 0
x4 0 0 -0.2 1.7 -0.8 -0.7
x5 -0.1 0 0 -0.8 1.7 -0.8
x6 0 0 0 -0.7 -0.8 1.5
2. Decomposition
– Find the eigenvalues Λ and eigenvectors X of the matrix L.
– The eigenvalues of L are 0.0, 0.4, 2.2, 2.3, 2.5, 3.0 (Λ); X is the matrix whose columns are the corresponding eigenvectors.
– Map vertices to the corresponding components of the eigenvector of λ2:
x1: 0.2, x2: 0.2, x3: 0.2, x4: -0.4, x5: -0.7, x6: -0.7
Spectral Bi-partitioning Algorithm
[Table: the 6 × 6 matrix of eigenvectors of the Laplacian for the example graph]
The columns of this matrix are the eigenvectors of the Laplacian, ordered by increasing eigenvalue.
Spectral Bi-partitioning (continued)
• Grouping
– Sort components of reduced 1-dimensional vector.
– Identify clusters by splitting the sorted vector in two.
• How to choose a splitting point?
– Naïve approaches:
• Split at 0, mean or median value
– More expensive approaches
• Attempt to minimise normalised cut criterion in 1-dimension
Components of the λ2 eigenvector: x1: 0.2, x2: 0.2, x3: 0.2, x4: -0.4, x5: -0.7, x6: -0.7
Split at 0:
Cluster A (positive components): x1, x2, x3
Cluster B (negative components): x4, x5, x6
3-Clusters
Let's assume the following data points
• If we use the 2nd eigenvector
• If we use the 3rd eigenvector
K-Way Spectral Clustering
• How do we partition a graph into k clusters?
• Two basic approaches:
1. Recursive bi-partitioning (Hagen et al.,’91)
• Recursively apply bi-partitioning algorithm in a
hierarchical divisive manner.
• Disadvantages: Inefficient, unstable
2. Cluster multiple eigenvectors (Shi & Malik,’00)
• Build a reduced space from multiple eigenvectors.
• Commonly used in recent papers
• A preferable approach… but it amounts to doing PCA and then k-means
O(n^3)
Recursive bi-partitioning (Hagen et al.,’91)
• Partition using only one eigenvector at a time
• Use procedure recursively
• Example: Image Segmentation
– Uses 2nd (smallest) eigenvector to define optimal cut
– Recursively generates two clusters with each cut
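A hedged sketch of this recursive scheme, reusing the Fiedler-vector split; the fixed recursion depth used as a stopping rule here is an assumption, not the criterion of Hagen et al.:

import numpy as np

def fiedler_split(W):
    """Split vertex positions by the sign of the Fiedler vector of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    fiedler = np.linalg.eigh(L)[1][:, 1]
    return np.where(fiedler >= 0)[0], np.where(fiedler < 0)[0]

def recursive_bipartition(W, idx, depth):
    """Recursively bi-partition the vertices in idx, producing up to 2**depth clusters."""
    if depth == 0 or len(idx) <= 1:
        return [idx]
    a, b = fiedler_split(W[np.ix_(idx, idx)])
    return (recursive_bipartition(W, idx[a], depth - 1) +
            recursive_bipartition(W, idx[b], depth - 1))

# Usage: clusters = recursive_bipartition(W, np.arange(W.shape[0]), depth=2)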
Why use Multiple Eigenvectors?
1. Approximates the optimal cut (Shi & Malik,’00)
– Can be used to approximate the optimal k-way normalised cut.
2. Emphasises cohesive clusters (Brand & Huang,’02)
– Increases the unevenness in the distribution of the data.
 Associations between similar points are amplified, associations
between dissimilar points are attenuated.
 The data begins to “approximate a clustering”.
3. Well-separated space
– Transforms data to a new “embedded space”, consisting of k
orthogonal basis vectors.
• NB: Multiple eigenvectors prevent instability due to
information loss.
K-Eigenvector Clustering
• K-eigenvector Algorithm (Ng et al.,’01)
1. Pre-processing
– Construct the scaled adjacency matrix
A' = D^{-1/2} A D^{-1/2}
2. Decomposition
• Find the eigenvalues and eigenvectors of A'.
• Build embedded space from the eigenvectors
corresponding to the k largest eigenvalues.
3. Grouping
• Apply k-means to reduced n x k space to produce k
clusters.
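A hedged sketch of these three steps, assuming NumPy and scikit-learn's KMeans; the row normalization of the embedded space follows the original Ng et al. paper, although the slide does not spell it out:

import numpy as np
from sklearn.cluster import KMeans

def k_eigenvector_clustering(A, k):
    """Cluster a graph with adjacency matrix A into k clusters (Ng et al. style sketch)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_scaled = D_inv_sqrt @ A @ D_inv_sqrt              # A' = D^{-1/2} A D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(A_scaled)         # ascending order
    X = eigvecs[:, -k:]                                 # eigenvectors of the k largest eigenvalues
    X = X / np.linalg.norm(X, axis=1, keepdims=True)    # row normalization (Ng et al.,'01)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    return labels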
[Figure: the 20 largest eigenvalues of the Cisi/Medline data; λ1 and λ2 stand out]
Aside: How to select k?
• Eigengap: the difference between two consecutive eigenvalues.
• The most stable clustering is generally given by the value k that maximises the eigengap
Δk = |λk − λk+1|
• For the Cisi/Medline data, choose k = 2: the largest gap occurs after λ2.
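A small sketch of the eigengap heuristic (assuming the figure's convention of ranking eigenvalues from largest to smallest):

import numpy as np

def choose_k_by_eigengap(eigvals):
    """Pick k maximizing the gap between consecutive eigenvalues (sorted descending)."""
    lam = np.sort(np.asarray(eigvals))[::-1]   # descending order
    gaps = np.abs(np.diff(lam))                # gaps[i] = gap after the (i+1)-th eigenvalue
    return int(np.argmax(gaps)) + 1

# Example: a large drop after the 2nd eigenvalue suggests k = 2.
print(choose_k_by_eigengap([48.0, 45.0, 6.0, 5.5, 5.0]))   # -> 2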
Conclusion
• Clustering as a graph partitioning problem
– Quality of a partition can be determined using graph
cut criteria.
– Identifying an optimal partition is NP-hard.
• Spectral clustering techniques
– Efficient approach to calculate near-optimal
bi-partitions and k-way partitions.
– Based on well-known cut criteria and strong
theoretical background.
Selecting relevant genes with
spectral approach
Genomics
[Figure: gene-expression data matrix; rows are gene expressions, columns are tissue samples]
The microarray technology provides
many measurements of gene
expressions for different sample
tissues.
Goal: recognizing the relevant
genes that separate between
cells with different biological
characteristics (normal vs. tumor,
different subclasses of tumor cells)
• Classification of Tissue Samples (type of
Cancer, Normal vs. Tumor)
• Find Novel Subclasses (unsupervised)
• Find Genes responsible for classification
(new insights for drug design).
Few samples (~50) and large dimension (~10,000)
Problem Definition
• Let the microarray data matrix be M = [M1, ..., Mq] ∈ R^{n×q}; each column Mi is a tissue vector.
• The gene expression levels form the rows of M; each row mi^T is a normalized gene vector with ||mi||^2 = 1.
• Goal: select the subset of rows {m_{i1}^T, ..., m_{is}^T} which are most "relevant" with respect to an inference (learning) task.
Feature Subset Relevance - Key Idea
[Figure: selecting a subset of rows of M (rows m1^T, ..., mn^T) maps each column Mi ∈ R^n to a reduced column M̂i ∈ R^l, with l ≤ n]
Working Assumption: the relevant subset of rows induce columns that are coherently clustered.
• How to measure cluster coherency? We wish to avoid explicitly clustering
for each subset of rows. We wish a measure which is amenable to
continuous functional analysis.
Key idea: use spectral information from the affinity matrix M̂^T M̂.
• How do we represent M̂^T M̂?
A_α = M̂^T M̂ = Σ_{i=1}^{n} α_i m_i m_i^T
where α encodes the subset of features: α_i = 1 if i ∈ {i1, ..., il} and 0 otherwise.
(A_α)_{ij} is the correlation between the i'th and j'th columns, so A_α is symmetric and positive semi-definite.
Definition of Relevancy
The Standard Spectrum
General Idea:
Select a subset of rows from the sample matrix M such that the resulting
affinity matrix will have high values associated with the first k eigenvalues.
rel(x_{i1}, ..., x_{il}) = trace(Q^T A_α^T A_α Q) = Σ_{j=1}^{k} λ_j^2
where A_α = Σ_{i=1}^{n} α_i m_i m_i^T, with α_i = 1 if i is in the selected subset and 0 otherwise,
and Q consists of the first k eigenvectors of A_α.
Optimization Problem
max_{Q, α_1, ..., α_n}  trace(Q^T A_α^T A_α Q)
subject to Q^T Q = I and Σ_{i=1}^{n} α_i^2 = 1,
where A_α = Σ_{i=1}^{n} α_i m_i m_i^T and α = (α_1, ..., α_n)^T for some unknown real scalars.
Motivation: from spectral clustering it is known
that the eigenvectors tend to be discontinuous
and that may lead to an effortless sparsity
property.
The Algorithm
• If α were known, then A_α would be known and Q would simply be the first k eigenvectors of A_α.
• If Q were known, then the problem becomes:
max_α α^T G α   subject to α^T α = 1,
where G_ij = (m_i^T m_j)(m_i^T Q Q^T m_j),
and the solution α is the largest eigenvector of G.
The Algorithm (power-embedded)
Orthogonal iteration; r is the iteration index.
1. Let G^(r) be defined by G^(r)_ij = (m_i^T m_j)(m_i^T Q^(r-1) (Q^(r-1))^T m_j)
2. Let α^(r) be the largest eigenvector of G^(r)
3. Let A^(r) = Σ_{i=1}^{n} α_i^(r) m_i m_i^T
4. Let Z^(r) = A^(r) Q^(r-1)
5. Factor Z^(r) = Q^(r) R^(r)   ("QR" factorization step)
6. Increment r
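A hedged NumPy sketch of the orthogonal iteration above; this is an illustrative reading of the slide, not the authors' reference implementation, and the random initialization of Q and the fixed iteration count are assumptions:

import numpy as np

def q_alpha(M, k, n_iter=50):
    """Q-alpha style orthogonal iteration (sketch).

    M: n x q data matrix whose rows m_i^T are normalized gene vectors.
    k: number of leading eigenvectors spanning the embedded space.
    Returns the weight vector alpha and the orthonormal matrix Q.
    """
    n, q = M.shape
    Q = np.linalg.qr(np.random.randn(q, k))[0]   # assumed random orthonormal start
    MMt = M @ M.T                                # (m_i^T m_j) for all pairs
    for _ in range(n_iter):
        # 1. G_ij = (m_i^T m_j) * (m_i^T Q Q^T m_j)
        P = M @ Q                                # n x k
        G = MMt * (P @ P.T)
        # 2. alpha = largest eigenvector of G
        _, V = np.linalg.eigh(G)
        alpha = V[:, -1]
        # 3. A^(r) = sum_i alpha_i m_i m_i^T   (q x q)
        A = M.T @ (alpha[:, None] * M)
        # 4-5. Z = A Q, then QR factorization Z = Q R
        Q, _ = np.linalg.qr(A @ Q)
    return alpha, Q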
The algorithm satisfies 3 properties:
• The algorithm converges to a local maximum
• At the local maximum, α_i ≥ 0
• The vector α is sparse
The experiment
• Given several blood-cell data sets:
• Myeloid cells: HL-60, U937
• T cells: Jurkat
• Leukemia cells: NB4
• The dimensionality of the expression data was 7229 genes over 17 samples
• The goal is to find clusters of gene expression levels without any restriction.
Copyright ©1999 by the National Academy of Sciences
Principle of SOMs. Initial geometry of nodes in 3 × 2 rectangular grid is
indicated by solid lines connecting the nodes. Hypothetical trajectories of nodes
as they migrate to fit data during successive iterations of SOM algorithm are
shown. Data points are represented by black dots, six nodes of SOM by large
circles, and trajectories by arrows.
SOM
S.O.M Results
• The time-course data, clustered with SOM after pre-processing of the data
• The result was 24 clusters, each consisting of 6-113 genes
Q-alpha
After applying the Q-alpha algorithm, it was found that
the set of relevant genes consists of a small number of genes; the plot shows the sorted alpha values.
The profile of the values indicates sparsity, meaning that around 95% of the values are an order of magnitude smaller than the remaining 5%.
continue
A plot of 6 of the top 40 genes, corresponding to clusters 20, 1, 22/23, 4, 15, 21. In each of the six panels, the time courses of all four cell lines are shown (left to right): HL-60, U937, NB4, Jurkat.
Another example
• Other datasets:
1. DLCL, referred to as "lymphoma": 7,129 genes over 56 samples
2. Childhood medulloblastomas, referred to as "brain": the dimensionality of this dataset was 7,129 and there were 60 samples
3. Breast tumors, referred to as "breast met": the dimensionality of this dataset was 24,624 and there were 44 samples
4. Breast tumors for which the corresponding lymph nodes either were cancerous or not, referred to as "lymph status": the dimensionality of this dataset is 12,600 with 90 samples
The results, using a leave-one-out algorithm
• Compared to unsupervised methods such as PCA and GS, and supervised methods such as SNR, RMB, RFE
The slide you all waited for