Paper digest
“Large-Scale Spectral Clustering on Graphs”
Akisato Kimura
akisato@ieee.org, @_akisato
One-page abstract
• Approx. acceleration of spectral clustering
– by introducing additional nodes that enable us to
compress the original graph,
– resulting in a bipartite graph which is
computationally efficient for spectral clustering.
• Note
– Targets large-scale spectral clustering;
works especially well for dense graphs.
– Not suitable for large-scale graph clustering,
since such graphs are sparse in nature.
Spectral clustering [Shi & Malik 1997]
• Notations
– Undirected weighted graph 𝐺 = (𝑉, 𝐸)
– Num. nodes 𝑛 = |𝑉|; num. edges 𝑚 = |𝐸|
– Adjacency matrix 𝑊 = (𝑊_{𝑖,𝑗}), 𝑖, 𝑗 = 1, 2, …, 𝑛
• Objective function
– Solved by eigen-decomposition (EVD)
min_{𝑋 ∈ ℝ^{𝑛×𝑘}} Tr(𝑋^𝑇 𝐷^{−1/2} 𝐿 𝐷^{−1/2} 𝑋) s.t. 𝑋^𝑇 𝑋 = 𝐼
(𝐿 = 𝐷 − 𝑊: graph Laplacian of 𝑊, 𝐷: diagonal degree matrix, 𝑘: num. clusters)
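As a reference point, here is a minimal sketch of this baseline in NumPy/scikit-learn; it is my illustrative rendition, not the paper's code, and `spectral_clustering` is a hypothetical name. Solving the objective via a full EVD is exactly the step the paper identifies as the bottleneck.

```python
# A minimal sketch of normalized spectral clustering (not the authors' code).
# Assumes a dense, symmetric, nonnegative adjacency matrix W with no
# isolated nodes (so all degrees are positive).
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    deg = W.sum(axis=1)                         # node degrees
    L = np.diag(deg) - W                        # graph Laplacian L = D - W
    D_isqrt = np.diag(1.0 / np.sqrt(deg))       # D^{-1/2}
    L_sym = D_isqrt @ L @ D_isqrt               # normalized Laplacian
    _, vecs = eigh(L_sym)                       # full EVD: the O(n^3) step
    X = vecs[:, :k]                             # k smallest eigenvectors
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12  # row-normalize
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
```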
Main contribution of this work
• SC needs 𝑂(𝑛³) computations due to EVD.
• Several improvements so far:
– Compressing the adjacency matrix by the Nyström method [Fowlkes+ 2004]
– Reducing samples (= nodes) [Shinnou & Sasaki 2008] [Yan+ 2009] [Sakai & Imiya 2009] [Chen & Cai 2011]
– Early stopping of EVD [Chen+ 2006] [Liu+ 2007]
• In contrast, this work
– reduces the size of the graph itself.
Introducing supernodes
• Why supernodes? Intuition from co-clustering:
– A partition of the supernodes can induce a partition of the observed nodes, and vice versa.
• Generating a set of 𝑑 ≪ 𝑛 supernodes
[Figure: original graph, with regular nodes and supernodes marked]
How to generate supernodes
1. Randomly choosing 𝑑 regular nodes as seeds.
2. Calculating the shortest paths from the seeds
to the other regular nodes.
i. Converting adjacencies to distances.
ii. Applying Dijkstra’s algorithm.
3. Partitioning all the regular nodes into 𝑑
disjoint subsets based on the shortest paths.
4. (Each subset corresponds to a supernode.)
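A minimal sketch of these steps follows; it assumes a scipy.sparse similarity matrix W, converts similarity w to distance 1/w (one simple choice, since the slide does not fix the conversion), and `generate_supernodes` is a hypothetical helper name.

```python
# A minimal sketch of supernode generation (steps 1-4 above).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import dijkstra

def generate_supernodes(W, d, seed=0):
    n = W.shape[0]
    rng = np.random.default_rng(seed)
    seeds = rng.choice(n, size=d, replace=False)  # 1. random seed nodes
    dist = W.tocsr(copy=True)
    dist.data = 1.0 / dist.data                   # 2-i. adjacency -> distance
    paths = dijkstra(dist, directed=False, indices=seeds)  # 2-ii. (d, n)
    assign = np.argmin(paths, axis=0)             # 3. nearest seed per node
    # 4. each of the d disjoint subsets becomes one supernode; R is the
    # binary d-by-n bipartite (assignment) matrix used on the next slide
    R = sp.csr_matrix((np.ones(n), (assign, np.arange(n))), shape=(d, n))
    return R
```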
After generating supernodes
• 𝑊̃ = 𝑅𝑊, where
– 𝑅 ∈ ℤ^{𝑑×𝑛}: binary bipartite (assignment) graph between the 𝑛 regular nodes and the 𝑑 supernodes,
– 𝑊̃ ∈ ℝ^{𝑑×𝑛}: bipartite graph, called a "reduced graph",
obtained by propagating edge weights between regular nodes and supernodes.
[Figure: 𝑛 regular nodes, 𝑑 supernodes, and the matrices 𝑊, 𝑅, 𝑊̃]
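Given 𝑅 from the hypothetical `generate_supernodes` sketch above, propagating the edge weights is a single sparse product:

```python
# The reduced graph: for each supernode, sum the edge weights of the
# regular nodes assigned to it.
R = generate_supernodes(W, d=64)   # hypothetical helper from the last sketch
W_tilde = R @ W                    # d-by-n reduced graph, W~ = R W
```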
Spectral clustering on reduced graphs
• Consider another representation of the reduced graph: 𝑊′, defined over the 𝑛 regular nodes and the 𝑑 supernodes together.
• Spectral clustering on 𝑊′.
[Figure: the bipartite graph of 𝑛 regular nodes and 𝑑 supernodes, and the result of spectral clustering on 𝑊′]
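One plausible reading of 𝑊′ (my assumption; the slide's figure is not reproduced here) is the symmetric adjacency of the bipartite graph between the regular nodes and the supernodes:

```python
# Symmetric (n+d)-by-(n+d) bipartite adjacency built from the reduced graph.
import scipy.sparse as sp

W_prime = sp.bmat([[None, W_tilde.T],
                   [W_tilde, None]], format='csr')
```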
Spectral clustering on reduced graphs
• Spectral clustering on 𝑊′ exhibits a co-clustering structure:
– 𝑥 and 𝑦 are left & right singular vectors of 𝑍 ∈ ℝ^{𝑑×𝑛}.
• It can be simplified further:
– 𝑦 is also an eigenvector of 𝑍𝑍^𝑇 ∈ ℝ^{𝑑×𝑑},
∵ 𝑍𝑍^𝑇 𝑦 = 𝑍(1 − 𝜆)𝑥 = (1 − 𝜆)² 𝑦.
– (𝑍𝑍^𝑇 looks like a compressed representation of 𝑊.)
[Figure: co-clustering over 𝑛 regular nodes and 𝑑 supernodes]
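A minimal sketch of this shortcut: take the top singular vectors of 𝑍 with a truncated SVD (equivalently, an EVD of the small 𝑑×𝑑 matrix 𝑍𝑍^𝑇) instead of an EVD of the full 𝑛×𝑛 graph. The normalization 𝑍 = 𝐷_s^{−1/2} 𝑊̃ 𝐷_r^{−1/2} and the helper name `cluster_reduced` are my assumptions:

```python
# Cluster the n regular nodes through the d-by-n reduced graph.
# Assumes W_tilde has no all-zero rows or columns.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def cluster_reduced(W_tilde, k):
    dr = np.asarray(W_tilde.sum(axis=0)).ravel()   # regular-node degrees
    ds = np.asarray(W_tilde.sum(axis=1)).ravel()   # supernode degrees
    Z = sp.diags(1.0 / np.sqrt(ds)) @ W_tilde @ sp.diags(1.0 / np.sqrt(dr))
    y, s, xt = svds(Z, k=k)          # y: left vectors (d,k); xt: right (k,n)
    U = xt.T                         # embedding of the n regular nodes
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```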
In summary
[Figure: the overall pipeline, separating the steps described so far from the additional steps below]
Regenerating supernodes
• Intuitions
1. The matrix 𝑈 ∈ ℝ^{𝑛×𝑘} implies the current clustering.
2. Most of the nodes in the same cluster are expected to be densely connected.
• Method
– Selecting 𝑘 − 1 right vectors (= those with large eigenvalues) as supernodes.
[Figure: 𝑈, linking the 𝑛 regular nodes, 𝑑 supernodes, and 𝑘 cluster nodes over 𝑊]
In detail
• New regular-super links, based on the average affiliation score over all the samples.
• Resulting in (𝑘 − 1) edges from every regular node.
• Every edge stands for a binarized affiliation score.
• So, this idea can be easily extended to quantized affiliation scores of arbitrary sizes.
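A minimal sketch of one plausible binarization rule (my assumption: threshold each affiliation column at its average over all samples, as hinted above; `regenerate_links` is a hypothetical name):

```python
# Rebuild the regular-super links from the current embedding U (n-by-k):
# keep k-1 informative columns as supernodes and binarize each column at
# its mean, giving k-1 binary edges per regular node.
import numpy as np
import scipy.sparse as sp

def regenerate_links(U):
    A = U[:, 1:]                            # drop the trivial first vector
    B = (A > A.mean(axis=0)).astype(float)  # binarized affiliation scores
    return sp.csr_matrix(B.T)               # new (k-1)-by-n link matrix
```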
Finally, the algorithm is as follows
[Figure: the full algorithm, alternating between generating or updating supernodes and small-size spectral clustering; the latter's size can be replaced by a function of 𝑡, 𝑙_𝑡]
Computational costs
• Alg. 1: 𝑂(𝑛𝑑 log 𝑛 + 𝑚𝑑 + 𝑛𝑑²) in total
– Steps 1-2: 𝑂(𝑛𝑑 log 𝑛)
– Steps 3-4: 𝑂(𝑚𝑑)
– Step 5: 𝑂(𝑛𝑑)
– Step 6: 𝑂(𝑛𝑑²) + 𝑂(𝑑³)
– Steps 7-9: 𝑂(𝑛𝑑𝑘)
• Alg. 2: 𝑂(𝑚𝑘) in total
– Step 5: 𝑂(𝑚𝑘)
• Alg. 3: 𝑂(𝑛𝑑 log 𝑛 + 𝑚𝑑 + 𝑚𝑘𝑡 + 𝑛(𝑑² + 𝑘²𝑡)) in total
– Step 3: 𝑂(𝑛𝑑 log 𝑛 + 𝑚(𝑑 + 1))
• If 𝑑² ≈ 𝑘²𝑡 ≈ log² 𝑛, this becomes 𝑂(𝑛 log² 𝑛)
(= the same order as modularity-based clustering).
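A quick sanity check of the last line (my arithmetic; it additionally assumes a sparse graph with 𝑚 = 𝑂(𝑛), so that the 𝑚𝑑 and 𝑚𝑘𝑡 terms stay within the bound):

```latex
% Substitute d \approx \log n and k^2 t \approx \log^2 n (hence kt \le \log^2 n):
O\bigl(nd\log n + md + mkt + n(d^2 + k^2 t)\bigr)
  = O\bigl(n\log^2 n + n\log n + n\log^2 n + 2\,n\log^2 n\bigr)
  = O\bigl(n\log^2 n\bigr).
```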
Data sets for experiments
• 2 synthetic, 2 real-world.
– Syn-1k: kNN graph; 100k: 100-ins & 40-outs
– DBLP: author network with co-conference links.
– IMDB: movie network with co-director links.
• These look like moderate-scale (not large-scale) graphs…
Experimental results
[Figure: comparison of Shortest Path (see "How to generate supernodes"), Proposed (Alg. 1), Proposed (Alg. 3), Spectral Clustering, RESC [Khoa & Chawla 2012], and Nyström [Fowlkes+ 2004]]
• The proposed method is suitable for dense graphs.
(If sparse, modularity-based clustering would be better: 𝑂(𝑛 log 𝑛) ∼ 𝑂(𝑛 log² 𝑛).)
Detailed results
• Performance of the proposed methods w.r.t. the parameter 𝑑 (num. supernodes).
– Why is it not monotonically increasing?
• Performance of the proposed methods w.r.t. the parameter 𝑡 (num. iterations).
Qualitative evaluations
• Toy example on Syn-1K
[Figure: ground truth, k-NN graph, SP, Proposed 1, Proposed 2 (5 iterations), SC, RESC, Nyström]
Comments
• The idea and technique are interesting and possibly versatile.
• Both serial and parallel implementations would be quite simple.
– Matlab code is available at http://jialu.cs.illinois.edu/publication
• Might be suitable only for dense graph clustering (with features).