Graph Clustering
Based on Structural/Attribute Similarities
         Yang Zhou, Hong Cheng, Jeffrey Xu Yu

     Proc. of the VLDB Endowment, France, 2009




                 Thursday, August 16, 2012



                           Presenter
                     Waqas Nawaz

   Data and Knowledge Engineering Lab, Kyung Hee University, Korea
Agenda




                                              3/8
Data and Knowledge Engineering Lab            2
Introduction
■ X = {x1, … , xN}: a set of data points
■ S = (sij), i,j = 1,…,N: the similarity matrix, in which element sij is the similarity
  between data points xi and xj

■ The goal of clustering is to divide the data points into groups such that
  points in the same group are similar and points in different groups are dissimilar.

■ The dataset can be modeled as a graph.

■ From the graph perspective, clustering becomes a partition of the graph such that
  nodes in the same subgraph are densely connected (homogeneous) to each other and
  sparsely connected (heterogeneous) to the rest of the graph.

■ Distances and similarities are inverses of each other. The following slides talk only
  about similarities, but everything also works with distances.
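
As a small illustration of this setup, the sketch below builds a similarity matrix S from a few points and keeps the strong similarities as graph edges. The Gaussian kernel, the `sigma` and `threshold` parameters, and the function names are assumptions made for the example; the slides do not prescribe a particular similarity function.

```python
import math

def similarity_matrix(points, sigma=1.0):
    """Assumed Gaussian similarity: s_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(points)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            S[i][j] = math.exp(-sq_dist / (2 * sigma ** 2))
    return S

def similarity_graph(S, threshold):
    """Keep an undirected edge (i, j), i < j, whenever s_ij >= threshold."""
    n = len(S)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if S[i][j] >= threshold]

# Two tight pairs of points that are far away from each other.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
S = similarity_matrix(points)
edges = similarity_graph(S, threshold=0.5)
print(edges)  # [(0, 1), (2, 3)]
```

Only the two close pairs survive the threshold, so the resulting graph already has the two densely connected groups a clustering method should recover.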



Motivation

■ Identifying clusters (well-connected components in a graph) is useful in
  many applications, from biological function prediction to social
  community detection.

[Figure: coauthor network annotated with attributes of authors; source: manyeyes.alphaworks.ibm.com]
Objective

■ A desirable clustering of an attributed graph should achieve a good
  balance between the following:

    ▪ Structural cohesiveness: vertices within one cluster are close to each
      other in terms of structure, while vertices in different clusters are
      distant from each other

    ▪ Attribute homogeneity: vertices within one cluster have similar
      attribute values, while vertices in different clusters have quite different
      attribute values

[Figure: balance between structural cohesiveness and attribute homogeneity]



Related Work

■ Structure-Based Clustering
    ▪ Normalized cuts [Shi and Malik, TPAMI 2000]
    ▪ Modularity [Newman and Girvan, Phys. Rev. E 2004]
    ▪ SCAN [Xu et al., KDD'07]
  The generated clusters have a rather random distribution of vertex
  properties within clusters.

■ Attribute-Based Clustering
    ▪ K-SNAP [Tian et al., SIGMOD'08]
    ▪ Attributes compatible grouping
  The generated clusters have a rather loose intra-cluster structure.

  Is there a way to consider both factors (structure and attributes)
  simultaneously while clustering? Yes.

Graph Clustering with Structure & Attribute (1/11)

■ Structure-based clustering
    ▪ Vertices with heterogeneous attribute values end up in the same cluster

■ Attribute-based clustering
    ▪ Loses much of the structure information

■ Structural/attribute clustering
    ▪ Vertices in a cluster have homogeneous attribute values
    ▪ Keeps most of the structure information




Graph Clustering with Structure & Attribute (2/11)
■ Example: A Coauthor Network

[Figure: coauthor network of 11 authors labeled by topic (r1, r2, r4 to r8: XML; r3: XML and Skyline; r9 to r11: Skyline), with overlays comparing the attribute-based, structural, and structural/attribute clusterings]




Graph Clustering with Structure & Attribute (3/11)

■ Proposed Idea: Flow Diagram

[Figure: flow diagram. G → transform vertex attributes to attribute edges → Ga → a unified distance on edges → clustering on Ga → mapping onto the original graph → clustering on G → desired clusters]


Graph Clustering with Structure & Attribute (4/11)

■ Attribute Augmented Coauthor Graph with Topics

[Figure: the original coauthor graph (authors r1 to r11, labeled with the topics XML and Skyline) next to the modified, attribute augmented graph]

We then use the neighborhood random walk distance on the augmented
graph to combine structural and attribute similarities.
Neighborhood Random Walk (1/2)

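[Figure: adjacency matrix A and transition matrix P for a three-vertex graph on A, B, and C; edge weights of 1 in the graph on the left become transition probabilities of 1 and 1/2 in the graph on the right]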
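
The step from adjacency matrix to transition matrix can be sketched as a row normalization. The three-vertex path B - A - C used below is a stand-in chosen for the example, since the exact edges of the slide's figure are not recoverable from this transcript; the function name is likewise hypothetical.

```python
from fractions import Fraction

def transition_matrix(adj):
    """Row-normalize an adjacency matrix: P[i][j] = adj[i][j] / sum_k adj[i][k]."""
    P = []
    for row in adj:
        total = sum(row)
        if total == 0:
            # A vertex without outgoing edges keeps an all-zero row in this sketch.
            P.append([Fraction(0)] * len(row))
        else:
            P.append([Fraction(a, total) for a in row])
    return P

# Assumed example graph, vertex order A, B, C: A is linked to B and to C.
adj = [
    [0, 1, 1],  # A
    [1, 0, 0],  # B
    [1, 0, 0],  # C
]
P = transition_matrix(adj)
for name, row in zip("ABC", P):
    print(name, [str(p) for p in row])
# A ['0', '1/2', '1/2']
# B ['1', '0', '0']
# C ['1', '0', '0']
```

A walker at A moves to B or C with probability 1/2 each, while a walker at B or C must return to A, which matches the mix of 1 and 1/2 labels in the figure.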


Neighborhood Random Walk (2/2)


[Figure: random walk on the three-vertex graph (A, B, C), shown at steps t = 0, 1, 2, and 3]
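
A minimal sketch of the walk over time, assuming a walker that starts at A and a transition matrix for the path B - A - C (an assumed stand-in for the slide's graph): the distribution at step t is the start distribution multiplied t times by P.

```python
def step(dist, P):
    """One random walk step: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def walk(start, P, t):
    """Distribution over vertices after t steps from the start distribution."""
    dist = list(start)
    for _ in range(t):
        dist = step(dist, P)
    return dist

# Assumed transition matrix, vertex order A, B, C.
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
start = [1.0, 0.0, 0.0]  # the walker starts at A
history = [walk(start, P, t) for t in range(4)]
for t, dist in enumerate(history):
    print(f"t={t}", dist)
# t=0 [1.0, 0.0, 0.0]
# t=1 [0.0, 0.5, 0.5]
# t=2 [1.0, 0.0, 0.0]
# t=3 [0.0, 0.5, 0.5]
```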

Graph Clustering with Structure & Attribute (5/11)

■ The Kinds of Vertices and Edges
    ▪ Two kinds of vertices
         • The structure vertex set V
         • The attribute vertex set Va

    ▪ Two kinds of edges
         • The structure edges E
         • The attribute edges Ea

    ▪ The attribute augmented graph Ga = (V ∪ Va, E ∪ Ea)
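
One way to picture the augmentation is the sketch below: every distinct (attribute, value) pair becomes an attribute vertex, and each structure vertex is linked to the attribute vertices for the values it carries. The function name, the input layout, and the tiny three-author graph are assumptions made for illustration.

```python
def augment(vertices, edges, attributes):
    """Return (vertices, edges) of the attribute augmented graph.

    attributes maps a structure vertex to {attribute name: value or list of values}.
    """
    attr_vertices = []
    attr_edges = []
    for v in vertices:
        for name, value in attributes.get(v, {}).items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for val in values:
                av = (name, val)          # one attribute vertex per (attribute, value)
                if av not in attr_vertices:
                    attr_vertices.append(av)
                attr_edges.append((v, av))  # attribute edge: vertex v has this value
    return list(vertices) + attr_vertices, list(edges) + attr_edges

# Hypothetical fragment of a coauthor graph.
vertices = ["r1", "r3", "r9"]
edges = [("r1", "r3"), ("r3", "r9")]
attributes = {
    "r1": {"topic": "XML"},
    "r3": {"topic": ["XML", "Skyline"]},
    "r9": {"topic": "Skyline"},
}
aug_vertices, aug_edges = augment(vertices, edges, attributes)
print(aug_vertices)
# ['r1', 'r3', 'r9', ('topic', 'XML'), ('topic', 'Skyline')]
print(len(aug_edges))  # 6
```

Authors r1 and r9 are not coauthors, but each is now two hops from r3 through a shared topic vertex, which is how attribute similarity enters a structure-only distance.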




Graph Clustering with Structure & Attribute (6/11)

■ New Clustering Framework
    1. Calculate the distance
    2. Initialize the cluster centroids
    3. Assign vertices to a cluster
    4. Update the cluster centroids
    5. Adjust edge weights automatically
    6. Re-calculate the distance matrix
    7. Repeat from step 3 until the objective function converges


Graph Clustering with Structure & Attribute (7/11)

■ Transition Probability Matrix on the Attribute Augmented Graph

    ▪ PV: probabilities from structure vertices to structure vertices
    ▪ A: probabilities from structure vertices to attribute vertices
    ▪ B: probabilities from attribute vertices to structure vertices
    ▪ O: probabilities from attribute vertices to attribute vertices; all entries are zero
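
The four blocks can be read as one matrix laid out as [[PV, A], [B, O]], with structure vertices first and attribute vertices after them. The sketch below builds such a matrix under stated assumptions: structure edges carry weight `w0`, an edge to an attribute vertex carries that attribute's weight, attribute vertices walk back uniformly, and every row is normalized by its total outgoing weight. This normalization is one plausible reading of the slide rather than a confirmed formula from the paper, and the input layout is hypothetical.

```python
def augmented_transition(n, struct_edges, attr_vertices, attr_edges, weights, w0=1.0):
    """Transition matrix over n structure vertices followed by the attribute vertices.

    attr_vertices[k] = (attribute index, value); attr_edges holds (vertex, k) pairs;
    weights[a] is the weight of attribute a.
    """
    size = n + len(attr_vertices)
    W = [[0.0] * size for _ in range(size)]
    for i, j in struct_edges:              # block PV: structure <-> structure
        W[i][j] = W[j][i] = w0
    for i, k in attr_edges:
        a = attr_vertices[k][0]
        W[i][n + k] = weights[a]           # block A: structure -> attribute
        W[n + k][i] = 1.0                  # block B: attribute -> structure
    # Block O (attribute -> attribute) is never written, so it stays zero.
    P = []
    for row in W:
        total = sum(row)
        P.append([x / total if total else 0.0 for x in row])
    return P

# Three authors on a path 0 - 1 - 2, one attribute (index 0) with two values.
attr_vertices = [(0, "XML"), (0, "Skyline")]
P = augmented_transition(
    n=3,
    struct_edges=[(0, 1), (1, 2)],
    attr_vertices=attr_vertices,
    attr_edges=[(0, 0), (1, 0), (2, 1)],
    weights=[1.0],
)
print(P[0])  # [0.0, 0.5, 0.0, 0.5, 0.0]
```

Raising an attribute's weight shifts probability mass in each structure row from PV toward A, so walks pass through that attribute's vertices more often.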

Graph Clustering with Structure & Attribute (8/11)

■ A Unified Distance Measure
    ▪ The unified neighborhood random walk distance:

    ▪ The matrix form of the neighborhood random walk distance:
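
The formulas on this slide were images and are not in the transcript. A commonly cited matrix form for a neighborhood random walk distance is R = sum over γ = 1..l of c(1 - c)^γ P^γ, where P is the transition matrix, l the maximum walk length, and c a restart probability in (0, 1). The sketch below implements that form as an assumption; it should be checked against the paper before being relied on. Despite the name "distance", larger values under this form mean that two vertices are closer.

```python
def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

def random_walk_distance(P, l, c):
    """R = sum_{gamma=1..l} c * (1 - c)**gamma * P**gamma (assumed form)."""
    n = len(P)
    R = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    for gamma in range(1, l + 1):
        power = matmul(power, P)           # now holds P^gamma
        coeff = c * (1 - c) ** gamma
        for i in range(n):
            for j in range(n):
                R[i][j] += coeff * power[i][j]
    return R

# Assumed transition matrix for a three-vertex path B - A - C (order A, B, C).
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
R = random_walk_distance(P, l=2, c=0.5)
print(R[1][0], R[1][2])  # 0.25 0.0625
```

B reaches A in one step and C only in two, so R[B][A] is larger than R[B][C]: longer walks are discounted by the factor (1 - c) per step.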


■ Cluster Centroid Initialization
    ▪ Identify good initial centroids from the density point of view
      [Hinneburg and Keim, AAAI 1998]

    ▪ Influence function of vi on vj

    ▪ Density function of vi
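
The influence and density formulas were also images. The sketch below assumes a Gaussian-style influence, 1 - exp(-d(vi, vj)^2 / (2σ^2)), and defines the density of vi as the sum of its influence over all vertices; the k vertices with the highest density become the initial centroids. The exact functional form, the `sigma` parameter, and the function names are assumptions, and `R` stands for a precomputed random walk distance matrix.

```python
import math

def density(R, i, sigma):
    """Assumed density of vertex i: sum over j of 1 - exp(-R[i][j]^2 / (2 sigma^2))."""
    return sum(1.0 - math.exp(-(R[i][j] ** 2) / (2 * sigma ** 2))
               for j in range(len(R)))

def initial_centroids(R, k, sigma=1.0):
    """Pick the k vertices with the highest density as initial centroids."""
    scores = [density(R, i, sigma) for i in range(len(R))]
    return sorted(range(len(R)), key=lambda i: -scores[i])[:k]

# Hypothetical random walk scores for three vertices.
R = [
    [0.0, 0.4, 0.4],
    [0.4, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
centroids = initial_centroids(R, k=2)
print(centroids)  # [0, 1]
```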

Graph Clustering with Structure & Attribute (9/11)

■ Clustering Process (K-means framework)
    ▪ Assign each vertex vi ∈ V to its closest centroid c*:

    ▪ Update the centroid with the most centrally located vertex in
      each cluster:
        • Compute the “average point” vi of a cluster Vi

        • Find the new centroid whose random walk distance vector is the closest to
          the cluster average
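
The two steps can be sketched as follows, assuming `R` is a precomputed random walk distance matrix in which a larger entry means a closer pair. The rule that a centroid stays in its own cluster and the use of squared Euclidean distance between row vectors are choices made for this sketch, not details confirmed by the slides.

```python
def assign(R, centroids):
    """Assign each vertex to the centroid with the largest random walk score."""
    clusters = {c: [] for c in centroids}
    for v in range(len(R)):
        # Guard (sketch assumption): a centroid always stays in its own cluster.
        best = v if v in centroids else max(centroids, key=lambda c: R[v][c])
        clusters[best].append(v)
    return clusters

def update_centroid(R, members):
    """Return the member whose row of R is closest to the cluster's average row."""
    n = len(R)
    avg = [sum(R[v][j] for v in members) / len(members) for j in range(n)]
    def sq_dist_to_avg(v):
        return sum((R[v][j] - avg[j]) ** 2 for j in range(n))
    return min(members, key=sq_dist_to_avg)

# Hypothetical scores: vertices 0-2 form one group, vertices 3-4 another.
R = [
    [0.40, 0.20, 0.05, 0.01, 0.01],
    [0.20, 0.30, 0.20, 0.02, 0.02],
    [0.05, 0.20, 0.40, 0.01, 0.01],
    [0.01, 0.02, 0.01, 0.40, 0.30],
    [0.01, 0.02, 0.01, 0.30, 0.40],
]
clusters = assign(R, centroids=[0, 3])
print(clusters)                          # {0: [0, 1, 2], 3: [3, 4]}
print(update_centroid(R, clusters[0]))   # 1
```

Vertex 1 sits between vertices 0 and 2, so its row is closest to the cluster average and it replaces vertex 0 as the centroid.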




Graph Clustering with Structure & Attribute (10/11)

■ Edge Weight Definition
    ▪ Different types of edges may have different degrees of importance
        • The structure edge weight ω0 is fixed to 1.0 throughout the clustering process
        • The attribute edge weights are ωi for i = 1, 2, ..., m
        • All weights are initialized to 1.0, but are automatically updated during clustering

  Example: “Topic” plays a more important role than “age”.




Graph Clustering with Structure & Attribute (11/11)

■ Weight Self-Adjustment
    ▪ A vote mechanism determines whether two vertices share an
      attribute value:

    ▪ Weight increment:

    ▪ How does the weight adjustment affect clustering convergence?
        • Objective function

        • The weights are shown to be adjusted in the direction of
          clustering convergence as the clusters are iteratively refined.
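
The vote and increment formulas are missing from the transcript. The sketch below assumes the following scheme: a vertex casts one vote for attribute i if it shares its centroid's value on that attribute; the increment for attribute i is m times its share of all votes; and the new weight is the average of the old weight and the increment. These formulas and all names are assumptions for illustration and should be verified against the paper.

```python
def update_weights(weights, clusters, attrs):
    """One weight self-adjustment step (assumed form).

    clusters maps a centroid to its member vertices; attrs maps a vertex
    to a tuple holding its m attribute values.
    """
    m = len(weights)
    votes = [0] * m
    for centroid, members in clusters.items():
        for v in members:
            for i in range(m):
                if attrs[v][i] == attrs[centroid][i]:
                    votes[i] += 1   # v agrees with its centroid on attribute i
    total = sum(votes)
    if total == 0:
        return list(weights)        # no agreement anywhere: leave weights unchanged
    return [0.5 * (w + m * votes[i] / total) for i, w in enumerate(weights)]

# Hypothetical data: attribute 0 is "topic", attribute 1 is "age".
clusters = {0: [0, 1, 2], 3: [3, 4]}
attrs = {
    0: ("XML", "30s"),
    1: ("XML", "40s"),
    2: ("XML", "20s"),
    3: ("Skyline", "30s"),
    4: ("Skyline", "50s"),
}
new_weights = update_weights([1.0, 1.0], clusters, attrs)
print([round(w, 4) for w in new_weights])  # [1.2143, 0.7857]
```

Topic agrees with the centroid for all five vertices and age only for the two centroids themselves, so the topic weight rises and the age weight falls while their sum stays at m = 2.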



Experimental Evaluation (1/5)

■ Datasets
    ▪ Political Blogs dataset: 1490 vertices, 19090 edges, one
      attribute (political leaning)
    ▪ DBLP dataset: 5000 vertices, 16010 edges, two attributes
      (prolific and topic)

■ Methods
    ▪ K-SNAP [Tian et al., SIGMOD'08]: attribute only
    ▪ S-Cluster: structure-based clustering
    ▪ W-Cluster: weighted function
    ▪ SA-Cluster: proposed method




Experimental Evaluation (2/5)

■ Evaluation Metrics
    ▪ Density: intra-cluster structural cohesiveness

    ▪ Entropy: intra-cluster attribute homogeneity
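
The metric formulas are not in the transcript. The sketch below assumes density is the fraction of edges whose endpoints fall in the same cluster, and entropy is the Shannon entropy (base 2) of each attribute within each cluster, weighted by cluster size and by normalized attribute weight. Both definitions and all names are assumptions to be checked against the paper. Higher density and lower entropy indicate better clusters under these definitions.

```python
import math
from collections import Counter

def cluster_density(edges, clusters):
    """Fraction of edges with both endpoints in the same cluster."""
    label = {v: k for k, members in enumerate(clusters) for v in members}
    inside = sum(1 for u, v in edges if label[u] == label[v])
    return inside / len(edges)

def cluster_entropy(clusters, attrs, weights):
    """Size-weighted and attribute-weighted entropy of attribute values per cluster."""
    n = sum(len(members) for members in clusters)
    wsum = sum(weights)
    total = 0.0
    for i, w in enumerate(weights):
        for members in clusters:
            counts = Counter(attrs[v][i] for v in members)
            h = -sum((c / len(members)) * math.log2(c / len(members))
                     for c in counts.values())
            total += (w / wsum) * (len(members) / n) * h
    return total

# Hypothetical path graph 0 - 1 - 2 - 3 - 4 split into two clusters.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
clusters = [[0, 1, 2], [3, 4]]
attrs = {0: ("XML",), 1: ("XML",), 2: ("XML",), 3: ("Skyline",), 4: ("XML",)}
print(cluster_density(edges, clusters))                     # 0.75
print(round(cluster_entropy(clusters, attrs, [1.0]), 4))    # 0.4
```

The first cluster is pure (entropy 0) and the second is an even split (entropy 1), so the size-weighted result is 2/5 × 1 = 0.4.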




Experimental Evaluation (3/5)

■ Cluster Quality Evaluation




Experimental Evaluation (4/5)

■ Cluster Quality Evaluation




Experimental Evaluation (5/5)

■ Clustering Convergence




Conclusion
■ Studied the problem of clustering a graph with multiple
  attributes using the attribute augmented graph

■ A unified neighborhood random walk distance measures vertex
  closeness on the attribute augmented graph

■ Theoretical analysis quantitatively estimates the
  contributions of attribute similarity

■ The degrees of contribution of different attributes are adjusted
  automatically in the direction of clustering convergence



Critical Review
■ Many algorithms have been proposed in the literature, but they
  consider either the structural or the attribute aspect alone
  when finding similarities among nodes in the graph

■ This paper considers both aspects simultaneously, which better
  reflects the true nature of the clusters and of the similarity among
  different objects

■ It relies on random walks on the graph, which require matrix
  operations (i.e., multiplication), so it becomes unrealistic for
  huge datasets

■ Because the similarity is calculated iteratively, the method does not
  scale to huge networks (graph datasets)
Feasible Improvements
■ The iterative similarity calculation should be avoided
  by incorporating other feasible methods for checking relevance

■ The method can scale to networks whose nodes are not
  densely connected to each other: such nodes have lower
  degree, so the similarity calculation is cheaper

■ The augmentation process can be remodeled or avoided to reduce
  space complexity and running time




Questions




                                Suggestions…!

