13 Community Detection

Communities in Networks
Peter J. Mucha, UNC–Chapel Hill
[Figure: committee labels: Agriculture, Appropriations, International Relations, Budget, House Administration, Energy/Commerce, Financial Services, Veterans’ Affairs, Education, Armed Services, Judiciary, Resources, Rules, Science, Small Business, Official Conduct, Transportation, Government Reform, Ways and Means, Intelligence, Homeland Security]

Outline & Acknowledgements
1. What is community detection and why is it useful?
2. How do you calculate communities?
– Software links
– Importance of resolution parameters
3. Short intro to multilayer networks
– If time permits (I’ll leave you slides)
4. R Notebook
• Shankar Bhamidi, Skyler Cranmer, James Fowler, James Gleeson, Jeff Henderson, Jim Moody, Marc Niethammer, Andrew Nobel, J-P Onnela, Mason Porter
• Dani Bassett, Clara Granell, Kaveri Parker, Saray Shai, Dane Taylor
• Elizabeth Menninga, Natalie Stanley, Mandi Traud, Andrew Waugh, William Weir, James Wilson
• Thomas Callaghan, Scott Emmons, AJ Friend, Ryan Gibson, Eric Kelsic, Kevin Macon, Thomas Richardson, Casey Warmbrand
• ARO, CDC, JSMF, NICHD, NIDDK, NIGMS, NSF, UNC UCRF
Apologies that this presentation will seriously err on the self-absorbed side.
It’s a big field, and I do not promise to cover even a small piece of it here.

The Real Outline
~25% Theory
~25% Examples
~25% Demonstration
~25% Bad Jokes

Philosophical Disclaimer
• Jim Moody (paraphrased): “I’ve been accused of turning everything into a network.”
• PJM (in response): “I’m accused of turning everything into a network and a graph partitioning problem.”
• “Structure ↔ Function”
Images by Aaron Clauset

Karate Club Example
This partition optimizes modularity, which measures the
number of intra-community ties (relative to a random model)
“If your method doesn’t work on this network, then go home.”

Karate Club Club
“Cris Moore (left) is the
inaugural recipient of the
Zachary Karate Club Club prize,
awarded on behalf of the
community by Aric Hagberg
(right). (9 May 2013)”

Facebook
Traud et al., “Comparing community structure to characteristics in
online collegiate social networks” (2011)
Traud et al., “Social structure of Facebook networks” (2012)
Caltech 2005:
Colors indicate residential
“House” affiliations
Purple = Not provided

Community Detection Firehose Overview
• “Hard/rigid” v. “soft/overlapping” clusters
• cf. biclustering methods and mathematics of expander graphs
• A community should describe a “cohesive group”: varying formulations/algorithms
– Linkage clustering (average, single), local clustering coefficients, betweenness (geodesic, random walk), spectral, conductance, …
• Classic approach in CS: Spectral Graph Partitioning
– Need to specify number of communities sought
• Conductance
• MDL, Infomap, OSLOM, … (many other things I’ve missed) …
• Stochastic Block Models: generative with in/out probabilities between labeled groups
• Modularity: a good partition has more total intra-community edge weight than one would expect at random (but according to what model?)
“Communities in Networks,” M. A. Porter, J.-P. Onnela & P. J. Mucha,
Notices of the American Mathematical Society 56, 1082-97 & 1164-6 (2009).
“Community Detection in Graphs,” S. Fortunato, Physics Reports 486, 75-174 (2010).
“Community detection in networks: A user guide,” S. Fortunato & D. Hric, Physics Reports 659, 1-44 (2016).
“Case studies in network community detection,” S. Shai, N. Stanley, C. Granell, D. Taylor & P. J. Mucha, arXiv:1705.02305.

Modularity (see Newman & Girvan 2004 and other Newman papers)
$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - P_{ij} \right) \delta(c_i, c_j)$
• $m$: total edge weight
• $B_{ij} = A_{ij} - P_{ij}$: modularity matrix
• $\delta(c_i, c_j)$: indicator on nodes i & j in same community
• $A_{ij}$: your data (edge from i to j?)
• $P_{ij}$: random “null model” for expected edge weight

• GOAL: Assign nodes to communities in order to maximize quality function Q
• NP-hard; the decision version is NP-complete [Brandes et al. 2008]
– Exact optimization ~ enumerating possible partitions
• Numerous packages developed/developing
– e.g. igraph library (R, Python), NetworkX, Louvain
– Need appropriate null model

• ER degree distribution (binomial/Poisson) is not a good model for many real-world data sets
• Independent edges, constrained to expected degree sequence same as observed.
• Requires $P_{ij} = f(k_i) f(k_j)$, quickly yielding $P_{ij} = \frac{k_i k_j}{2m}$
• Resolution parameter $\gamma$ multiplies the null-model term, $A_{ij} - \gamma P_{ij}$; its choice is ad hoc (default $\gamma = 1$)
[Reichardt & Bornholdt, 2006; Lambiotte et al., 2008 & 2015]
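To make the quality function concrete, here is a minimal sketch that evaluates modularity with the Newman-Girvan null model $k_i k_j / (2m)$ and a resolution parameter. It assumes an undirected, unweighted simple graph; the function name `modularity`, the toy edge list, and the labels are all hypothetical choices for illustration, not anything from the packages named above.

```python
from collections import defaultdict


def modularity(edges, community, gamma=1.0):
    """Modularity of a partition under the Newman-Girvan null model.

    edges: list of (u, v) pairs for an undirected, unweighted simple graph.
    community: dict mapping each node to a community label.
    gamma: resolution parameter multiplying the null-model term.
    """
    m = len(edges)                  # total edge weight
    intra = defaultdict(int)        # number of edges inside each community
    degree_sum = defaultdict(int)   # summed degree of each community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    # Same value as (1/2m) * sum_ij (A_ij - gamma*k_i*k_j/(2m)) * delta(c_i, c_j),
    # grouped community by community.
    return sum(intra[c] / m - gamma * (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (hypothetical data): two triangles joined by one bridge edge.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
SPLIT = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
MERGED = {node: "all" for node in SPLIT}

q_split = modularity(EDGES, SPLIT)                        # two triangles
q_merged = modularity(EDGES, MERGED)                      # one big community
q_split_high_gamma = modularity(EDGES, SPLIT, gamma=2.0)  # larger gamma
print(q_split, q_merged, q_split_high_gamma)
```

On this toy graph the two-triangle split scores higher than the single community at the default resolution, and raising the resolution parameter lowers the score of the same partition because the null-model term is weighted more heavily.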

Null Models for Modularity Quality Functions
• Erdős–Rényi (Bernoulli)
• Newman-Girvan*
– Leicht-Newman* (directed)
– Barber* (bipartite)

Louvain Method (Blondel et al., “Fast unfolding of communities in large networks”, 2008)
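As a rough illustration of the greedy idea behind this method, the sketch below implements only a local-moving pass: each node starts in its own community and is repeatedly moved to the neighboring community with the largest modularity gain. It is not the full Louvain algorithm (there is no aggregation step), it assumes an undirected, unweighted simple graph, and the function name `local_moving` and the toy data are hypothetical.

```python
from collections import defaultdict


def local_moving(edges, gamma=1.0):
    """Greedy local-moving pass for modularity (simplified; no aggregation)."""
    m = len(edges)
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    nodes = sorted(nbrs)
    degree = {n: len(nbrs[n]) for n in nodes}
    comm = {n: n for n in nodes}   # every node starts alone
    tot = dict(degree)             # summed degree of each community

    improved = True
    while improved:
        improved = False
        for i in nodes:
            old = comm[i]
            tot[old] -= degree[i]  # temporarily take i out of its community
            links = defaultdict(int)
            for j in nbrs[i]:
                links[comm[j]] += 1  # edges from i into each neighboring community

            def gain(c):
                # Modularity gain of placing the isolated node i into community c.
                return links.get(c, 0) / m - gamma * tot[c] * degree[i] / (2 * m * m)

            best, best_gain = old, gain(old)
            for c in links:
                g = gain(c)
                if g > best_gain + 1e-12:  # move only on strict improvement
                    best, best_gain = c, g
            tot[best] += degree[i]
            if best != old:
                comm[i] = best
                improved = True
    return comm


# Toy graph (hypothetical data): two triangles joined by one bridge edge.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
labels = local_moving(EDGES)
print(labels)
```

Because the result depends on the order in which nodes are visited, practical implementations typically randomize that order and run many times, which is one reason repeated calls can return many distinct partitions.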

“Virality Prediction and Community Structure in
Social Networks”, Weng, Menczer & Ahn (2013)

Melnik et al., “Dynamics on modular networks with
heterogeneous correlations” (2014)
[Figure: fraction of active nodes; Watts threshold model; multi-university Facebook network]

Modularity from Laplacian Dynamics
Lambiotte, Delvenne & Barahona [arXiv:0812.1770] showed a way to derive modularity from normalized Laplacian dynamics, defining partition quality in terms of stability (autocovariance in Markov process).
Expansion of the matrix exponential to first order in t recovers Newman-Girvan modularity with resolution $\gamma = 1/t$.
(This is going to be important again for multilayer networks)
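The first-order expansion can be sketched as follows. This assumes a continuous-time random walk with transition matrix $D^{-1}A$ and stationary distribution $\pi_i = k_i/2m$ on an undirected network; the symbols $R(t)$, $\pi$, and $D$ are my notation for the sketch rather than anything shown on the slide.

```latex
\begin{align}
R(t) &= \sum_{ij} \left[ \pi_i \left( e^{-t\,(I - D^{-1}A)} \right)_{ij} - \pi_i \pi_j \right] \delta(c_i, c_j),
      \qquad \pi_i = \frac{k_i}{2m} \\
     &\approx \sum_{ij} \left[ \pi_i \left( (1 - t)\,\delta_{ij} + t\,\frac{A_{ij}}{k_i} \right)
      - \frac{k_i k_j}{(2m)^2} \right] \delta(c_i, c_j) \\
     &= (1 - t) + \frac{t}{2m} \sum_{ij} \left[ A_{ij} - \frac{1}{t}\,\frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
\end{align}
```

The constant $(1 - t)$ and the prefactor $t$ do not change which partition is optimal, so maximizing the linearized stability is the same as maximizing Newman-Girvan modularity with the resolution parameter set to $1/t$.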

U.S. Congressional Roll Call as a similarity network
Waugh et al., “Party polarization in Congress: a network science approach” (2009)
[Figure: congressional committees, with the same committee labels as on the title slide]
Adjacency matrix of similarities is dense
and weighted, cf. other typical networks
(see committees: weighted but sparse)
85th Senate

U.S. Congressional Roll Call as a similarity network
Waugh et al., “Party polarization in Congress: a network science approach” (2009)
[Figure panels: 85th Senate, 108th Senate]

Moody & Mucha, “Portrait of political party polarization” (2013)

Parker et al., “Network Analysis Reveals Sex- and Antibiotic Resistance-
Associated Antivirulence Targets in Clinical Uropathogens” (2015)

Recall the (pesky) resolution parameter?
Fenn et al., “Dynamic Communities in
Multichannel Data: An Application to the
Foreign Exchange Market During the
2007-2008 Credit Crisis” (2009)

Picking resolution parameters is still active research
https://github.com/wweir827/CHAMP

“Division I-A” College Football
50,000 Louvain calls
384 unique partitions

But $Q_s(\gamma)$ isn’t a point; it’s a line in $\gamma$ for partition s
19 admissible partitions

Pairwise compare admissible partitions
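The "line, not a point" observation can be sketched in a few lines. For a fixed partition s, modularity is linear in the resolution parameter: Q_s = a_s - gamma * p_s, where a_s is the data term and p_s the null-model term, each summed over same-community pairs and divided by 2m. A partition is admissible if its line lies on the upper envelope for some range of the resolution parameter. The code below is a brute-force pairwise comparison written for illustration, not the CHAMP package's implementation; the function name, the coefficient table, and the chosen range are all hypothetical.

```python
def admissible_domains(coeffs, gamma_min=0.0, gamma_max=3.0):
    """Return {partition: (lo, hi)} for partitions that are optimal on some interval.

    coeffs maps a partition name to (a, p), so that Q(gamma) = a - gamma * p.
    """
    domains = {}
    for s, (a_s, p_s) in coeffs.items():
        lo, hi = gamma_min, gamma_max
        dominated = False
        for t, (a_t, p_t) in coeffs.items():
            if t == s:
                continue
            if p_t > p_s:
                # s beats t only to the right of their crossing point
                lo = max(lo, (a_t - a_s) / (p_t - p_s))
            elif p_t < p_s:
                # s beats t only to the left of their crossing point
                hi = min(hi, (a_t - a_s) / (p_t - p_s))
            elif a_t > a_s:
                dominated = True  # parallel line strictly above s
                break
        if not dominated and lo < hi:
            domains[s] = (lo, hi)
    return domains


# Coefficients (a, p) for four partitions of a toy graph made of two triangles
# joined by one bridge edge (hypothetical data, computed by hand).
COEFFS = {
    "one":     (1.0, 1.0),          # all six nodes together
    "split":   (6 / 7, 0.5),        # the two triangles
    "pairs":   (3 / 7, 68 / 196),   # {0,1}, {2,3}, {4,5}
    "singles": (0.0, 34 / 196),     # every node alone
}
domains = admissible_domains(COEFFS)
print(domains)
```

In this toy case the "pairs" partition is never optimal for any resolution value in the range, so it is pruned, while the other three each own an interval. The same pruning is what reduces hundreds of unique partitions to a handful of admissible ones.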

Human Protein Reactome
20,000 calls
19,980 unique
39 admissible

Self loops of weight r as a form of resolution parameter
Arenas et al., “Analysis of the structure of complex networks at different resolution levels” (2008)
(see also Shai et al., “Case studies in network community detection,” 2017)
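A minimal sketch of the self-loop idea, under my reading of that construction: a self-loop of weight r is added to every node, which adds r to each diagonal entry of the adjacency matrix and r to each node's strength, and ordinary Newman-Girvan modularity is then evaluated on the modified network. The function name and toy data are hypothetical, and the graph is assumed undirected and unweighted before the loops are added.

```python
from collections import defaultdict


def modularity_with_self_loops(edges, community, r=0.0):
    """Newman-Girvan modularity after adding a self-loop of weight r to every node."""
    size = defaultdict(int)         # number of nodes in each community
    for node, c in community.items():
        size[c] += 1
    intra2 = defaultdict(float)     # twice the intra-community edge weight
    strength = defaultdict(float)   # summed degree of each community
    for u, v in edges:
        strength[community[u]] += 1
        strength[community[v]] += 1
        if community[u] == community[v]:
            intra2[community[u]] += 2
    # Total strength of the modified network: 2m plus r for each node.
    total = 2 * len(edges) + len(community) * r
    return sum((intra2[c] + size[c] * r) / total
               - ((strength[c] + size[c] * r) / total) ** 2
               for c in size)


# Toy graph (hypothetical data): two triangles joined by one bridge edge.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
SPLIT = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

q_r0 = modularity_with_self_loops(EDGES, SPLIT, r=0.0)
q_r1 = modularity_with_self_loops(EDGES, SPLIT, r=1.0)
print(q_r0, q_r1)
```

With r = 0 this reduces to ordinary modularity, so sweeping r plays a role similar to sweeping the resolution parameter while leaving the form of the quality function unchanged.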

Outline & Summary
– We are surely out of time… If we had more time, we would talk a lot about the refs in the following slides
• Networks appear in many disciplines
• Network representations provide a flexible framework for studying general data types, leveraging methods of social network analysis and network science.
• Community detection is a powerful tool for exploring and understanding network structures, including multilayer networks.
• Network structures identify essential features for modeling and understanding data in applications.

Multilayer Networks
[Figure panels: Ordered; Categorical]
Mucha et al., “Community structure in time-dependent,
multiscale, and multiplex networks” (2010)
Kivelä et al., “Multilayer Networks” (2014)

Multilayer Modularity
Mucha et al., “Community structure in time-dependent, multiscale, and multiplex networks” (2010)
How to count the expected weights of interlayer arcs given that they are definitional to the data structure?
Generalized the Lambiotte et al. (2008) connection between modularity and autocorrelation under Laplacian dynamics to re-derive null models for bipartite (Barber), directed (Leicht-Newman), and signed (Traag et al.) networks, specified in terms of one-step conditional probabilities.
$Q_{\text{multilayer}} = \frac{1}{2\mu} \sum_{ijsr} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right] \delta(g_{is}, g_{jr})$
• First term: intra-layer adjacency data and null
• Second term: inter-layer identity arcs
Same formalism works for more general multilayer networks, with sum over inter-layer connections within same community

U.S. Senators across 2-year Congresses
Mucha et al., “Community
structure in time-dependent,
multiscale, and multiplex
networks” (2010)
Each point is a
Senator in a Congress
Colored bars indicate
temporal extent of each
community, labeled by
nominal party labels
Grey bars indicate Congresses
including more than two
communities

Bassett et al., “Dynamic reconfiguration of human
brain networks during learning” (2011)

Cranmer et al., “Kantian fractionalization predicts the
conflict propensity of the international system” (2015)
• Identified communities of
nation states in multiplex
international relations of trade,
IGOs, democracies
• Granger causal relationship to
total system-level conflict
• Negligible contribution from
joint democracy layer

See mapequation.org
Phys. Rev. X 6, 011036 (2016)

Stanley et al., “Clustering network layers with the
strata multilayer stochastic block model” (2016)

Multilayer CHAMP (Weir et al., 2017)

U.S. Senate Roll Call Similarities (Congresses 1-110)
240,000 GenLouvain calls; 197,879 unique partitions; 1,447 admissible partitions
