Network Analysis
Sara Terp, 2015
Network Analysis
• What is a network?
• What features does a network have?
• What analysis is possible with those features?
• How do we explain that analysis?
“Network”
“A group of interconnected people or things”
(Oxford English Dictionary)
Use networks to understand, use and explain
relationships
Infrastructure Networks
NPR: Visualising the US Power Grid
Transport for London: London Underground Map
Social Networks
(Sara’s Facebook friends, in Gephi)
Songs
Spotify API reference
Words
(Wise blogpost on word co-occurrence matrices)
Network Analysis
Use networks to understand, use and explain
relationships
Network Features
(Diagram: example network with nodes A to G, labelling a node, an edge, a directed edge, an undirected edge and a clique)
Network Representations
• Diagram
• Adjacency matrix
[[ 0, 1, 1, 1, 0, 0, 0, 0, 0, 1],
[ 1, 0, 0, 1, 1, 0, 1, 0, 0, 1],
[ 1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
[ 1, 1, 1, 0, 1, 1, 1, 0, 0, 0],
[ 0, 1, 0, 1, 0, 0, 1, 0, 0, 0],
[ 0, 0, 1, 1, 0, 0, 1, 0, 0, 0],
[ 0, 1, 0, 1, 1, 1, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[ 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
[ 1, 1, 0, 0, 0, 0, 0, 0, 1, 0]]
• Adjacency list
{0: [1, 2, 3, 9], 1: [0, 9, 3, 4, 6], 2: [0, 3, 5], 3: [0, 1, 2, 4, 5, 6],
4: [1, 3, 6], 5: [2, 3, 6], 6: [1, 3, 4, 5], 7: [8], 8: [9, 7], 9: [8, 1, 0]}
• Edge list
{(0,1),(0,2),(0,3),(0,9),(1,3),(1,4),(1,6),(1,9),(2,3),(2,5),(3,4),
(3,5),(3,6),(4,6),(5,6),(7,8),(8,9)}
• Maths
G = (V,E,e)
(Diagram: the same network drawn with nodes 0 to 9)
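A minimal sketch of how the representations above relate to each other, using plain Python and the edge list exactly as listed on the slide. The variable names (`edges`, `adj_list`, `adj_matrix`) are just illustrative choices, not anything from the deck.

```python
# Edge list copied from the slide (undirected, nodes 0-9).
edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]
n = 10  # number of nodes

# Adjacency list: node -> sorted list of neighbours.
adj_list = {v: [] for v in range(n)}
for a, b in edges:
    adj_list[a].append(b)
    adj_list[b].append(a)  # undirected, so store both directions
adj_list = {v: sorted(nbrs) for v, nbrs in adj_list.items()}

# Adjacency matrix: adj_matrix[a][b] is 1 when a and b share an edge.
adj_matrix = [[0] * n for _ in range(n)]
for a, b in edges:
    adj_matrix[a][b] = 1
    adj_matrix[b][a] = 1

print(adj_list[9])    # neighbours of node 9
print(adj_matrix[0])  # row 0 of the matrix
```

For an undirected network the matrix is symmetric, and each edge shows up twice in it (once per direction).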
The NetworkX Library
• Python network analysis library
import networkx as nx

# Undirected edges of the example network (nodes 0-9)
edgelist = {(0,1),(0,2),(0,3),(0,9),(1,3),(1,4),(1,6),(1,9),(2,3),(2,5),
            (3,4),(3,5),(3,6),(4,6),(5,6),(7,8),(8,9)}

G = nx.Graph()
for edge in edgelist:
    G.add_edge(edge[0], edge[1])
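As a quick sanity check on the graph built above, something like the following sketch could be used. It builds the same graph with `add_edges_from` (equivalent to the loop), inspects it, and shows the directed variant mentioned in the notes; the variable name `D` is only an illustrative choice.

```python
import networkx as nx

edgelist = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
            (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]

G = nx.Graph()
G.add_edges_from(edgelist)  # same result as adding the edges one by one

print(G.number_of_nodes(), G.number_of_edges())
print(sorted(G.neighbors(9)))

# Directed variant: each (a, b) pair becomes an edge from a to b only.
D = nx.DiGraph(edgelist)
print(D.has_edge(0, 1), D.has_edge(1, 0))
```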
Node Centrality
• Finding the most “important”/“influential” nodes
• i.e. how “central” is a node to the network
Degree centrality: “who has
lots of friends?”
3 0.666
0 0.555
1 0.555
5 0.444
6 0.444
2 0.333
4 0.333
9 0.333
8 0.222
7 0.111
nx.degree_centrality(G)
= number of edges directly connected to n / (number of nodes - 1)
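To make the normalisation concrete, here is a small by-hand sketch in plain Python: count each node's edges, then divide by the largest possible number of neighbours (9 in this network). It uses the edge list exactly as listed in these slides. With that list a few values come out lower than in the table above (node 0 has four listed neighbours, giving 4/9, about 0.444); the table would be consistent with a diagram that also had a 0-5 edge, but that is only a guess.

```python
edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]

nodes = sorted({v for edge in edges for v in edge})

# Degree: how many edges touch each node.
degree = {v: 0 for v in nodes}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Normalise by the maximum possible degree, (number of nodes - 1).
centrality = {v: degree[v] / (len(nodes) - 1) for v in nodes}

for v in sorted(nodes, key=lambda v: -centrality[v]):
    print(v, round(centrality[v], 3))
```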
Betweenness centrality: “who
are the bridges”?
9 0.38
0 0.23
1 0.23
8 0.22
3 0.10
5 0.02
6 0.02
2 0.00
4 0.00
7 0.00
nx.betweenness_centrality(G)
= sum over all pairs of other nodes of (number of shortest paths
between the pair that pass through n / total number of shortest
paths between the pair), divided by the number of such pairs
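A short sketch of the NetworkX call on the edge list as listed in these slides, with two values that are easy to check by hand. Node 8 is the only route between node 7 and each of the other 8 nodes, and there are 9 × 8 / 2 = 36 pairs of other nodes, so its score should be 8/36. Node 9 is the only route between {7, 8} and the seven nodes 0 to 6, giving 14/36.

```python
import networkx as nx

edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]
G = nx.Graph(edges)

# Normalised by default: divided by the number of pairs of other nodes.
bc = nx.betweenness_centrality(G)

for node, score in sorted(bc.items(), key=lambda item: -item[1]):
    print(node, round(score, 2))
```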
Closeness centrality: “who
are the hubs”?
0 0.64
1 0.64
3 0.60
9 0.60
5 0.52
6 0.52
2 0.50
4 0.50
8 0.42
7 0.31
nx.closeness_centrality(G)
= (number of nodes-1) / sum(distance to each other node)
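Closeness can also be worked out by hand with a breadth-first search, which is a useful way to see what the number means. This is a plain-Python sketch on the edge list as listed in these slides (the helper names `distances_from` and `closeness` are made up for illustration), so values may differ slightly from those on the slide.

```python
from collections import deque

edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def distances_from(source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def closeness(v):
    """(number of nodes - 1) / total distance to all other nodes (connected graph assumed)."""
    dist = distances_from(v)
    return (len(adj) - 1) / sum(dist.values())

for v in sorted(adj, key=lambda v: -closeness(v)):
    print(v, round(closeness(v), 2))
```

A node that is one hop from everybody would score 1.0; the further away the rest of the network is, the lower the score.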
Eigenvector centrality: “who
has the most network influence?”
3 0.48
0 0.39
1 0.39
5 0.35
6 0.35
2 0.28
4 0.28
9 0.19
8 0.04
7 0.01
nx.eigenvector_centrality(G)
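The idea behind eigenvector centrality can be sketched with a simple power iteration: repeatedly give every node the sum of its neighbours' scores and rescale. This is an illustrative plain-Python version, not NetworkX's implementation; the fixed iteration count and the use of (A + I) instead of A are my own assumptions to keep the sketch short and stable. It runs on the edge list as listed in these slides.

```python
import math

edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]
n = 10

adj = {v: set() for v in range(n)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Power iteration on (A + I): each node's new score is its own score plus
# its neighbours' scores, then the vector is rescaled to unit length.
# Adding the identity keeps the same eigenvectors but avoids oscillation.
scores = [1.0] * n
for _ in range(200):  # fixed iteration count, an arbitrary choice
    updated = [scores[v] + sum(scores[u] for u in adj[v]) for v in range(n)]
    length = math.sqrt(sum(s * s for s in updated))
    scores = [s / length for s in updated]

# Rayleigh quotient: estimate of the leading eigenvalue of A.
eigenvalue = sum(scores[v] * sum(scores[u] for u in adj[v]) for v in range(n))

for v in sorted(range(n), key=lambda v: -scores[v]):
    print(v, round(scores[v], 2))
```

A node scores highly when its neighbours score highly, which is why node 3 (connected to most of the dense part of the network) comes out on top.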
Network properties
• Characteristic path length: average shortest
distance between all pairs of nodes
• Clustering coefficient: how likely a network is to
contain highly-connected groups
• Degree distribution: histogram of node degrees
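A sketch of how these three properties might be computed with NetworkX, on the edge list as listed in these slides. Note that `average_shortest_path_length` needs a connected graph, which this one is.

```python
import networkx as nx

edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]
G = nx.Graph(edges)

# Characteristic path length: mean shortest-path distance over all node pairs.
path_length = nx.average_shortest_path_length(G)

# Clustering coefficient, averaged over all nodes.
clustering = nx.average_clustering(G)

# Degree distribution: degree_hist[d] is the number of nodes with degree d.
degree_hist = nx.degree_histogram(G)

print(round(path_length, 3), round(clustering, 3), degree_hist)
```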
Community Detection
“Are there groups in this network?”
“What can I do with that information?”
Disconnected Networks
• Not all nodes are connected to each other
• Connected component = every node in the
component can be reached from every other node
• Giant component = connected component that
covers most of the network
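The example network used in these slides is connected, so to illustrate components this sketch adds a hypothetical extra pair of nodes (10 and 11, not in the original network) and then asks NetworkX for the components.

```python
import networkx as nx

edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]
G = nx.Graph(edges)
connected_before = nx.is_connected(G)

# Hypothetical extra nodes with no link to the rest of the network.
G.add_edge(10, 11)

# Each component is a set of nodes; sort so the largest comes first.
components = sorted(nx.connected_components(G), key=len, reverse=True)
giant = components[0]

print(connected_before, nx.is_connected(G))
print(sorted(giant))
```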
Cliques and K-Cores
nx.find_cliques(G)
nx.k_clique_communities(G, 3)
3-cores: [[0,2,3,5], [1,3,4,6]]
2-core: [0,1,2,3,4,5,6,9]
4-cliques: [[0,2,3,5],[1,3,4,6]]
3-cliques: [[0,1,3],[0,1,9]]
2-cliques: [[7,8],[8,9]]
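A hedged sketch of the NetworkX calls for this slide, run on the edge list as listed in this deck. Two assumptions to flag: in recent NetworkX releases `k_clique_communities` is imported from `networkx.algorithms.community` rather than called as `nx.k_clique_communities`, and `nx.k_core` returns the single largest subgraph in which every node has at least k neighbours, so its output need not match the groupings listed above.

```python
import networkx as nx
from networkx.algorithms.community import k_clique_communities

edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]
G = nx.Graph(edges)

# Maximal cliques: fully connected groups that cannot be extended.
maximal_cliques = sorted(sorted(c) for c in nx.find_cliques(G))
largest = max(maximal_cliques, key=len)

# k-core: what is left after repeatedly removing nodes with fewer than k neighbours.
two_core = sorted(nx.k_core(G, k=2).nodes())
three_core = sorted(nx.k_core(G, k=3).nodes())

# Clique percolation: unions of 3-cliques that overlap in at least 2 nodes.
communities = [sorted(c) for c in k_clique_communities(G, 3)]

print(largest)
print(two_core)
print(three_core)
print(communities)
```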
Other Clique methods
• N-clique: every node in the clique is connected to all
other nodes by a path of length n or less
• P-clique: each node is connected to at least p% of
the other nodes in the group.
Network Effects
Predict how information or states (e.g. political opinion
or rumours) are most likely to move across a network
Diffusion (Simple contagion)
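Simple contagion can be sketched as a breadth-first spread: at each step, every node next to a changed node changes too. This plain-Python sketch starts from node 9 and uses the edge list as listed in these slides; the function name `spread_waves` is made up for illustration.

```python
edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def spread_waves(adj, seed):
    """Return the nodes that change state at each step, starting from seed."""
    changed = {seed}
    frontier = {seed}
    waves = [[seed]]
    while frontier:
        # Everyone touching the current frontier who has not changed yet.
        next_wave = set()
        for v in frontier:
            next_wave |= adj[v] - changed
        if next_wave:
            waves.append(sorted(next_wave))
        changed |= next_wave
        frontier = next_wave
    return waves

for step, wave in enumerate(spread_waves(adj, 9)):
    print(step, wave)
```

In a connected network this always reaches every node eventually; the interesting part is how many steps it takes.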
Complex contagion
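One common simplification of complex contagion is a threshold model: a node changes state only once a large enough fraction of its neighbours has changed. The sketch below is that deterministic variant, which is my assumption for illustration; it leaves out outside information and probabilities. It uses the edge list as listed in these slides, and `spread` and `threshold` are illustrative names.

```python
edges = [(0, 1), (0, 2), (0, 3), (0, 9), (1, 3), (1, 4), (1, 6), (1, 9),
         (2, 3), (2, 5), (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (7, 8), (8, 9)]

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def spread(adj, seeds, threshold):
    """Nodes adopt once at least `threshold` of their neighbours have adopted."""
    active = set(seeds)
    while True:
        newly = {
            v for v in adj
            if v not in active
            and adj[v]
            and len(adj[v] & active) / len(adj[v]) >= threshold
        }
        if not newly:
            return active
        active |= newly

print(sorted(spread(adj, {9}, 0.5)))  # demanding threshold
print(sorted(spread(adj, {9}, 0.2)))  # easy-going threshold
```

With the demanding threshold the change stalls in node 9's corner of the network, while the low threshold behaves like simple contagion and reaches everyone.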
Describing Networks
bl.ocks.org/mbostock/4062045
http://bost.ocks.org/mike/uberdata/
http://bl.ocks.org/mbostock/7607999
Network diagram Edge bundling
Network Analysis Tools
• Python libraries:
• NetworkX
• iGraph
• graph-tool
• Matplotlib (visualisation)
• Pygraphviz (visualisation)
• Mayavi (3d visualisation)
Longer list: http://en.wikipedia.org/wiki/Social_network_analysis_software
• Standalone tools:
• SNAP
• GUESS
• NetMiner (free for students)
• Gephi (visualisation)
• GraphViz (visualisation)
• NodeXL (Excel add-on)

Editor's Notes

  • #3: Let’s talk about network analysis. Starting with “what is a network”. I can start talking about nodes and edges and maths and stuff, but it’s easier to start by showing you.
  • #4: A network is a set of things that are linked together. Networks are usually visualized as a set of points (“nodes”) connected by lines (“edges”). “relationships”: This can be as simple as “a relationship exists”, or as complex as “this probability matrix describes the complex relationship between the states in these nodes”
  • #5: Classic networks include things like communications and power grids; in my world, they also explain things like the movement of water supplies between streams, rivers, farms and processing plants. As you look at these examples, I want you to think about the types of questions you could start asking with this network data. For example, in infrastructure, you often only have access to the junctions and the fact that there *is* a connection between two points. With transport, the dataset gets richer. You not only have the nodes and links between them, you also have timetables that list the average time between stations (and the current state of the network) and the switching costs of changing lines at a station. Aside: The London Underground map is one of my favorite network visualizations: a wonderful simplification of a complex system.
  • #6: And here the dataset gets richer still: this is just my Facebook network; I have many other networks that I connect to people with, and overlapping uses for those networks. I can also start investigating the information that’s carried across those networks, and their effect on my state (e.g. my political opinions).
  • #7: Many datasets can be framed as networks. Here, the Spotify API gives me relationships between its artists; I can also create some of my own relationship data from this API by looking at which songs and artists are on the same playlists.
  • #8: Much of text analysis can also be framed as networks. Here’s a matrix showing words that occur together in sentences and how many times they’ve co-occurred in the dataset. If we see every word as a node, and every nonzero co-occurrence score as a link, we’ve got ourselves a network. This can also be applied at the document level, e.g. Jonathan Stray’s Overview project analysed networks of documents to find civilian deaths in the Iraq War.
  • #9: So why do we bother representing things as networks? After all, we could list the songs that are played together most, or the stations with the most travellers. The bottom line is that, when you look at something as a network, you can start to see which things have the most important relationships in your network, and where to concentrate effort if you want to affect it all (e.g. who do you want to retweet your ideas?). We’re going to look at network analysis at 3 levels: node, group and network.
  • #10: But first, some nomenclature. I’m using computer science language for this. Other groups that study networks, and their words for networks, nodes and edges, are: here: network, node, edge; maths: graph, vertex, arc/edge; physics: network, site, bond; sociology: network, actor, relation; biology: network, node, edge.
  • #11: This is all valid Python code (you can use it to generate a network diagram with NetworkX - see next slide). Different representations are useful for different things (if you’re coding up your own algorithms). Diagram: good for explaining a network (especially if interactive); adjacency matrix: good for dense graphs (can also use scipy.sparse to use this for sparse graphs); adjacency list: good for sparse graphs (e.g. social networks tend to be sparse), used by NetworkX; edge list: good general representation; maths: good for describing algorithms. V = vertices (nodes); E = edges; e = map from edges to nodes. n is the number of nodes; m is the number of edges.
  • #12: I’ve listed several python libraries for network management at the end of these slides. The one we’re using here is NetworkX. It produces ugly graphs, but has a good set of network analysis tools. NB Use nx.DiGraph() if you want a directed graph
  • #13: NetworkX centrality functions: http://networkx.lanl.gov/reference/algorithms.centrality.html
  • #14: Simplest form of centrality. “Degree” = how many direct links connect to this node. Note that degree centrality is normalized (divided) by the largest possible number of connections per node: in this case, 9. Degree centrality is not a great measure of power: what’s important is the number of nodes that the node can easily reach, and the highest-ranked node might be part of a clique (i.e. not well connected to the outside world).
  • #15: Betweenness = how often does this node sit on the shortest paths between other pairs of nodes? Nodes with high betweenness have influence over the flow of information or goods through a network: they bridge separate communities (good) but also often are a single point of failure in communications between those communities (bad).
  • #16: Closeness = has the shortest average path to all other nodes in the network. Nodes with high closeness have great influence over the rest of the network, especially if influence diminishes with path length; these points are also good places to observe all information flows from.
  • #17: Eigenvector centrality measures how much influence a node has in the whole network, taking account of their connections to other highly-connected nodes. These are the “kings” of your network - they might not have great closeness or betweenness, but they do wield a lot of influence. PageRank is based on eigenvector centrality. NB You’ll need to look at the eigenvectors of the adjacency matrix to build this one, and like neural networks, eigenvector centrality algorithms won’t always converge to a solution.
  • #18: All available in NetworkX. Social networks = short path lengths, high clustering, skewed degree distributions. Small worlds = lots of highly-connected small groups with fewer connections to other groups: we saw this effect in the Ebola response contact tracing.
  • #19: Let’s look at communities: groupings within your network. These are useful for questions like “how is a network likely to split into groups” and “how do I efficiently influence this network”. Note that when we have a community, we can study it as a network in its own right, including finding the most important nodes in it. “Small world theory” = there are roughly 6 steps on the shortest path between each pair of nodes in the world (see also “6 degrees of Kevin Bacon” http://en.wikipedia.org/wiki/Six_degrees_of_separation). The maths works out at roughly s = ln(n)/ln(k) where n is the population size and k is the average number of connections per node. For k=30, s is usually roughly 6. NetworkX community functions: http://networkx.lanl.gov/reference/algorithms.community.html
  • #20: First, let’s cover networks where there isn’t a path from every node to every other node in the network. These networks are called “disconnected” networks and can be interesting because of the lack of connections between groups (e.g. you’re trying to most efficiently connect up different transport systems).
  • #21: These are group measures based on the numbers of links. K-core: every node in the group is connected to K or more other nodes in the group. Clique-level analysis and node-level analysis interact with each other, e.g. if you find a set of cliques in a network, you can then look for and use the central nodes in those cliques.
  • #22: K-cores and cliques don’t always find the natural cliques in a graph (especially one containing human relationships). N-cliques: “friend of friend” cliques; use the Bron-Kerbosch algorithm. Issues include that nodes which contribute to the clique aren’t always included in it. P-clique addresses some of this. Other approaches: n-clans, k-plexes etc.: see http://faculty.ucr.edu/~hanneman/nettext/C11_Cliques.html#nclique
  • #24: Achoo! Diffusion model used when it’s important that you find *everybody* in contact, e.g. for Ebola, you have to assume that everyone an infectious person is in contact with is a potential carrier. Here, we assume that node 9 changes state first; in the next step of the algorithm, the nodes directly connected to it (0,1,8) change state; in the next step, the nodes connected to (0,1,8) change state, etc. Thought experiment: infections are time-sensitive, e.g. you get infected, then either get better or die. How would you represent this in a network? What would you expect to happen in a small-world network?
  • #25: Only if… Diffusion models for more complex choices, e.g. whether to go see a movie, based on your friends’ opinions plus reading movie reviews. In complex contagion, a node changes state based on the state of *all* its neighbors, and often also on outside information; just because 9 is in one state, 1 doesn’t have to change to that state too (but it might change state with probability p).
  • #26: Network diagrams are still the best way to describe networks. Edge bundling is useful for small world networks. Metanodes are useful for large networks of communities. An adjacency matrix can help if it’s nicely grouped, but sometimes it’s just more confusing. Explaining graphs to the C-suite? Use visual cues they’re used to. Carefully. Some examples are in the Visualisation Periodic Table at http://www.visual-literacy.org/periodic_table/periodic_table.html
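
The small-world estimate in note #19, s = ln(n)/ln(k), is easy to try out. In this sketch the population figure of 7 billion is my own round-number assumption, and `small_world_steps` is an illustrative name.

```python
import math

def small_world_steps(n, k):
    """Rough number of steps between two nodes: s = ln(n) / ln(k)."""
    return math.log(n) / math.log(k)

# Assumed values: about 7 billion people, 30 connections each on average.
steps = small_world_steps(7e9, 30)
print(round(steps, 2))
```

Because of the logarithms, the answer barely moves when the population changes; it is far more sensitive to the average number of connections k.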