METIS MEETUP
Networks All Around Us: Analyzing Networks in your Problem Domain | 3/3/2016
Russell Jurney
http://bit.ly/socialnetworkanalysis2
RELATO
MAPS
MARKET
BACKGROUND
Serial Entrepreneur
Contributed code to Apache Druid, Apache Pig, Apache DataFu, Apache Whirr, Azkaban, MongoDB
Apache Committer
Three-time O'Reilly Author
Started & Shipped Product at E8 Security
Ning, LinkedIn, Hortonworks veteran
2009 2010 2011
2012 2014
EXAMPLES OF NETWORKS
FOUNDER NETWORKS
node = company
edge = employment transition, as in people who worked at one startup, then founded another
WEBSITE BEHAVIOR
node = web page
edge = user browses one page, then another
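As a minimal sketch of the website-behavior case, assume each session is already an ordered list of page paths. The `sessions` data and the `session_edges` helper below are made up for illustration. Consecutive page views become weighted directed edges:

```python
from collections import Counter

def session_edges(sessions):
    """Count directed edges (page -> next page) across browsing sessions."""
    edges = Counter()
    for pages in sessions:
        # Pair each page with the page viewed immediately after it.
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges

# Hypothetical sessions: each list is one user's page views, in order.
sessions = [
    ["/home", "/pricing", "/signup"],
    ["/home", "/blog", "/pricing"],
    ["/home", "/pricing"],
]
edges = session_edges(sessions)

# The node set falls out of the edge list.
nodes = {page for edge in edges for page in edge}
```

The edge weight is the number of times the transition was observed, which is usually worth keeping even if a later analysis ignores it.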
ONLINE SOCIAL NETWORKS
node = LinkedIn profile, edge = LinkedIn connection
EMAIL INBOX
node = email address, edge = sent email
MARKETS
node = company, edge = partnership
MARKET REPORTS
TYPES OF NETWORKS
TINKERPOP
“Marko Rodriguez is the Doug Cutting of graph analytics.”
—Mark Twain
PROPERTY GRAPHS
A PROPERTY GRAPH IN EVERY DATABASE
PROPERTY GRAPHS IN YOUR DOMAIN
identify entities
identify relationships
specify schema (or not)
populate graph database
learn to think in graph walks (hard)
query in batch
query in realtime
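To make the steps above concrete, here is a rough in-memory sketch of a property graph: labeled vertices and labeled edges, each carrying key/value properties. The `PropertyGraph` class, its method names, and the company data are all hypothetical. It stands in for a graph database only to show the shape of the data and what a two-step walk looks like.

```python
class PropertyGraph:
    """Tiny in-memory property graph: labeled vertices and edges with properties."""

    def __init__(self):
        self.vertices = {}  # vertex id -> {"label": str, "props": dict}
        self.edges = []     # (out vertex id, edge label, in vertex id, props dict)

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = {"label": label, "props": props}
        return vid

    def add_edge(self, out_v, label, in_v, **props):
        self.edges.append((out_v, label, in_v, props))

    def out(self, vid, label=None):
        """One-step walk: vertices reached from vid over outgoing edges."""
        return [dst for src, lbl, dst, _ in self.edges
                if src == vid and (label is None or lbl == label)]

# Entities become vertices, relationships become labeled edges.
graph = PropertyGraph()
graph.add_vertex(1, "company", name="Acme", domain="acme.example")
graph.add_vertex(2, "company", name="Globex", domain="globex.example")
graph.add_vertex(3, "company", name="Initech", domain="initech.example")
graph.add_edge(1, "partnership", 2, since=2015)
graph.add_edge(2, "customer", 3)

# A two-step graph walk: customers of Acme's partners.
partners_customers = [c for p in graph.out(1, "partnership")
                      for c in graph.out(p, "customer")]
```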
POPULATING A PROPERTY GRAPH
import groovy.json.JsonSlurper

jsonSlurper = new JsonSlurper()

// Add nodes: one 'company' vertex per JSON line
while((json = company_reader.readLine()) != null)
{
    document = jsonSlurper.parseText(json)
    v = graph.addVertex('company')
    v.property("_id", document._id)
    v.property("domain", document.domain)
    v.property("name", document.name)
}
POPULATING A PROPERTY GRAPH
// Get a graph traverser
g = graph.traversal()
while((json = links_reader.readLine()) != null)
{
    document = jsonSlurper.parseText(json)
    // Add edges to graph
    v1 = g.V().has('domain', document.home_domain).next()
    v2 = g.V().has('domain', document.link_domain).next()
    v1.addEdge(document.type, v2)
}
MULTI RELATIONAL TO SINGLE RELATIONAL
g.E().hasLabel('friend').subgraph('sg').cap('sg').next()
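Outside the graph database, the same reduction is just a filter on the edge label. A minimal sketch, assuming edges are held as `(source, label, target)` tuples; the names and data are hypothetical:

```python
def single_relational(edges, label):
    """Keep only edges of one label and drop the label, leaving (source, target) pairs."""
    return [(src, dst) for src, lbl, dst in edges if lbl == label]

# Hypothetical multi-relational edge list: (source, label, target).
edges = [
    ("alice", "friend", "bob"),
    ("alice", "employer", "acme"),
    ("bob", "friend", "carol"),
    ("carol", "employer", "acme"),
]
friend_edges = single_relational(edges, "friend")
```

The result is a plain edge list, which is the input most SNA libraries expect.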
EXPORT LINKS AS JSON
final Graph g = TinkerFactory.createClassic();
try (final OutputStream os = new FileOutputStream("jsondump/links.json")) {
    GraphSONWriter.build().create().writeGraph(os, g);
}
THEN USE SNA LIBRARIES
#
# Example - calculate friendship dispersion
#
import networkx as nx

di_graph = nx.DiGraph()
# util is a helper module that is not shown in the slides
all_edges = util.json_cr_file_2_array('jsondump/links.json')
for edge in all_edges:
    if 'type' in edge and edge['type'] == 'partnership':
        di_graph.add_edge(edge['domain1'], edge['domain2'])
dispersion = nx.dispersion(di_graph)
TOOLS OF SNA
SNA = Social Network Analysis
centrality
clustering
block models
cores
dispersion
center-pieces
CENTRALITY
Centrality is a way of measuring how central or important a particular
node is in a social network.
OR
What nodes should I care about?
SINGLE-RELATIONAL CENTRALITY(S)
// all-links-the-same-type-centrality
g.V().out().groupCount()
// things-humans-walk-centrality
g.V().hasLabel('human').out('walks').groupCount()
// things-dogs-eat-centrality
g.V().hasLabel('dog').out('eats').groupCount()
MULTI-RELATIONAL CENTRALITY(S)
// things-eaten-by-things-humans-walk-centrality
g.V().hasLabel('human').out('walks').out('eats').groupCount()
// things-hated-by-things-humans-pet-centrality
g.V().hasLabel('human').out('pets').out('hates').groupCount()
// things-that-pet-things-that-eat-mice-centrality
g.V().in('eats').in('pets').groupCount()
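The pattern in these traversals is: start from a set of vertices, follow a sequence of edge labels, then count where the walks end. A plain-Python sketch of that idea follows. The `build_index` and `walk_count` helpers and the pet data are hypothetical, and the list of start vertices stands in for `hasLabel('human')`.

```python
from collections import Counter, defaultdict

def build_index(edges):
    """Index outgoing edges as index[source][label] -> list of targets."""
    index = defaultdict(lambda: defaultdict(list))
    for src, label, dst in edges:
        index[src][label].append(dst)
    return index

def walk_count(index, starts, labels):
    """Follow each edge label in turn from the start vertices, then count the endpoints."""
    frontier = list(starts)
    for label in labels:
        frontier = [dst for v in frontier for dst in index[v][label]]
    return Counter(frontier)

# Hypothetical multi-relational edges: (source, label, target).
edges = [
    ("ann", "walks", "rex"), ("bob", "walks", "rex"), ("bob", "walks", "fido"),
    ("rex", "eats", "kibble"), ("fido", "eats", "kibble"), ("fido", "eats", "bones"),
]
humans = ["ann", "bob"]
index = build_index(edges)

# things-eaten-by-things-humans-walk-centrality
eaten_by_walked = walk_count(index, humans, ["walks", "eats"])
```

Each endpoint is counted once per walk that reaches it, so `kibble` scores higher because more human-to-dog-to-food walks end there.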
CENTRALITIES
degree centrality
closeness centrality
betweenness centrality
eigenvector centrality
DEGREE CENTRALITY
in-degree centrality is nice…
it works even if you’re missing
a node’s outbound links
DEGREE CENTRALITY
# computation
count connections
…it's that simple
in-degree centrality = popularity
out-degree centrality = gregariousness
# meaning
risk of catching cold
DEGREE CENTRALITY IN GREMLIN
// all-links-the-same-type-centrality
g.V().out().groupCount()
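The same counts can be sketched in plain Python, assuming the graph is a list of directed `(source, target)` pairs (the edge data here is invented for illustration):

```python
from collections import Counter

def degree_centrality(edges):
    """Return (in_degree, out_degree) counters for directed (source, target) edges."""
    in_degree = Counter(dst for _, dst in edges)
    out_degree = Counter(src for src, _ in edges)
    return in_degree, out_degree

# Hypothetical directed edge list.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
in_degree, out_degree = degree_centrality(edges)
```

In-degree only needs the links that point at a node, which is why it still works when a node's own outbound links are missing from the data.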
CLOSENESS CENTRALITY
# computation
count hops of all shortest paths
distance from all other nodes
reciprocal of farness
# meaning
communication efficiency
spread of information
CLOSENESS CENTRALITY IN GREMLIN
closenessCentrality =
  g.V().as("a").repeat(both('relationship_type').simplePath()).emit().as("b")
    .dedup().by(select("a", "b")).path()
    .group().by(limit(local, 1))
    .by(count(local).map {1/it.get()}.sum())
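For comparison, here is a sketch of the textbook "reciprocal of farness" form in plain Python. It is not a translation of the Gremlin traversal above. It assumes an unweighted graph stored as adjacency lists and scores each node as the number of reachable nodes divided by the sum of shortest-path hop counts to them. The graph data is hypothetical.

```python
from collections import deque

def closeness_centrality(adjacency):
    """Closeness = (reachable nodes) / (sum of shortest-path hop counts to them)."""
    scores = {}
    for source in adjacency:
        # Breadth-first search yields shortest hop counts in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        farness = sum(dist.values())
        scores[source] = (len(dist) - 1) / farness if farness else 0.0
    return scores

# Hypothetical undirected path graph: a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = closeness_centrality(adjacency)
```

The two middle nodes score higher than the endpoints because they are fewer hops from everyone else.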
BETWEENNESS CENTRALITY
# computation
count of times node appears in shortest paths…
…between all pairs of nodes
# meaning
control of communication between other nodes
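One common way to compute this is Brandes' algorithm, sketched below for unweighted graphs. The slides do not say which method the speaker used, so treat this as an illustration under that assumption. The adjacency-list input and the example graph are hypothetical.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Brandes' algorithm for unweighted graphs; adjacency maps node -> neighbor list."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical undirected path graph: a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
score = betweenness_centrality(adjacency)

# With symmetric adjacency lists every pair is visited in both directions,
# so halve the scores to count each unordered pair once.
undirected_score = {v: s / 2 for v, s in score.items()}
```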
EIGENVECTOR CENTRALITY
# computation
counts connections of connected nodes
more connected neighbors matter more
# meaning
influence of one node on others
pagerank is an eigenvector centrality
EIGENVECTOR CENTRALITY IN GREMLIN
g.V()
  .repeat(out('relationship_type').groupCount('m').by('unique_key'))
  .times(n).cap('m')
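The idea that "more connected neighbors matter more" can also be sketched as a power iteration: each node's score is repeatedly replaced by the sum of its neighbors' scores, then rescaled. This is a minimal illustration, assuming an undirected, connected, non-bipartite graph so the iteration settles. The function name, the iteration count, and the graph are hypothetical.

```python
def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration: score = sum of neighbors' scores, rescaled so the max is 1."""
    scores = {v: 1.0 for v in adjacency}
    for _ in range(iterations):
        new = {v: sum(scores[n] for n in adjacency[v]) for v in adjacency}
        norm = max(new.values()) or 1.0  # avoid dividing by zero on edgeless graphs
        scores = {v: x / norm for v, x in new.items()}
    return scores

# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
scores = eigenvector_centrality(adjacency)
```

Node `d` has a single link, but it points at the best-connected node, so it still picks up a nontrivial score.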
CLUSTERING
property based clustering: k-means
graph based clustering: modularity
property graph based clustering: CESNA
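To illustrate what graph-based clustering optimizes, here is a small sketch that scores a given partition with the standard modularity formula for undirected graphs: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. The function and the two-triangle example are hypothetical. Finding the best partition is a separate problem not shown here.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected graph given as (u, v) edges."""
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical graph: two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_split = modularity(edges, two_clusters)
q_lumped = modularity(edges, one_cluster)
```

Splitting along the bridge scores higher than lumping everything together, which matches the intuition that the two triangles are separate clusters.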
BLOCK MODELS
how much do clusters
connect?
are links reciprocal?
Circos plots are helpful
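Both questions reduce to simple counts once every node has a cluster assignment. A minimal sketch, assuming directed `(source, target)` edges and a hypothetical `block_of` mapping from node to cluster:

```python
from collections import Counter

def block_counts(edges, block_of):
    """Count directed edges between each (source block, target block) pair."""
    return Counter((block_of[src], block_of[dst]) for src, dst in edges)

def reciprocity(edges):
    """Fraction of directed edges whose reverse edge is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    return sum((dst, src) in edge_set for src, dst in edge_set) / len(edge_set)

# Hypothetical directed edges and cluster assignments.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c"), ("b", "d")]
block_of = {"a": "x", "b": "x", "c": "y", "d": "y"}

counts = block_counts(edges, block_of)
share_reciprocal = reciprocity(edges)
```

Here block `x` links to block `y` twice but `y` never links back, which is the kind of asymmetry a block model makes visible.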
CORES
DISPERSION
Romantic Partnerships and the Dispersion of Social Ties:
A Network Analysis of Relationship Status on Facebook
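As a rough sketch of what the measure counts: for a pair u and v, take their common neighbors and count the pairs among them that are neither linked to each other nor share any other neighbor inside u's own network. The version below is the unnormalized count, written from that definition. The adjacency data is hypothetical and the function is an illustration, not a library implementation.

```python
from itertools import combinations

def dispersion(adjacency, u, v):
    """Count pairs of common neighbors of u and v that are otherwise unconnected."""
    u_nbrs = set(adjacency[u])
    common = u_nbrs & set(adjacency[v])
    total = 0
    for s, t in combinations(sorted(common), 2):
        # Neighbors of s inside u's ego network, ignoring u and v themselves.
        s_nbrs = (u_nbrs & set(adjacency[s])) - {u, v}
        # The pair counts if s and t are not linked and share no such neighbor.
        if t not in s_nbrs and s_nbrs.isdisjoint(adjacency[t]):
            total += 1
    return total

# Hypothetical ego network of u: v knows both s and t, who do not know each other.
adjacency = {
    "u": ["v", "s", "t", "w"],
    "v": ["u", "s", "t"],
    "s": ["u", "v"],
    "t": ["u", "v"],
    "w": ["u"],
}
high = dispersion(adjacency, "u", "v")
none = dispersion(adjacency, "u", "w")
```

A high value means the two people's mutual contacts come from separate parts of u's network, rather than from one tightly knit group.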
CENTER-PIECE SUBGRAPHS
*Slide stolen from Tong, Faloutsos, Pan
Russell Jurney, CEO
rjurney@relato.io
twitter.com/rjurney
404-317-3620
http://bit.ly/socialnetworkanalysis2
