A Modern Approach to Connected Data
Practical Graph Algorithms with Neo4j
About me
Head of Developer Relations Engineering
Follow & tweet at me @mesirii
(graphs)-[:ARE]->(everywhere)
Value from Data Relationships
Common Current Use Cases
Internal Applications
Master Data Management
Network and IT Operations
Fraud Detection
Customer-Facing Applications
Real-Time Recommendations
Graph-Based Search
Identity and Access Management
http://neo4j.com/use-cases
Insights from Algorithms
Improving all use-cases
Graph Algorithms
Relevance
Clustering
Structural Insights
Machine Learning
Classification, Regression
NLP, Struct/Content Pred
NN <-> Graph
Graph As Compute
Example: Twitter Analytics
Right Relevance
neo4j.com/blog/graph-algorithms-make-election-data-great-again
Reasoning RR
Custom Pipeline Neo4j <-> R
Goal: Simplification & Performance
● PageRank
● Clustering
● Weighted Similarities
● ... prop. IP ...
Applicable Tools
• Store Tweets in Neo4j (e.g. 300M)
• Use APOC + Graph Algorithms for processing
• Use NLP Algorithms / Procedures (e.g. from GraphAware)
• Neo4j Tableau (WDC) Connector
• Use APOC for streaming results to Gephi
Graph Algorithms
Project Goals
• high performance graph algorithms
• user friendly (procedures)
• support graph projections
• augment OLTP with OLAP
• integrate efficiently with live Neo4j database (read, write, Cypher)
• common Graph API to write your own
Kudos
Development
• Martin Knobloch (AVGL)
• Paul Horn (AVGL)
Documentation
• Tomaz Bratanic
Usage
1. Call as Cypher procedure
2. Pass in specification (Label, Prop, Query) and configuration
3. ~.stream variant returns (a lot of) results
CALL algo.<name>.stream('Label','TYPE',{conf})
YIELD nodeId, score
4. non-stream variant writes results to graph and returns statistics
CALL algo.<name>('Label','TYPE',{conf});
5. Cypher projection:
pass in Cypher for node- and relationship-lists
CALL algo.<name>(
'MATCH ... RETURN id(n)',
'MATCH (n)-->(m) RETURN id(n), id(m)',
{graph:'cypher'})
Architecture
1. Load Data in parallel from Neo4j
2. Store in efficient Datastructure
3. Run Graph Algorithm in parallel using Graph API
4. Write data back in parallel
[Diagram: Neo4j, Algorithm Datastructures, and Graph API, with arrows numbered 1-4 for the steps above]
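The load-then-compute flow of the architecture steps can be illustrated outside Neo4j. This is a minimal sketch, assuming the "efficient Datastructure" is something like a compressed sparse row (CSR) adjacency array; the slides do not say which structure the library actually uses, and `build_csr` and `neighbors` are hypothetical names chosen for illustration.

```python
def build_csr(node_count, edges):
    """Pack a directed edge list into CSR form: offsets[] and targets[]."""
    offsets = [0] * (node_count + 1)
    for source, _ in edges:
        offsets[source + 1] += 1          # count out-degree per source
    for i in range(node_count):
        offsets[i + 1] += offsets[i]      # prefix sum -> start positions
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()          # next free slot per source node
    for source, target in edges:
        targets[cursor[source]] = target
        cursor[source] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Outgoing neighbors of a node are one contiguous slice."""
    return targets[offsets[node]:offsets[node + 1]]


# Assumed toy graph with node ids 0..2.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
offsets, targets = build_csr(3, edges)
print(offsets)                          # [0, 2, 3, 4]
print(neighbors(offsets, targets, 0))   # [1, 2]
```

Two flat arrays keep neighbor lists contiguous in memory, which is one common reason to copy a graph out of a transactional store before running an analytical pass over it.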
The Algorithms
Centralities
•PageRank (baseline)
•Betweenness
•Closeness
•Degree
Example: PageRank
https://neo4j-contrib.github.io/neo4j-graph-algorithms/#_page_rank
CALL algo.pageRank.stream
('Page', 'LINKS', {iterations:20, dampingFactor:0.85})
YIELD node, score
RETURN node,score order by score desc limit 20
CALL algo.pageRank('Page', 'LINKS',
{iterations:20, dampingFactor:0.85,
write: true,writeProperty:"pagerank"})
YIELD nodes, iterations, loadMillis, computeMillis,
writeMillis, dampingFactor, write, writeProperty
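What the `iterations` and `dampingFactor` settings control can be seen in a plain power-iteration sketch. This is a textbook variant written for illustration, not the library's implementation: `page_rank` is a hypothetical function, the toy graph is made up, and the scores are normalized to sum to 1, so they will not match the scale of the scores shown in the DBPedia results later in the deck.

```python
def page_rank(node_count, edges, iterations=20, damping_factor=0.85):
    """Power iteration over a directed edge list with node ids 0..node_count-1."""
    out_degree = [0] * node_count
    for source, _ in edges:
        out_degree[source] += 1

    scores = [1.0 / node_count] * node_count
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(s for s, d in zip(scores, out_degree) if d == 0)
        base = (1.0 - damping_factor + damping_factor * dangling) / node_count
        next_scores = [base] * node_count
        for source, target in edges:
            next_scores[target] += damping_factor * scores[source] / out_degree[source]
        scores = next_scores
    return scores


# Assumed toy graph: pages 0 -> 1 -> 2 -> 0, plus page 3 linking to page 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
scores = page_rank(4, edges, iterations=20, damping_factor=0.85)
for node in sorted(range(4), key=lambda n: scores[n], reverse=True):
    print(node, round(scores[node], 4))
```

Page 3 has no incoming links, so its score settles at the teleport share `(1 - dampingFactor) / nodeCount`; more iterations mainly refine the ordering among well-connected nodes.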
Clustering
•Label Propagation
•Louvain (Phase2)
•Union Find / WCC
•Strongly Connected
Components
•Triangle-Count/Clustering Coeff
The Algorithms
Example: UnionFind (CC)
https://neo4j-contrib.github.io/neo4j-graph-algorithms/#_connected_components
CALL algo.unionFind.stream('User', 'FRIEND', {})
YIELD nodeId,setId
RETURN setId, count(*)
ORDER BY count(*) DESC LIMIT 100;
CALL algo.unionFind('User', 'FRIEND',
{write:true, partitionProperty:'partition'})
YIELD nodes, setCount, loadMillis, computeMillis,
writeMillis
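The idea behind the `setId` column can be sketched with a small disjoint-set structure. This is a minimal single-threaded illustration; `union_find` is a hypothetical helper, the friendship edges are made up, and using the smallest member id as the set id is an assumption of this sketch rather than documented library behaviour.

```python
from collections import Counter


def union_find(node_count, edges):
    """Return a set id per node; nodes in one component share the same id."""
    parent = list(range(node_count))

    def find(node):
        root = node
        while parent[root] != root:
            root = parent[root]
        while parent[node] != root:       # path compression
            parent[node], node = root, parent[node]
        return root

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one (arbitrary rule).
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return [find(node) for node in range(node_count)]


# Assumed toy FRIEND relationships between users 0..5; user 5 has no friends.
friendships = [(0, 1), (1, 2), (3, 4)]
set_ids = union_find(6, friendships)
print(set_ids)                          # [0, 0, 0, 3, 3, 5]
print(Counter(set_ids).most_common())   # [(0, 3), (3, 2), (5, 1)]
```

The final `Counter` call plays the role of `RETURN setId, count(*) ORDER BY count(*) DESC` in the streaming query.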
Path-Expansion / Traversal
• Single-Source Shortest Path (SSSP)
• All-Nodes SSSP
• Parallel BFS / DFS
The Algorithms
MATCH(start:Loc {name:'A'})
CALL algo.deltaStepping.stream(start, 'cost', 3.0)
YIELD nodeId, distance
RETURN nodeId, distance ORDER BY distance LIMIT 20
MATCH(start:Node {name:'A'})
CALL algo.deltaStepping(start, 'cost', 3.0,
{write:true, writeProperty:'ssp'})
YIELD nodeCount, loadDuration, evalDuration,
writeDuration
RETURN *
Example: Delta-Stepping (SSSP)
https://neo4j-contrib.github.io/neo4j-graph-algorithms/#_all_pairs_and_single_source_shortest_path
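The bucket idea behind delta-stepping can be shown in a sequential sketch. It assumes the `3.0` in the calls above is the bucket width delta and that `'cost'` is a non-negative relationship weight; `delta_stepping` and the toy graph are hypothetical, and the parallel relaxation that makes the algorithm attractive in practice is left out.

```python
import math
from collections import defaultdict


def delta_stepping(graph, start, delta):
    """graph: {node: [(neighbor, cost), ...]} with non-negative costs."""
    distance = defaultdict(lambda: math.inf)
    buckets = defaultdict(set)            # bucket i holds distances in [i*delta, (i+1)*delta)

    def relax(node, new_distance):
        old = distance[node]
        if new_distance < old:
            if old != math.inf:
                buckets[int(old // delta)].discard(node)
            distance[node] = new_distance
            buckets[int(new_distance // delta)].add(node)

    relax(start, 0.0)
    while any(buckets.values()):
        index = min(i for i, members in buckets.items() if members)
        settled = set()
        # Light edges (cost <= delta) can put nodes back into the current bucket.
        while buckets[index]:
            frontier = buckets[index]
            buckets[index] = set()
            settled |= frontier
            for node in frontier:
                for neighbor, cost in graph.get(node, []):
                    if cost <= delta:
                        relax(neighbor, distance[node] + cost)
        # Heavy edges are relaxed once the bucket's distances are final.
        for node in settled:
            for neighbor, cost in graph.get(node, []):
                if cost > delta:
                    relax(neighbor, distance[node] + cost)
    return dict(distance)


# Assumed toy graph with 'cost' weights on directed relationships.
graph = {
    "A": [("B", 2.0), ("C", 7.0)],
    "B": [("C", 4.0), ("D", 1.0)],
    "D": [("C", 1.0)],
}
result = delta_stepping(graph, "A", 3.0)
print(sorted(result.items()))  # [('A', 0.0), ('B', 2.0), ('C', 4.0), ('D', 3.0)]
```

A small delta behaves more like Dijkstra (many buckets, little rework), while a large delta behaves more like Bellman-Ford (few buckets, more re-relaxation), so the value is a tuning knob rather than a correctness parameter.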
DEMO
Datasets
● Neo4j Community Graph
○ 280k nodes, 1.4m relationships
○ Centralities, Clustering, Grouping
● DBPedia
○ 11m nodes, 116m relationships
○ Page Rank, unionFind
● Bitcoin
○ 1.7bn nodes, 2.7bn rels
○ degree distribution
○ pageRank, unionFind
Neo4j Community Graph
● Neo4j Community activity from Twitter, GitHub, StackOverflow
● Let's look at tweets
● Tweet-PageRank
● Projection -> mention-user-network
○ centralities
○ clustering
○ grouping
DBPedia
● Shallow Copy of Wikipedia
● Just (Page) -[:Link]-> (Page)
CALL algo.pageRank.stream('Page', 'Link', {iterations:5}) YIELD node, score
WITH * ORDER BY score DESC LIMIT 5
RETURN node.title, score;
+--------------------------------------+
| node.title | score |
+--------------------------------------+
| "United States" | 13349.2 |
| "Animal" | 6077.77 |
| "France" | 5025.61 |
| "List of sovereign states" | 4913.92 |
| "Germany" | 4662.32 |
+--------------------------------------+
5 rows
46247 ms
DBPedia
CALL algo.pageRank('Page', 'Link', {write:true,iterations:20});
+--------------------------------------------------------------------------------------+
| nodes | iter | loadMillis | computeMillis | writeMillis | damping | writeProperty |
+--------------------------------------------------------------------------------------+
| 11474730 | 20 | 34106 | 9712 | 1810 | 0.85 | "pagerank" |
+--------------------------------------------------------------------------------------+
1 row
47888 ms
BitCoin
● Full Copy of the BitCoin BlockChain
○ from learnmeabitcoin.com
○ thanks Greg Walker
○ (see Online Meetup)
● 1.7bn nodes, 2.7bn rels
○ 474k blocks,
○ 240m tx,
○ 280m addresses
○ 650m outputs
BitCoin
● distribution of "locked" relationships for "addresses"
○ = participation in transactions
call apoc.stats.degrees('<locked');
+--------------------------------------------------------------------------------------------------------------+
| type | direction | total | p50 | p75 | p90 | p95 | p99 | p999 | max | min | mean |
+--------------------------------------------------------------------------------------------------------------+
| "locked" | "INCOMING" | 654662356 | 0 | 0 | 1 | 1 | 2 | 28 | 1891327 | 0 | 0.37588608290716047 |
+--------------------------------------------------------------------------------------------------------------+
1 row
308619 ms
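The columns of this result can be reproduced on a small degree list. This is a sketch using the nearest-rank percentile method, which is an assumption; the slides do not say how `apoc.stats.degrees` computes its percentiles, and `degree_stats` and the sample degrees are made up. As a side note, the reported total divided by the reported mean gives roughly 1.74 billion, which suggests the statistic was taken over all nodes rather than only addresses.

```python
# Percentiles expressed per mille so the rank arithmetic stays in integers.
PERCENTILES = {"p50": 500, "p75": 750, "p90": 900, "p95": 950, "p99": 990, "p999": 999}


def degree_stats(degrees):
    """Summarize a list of per-node degrees (nearest-rank percentiles)."""
    ordered = sorted(degrees)
    count = len(ordered)
    stats = {
        "total": sum(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": sum(ordered) / count,
    }
    for name, per_mille in PERCENTILES.items():
        rank = max(1, -(-per_mille * count // 1000))   # integer ceiling
        stats[name] = ordered[rank - 1]
    return stats


# Assumed toy data: most nodes have no incoming relationship, one is a hub.
degrees = [0] * 7 + [1, 1, 11]
stats = degree_stats(degrees)
print(stats["p50"], stats["p90"], stats["p95"], stats["max"])  # 0 1 11 11
print(stats["mean"])
```

Even on ten nodes the pattern from the slide shows up: the median is 0 while a single hub dominates the maximum, so the mean says little about a typical node.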
● Inferred network of addresses, via transaction and output
● (a)<-[:locked]-(o)-[:in]->(tx)-[:out]->(o2)-[:locked]->(a2)
● use cypher projections
● 1M outputs (24s) to start with, connected addresses via tx
● 10M outputs (296s)
call algo.unionFind.stream(
'match (o:output)-[:locked]->(a) with a limit 10000000
return id(a) as id',
'match (o:output)-[:locked]->(a) with o,a limit 10000000
match (o)-[:in]->(tx)-[:out]->(o2)-[:locked]->(a2)
return id(a) as source, id(a2) as target, count(tx) as value',
{graph:'cypher'}) YIELD setId, nodeId
RETURN setId, count(*) as size
ORDER BY size DESC LIMIT 10;
BitCoin
+-------------------+
| setId | size |
+-------------------+
| 5036 | 4409420 |
| 6295282 | 1999 |
| 5839746 | 1488 |
| 9356302 | 833 |
| 6560901 | 733 |
| 6370777 | 637 |
| 8101710 | 392 |
| 5945867 | 369 |
| 2489036 | 264 |
| 1703620 | 203 |
+-------------------+
10 rows
296109 ms
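How the relationship query of the Cypher projection turns the stored pattern into an address-to-address network can be sketched with plain dictionaries. The names `address_links`, `locked`, `spent_in` and `created_by` and the sample ids are hypothetical; the sketch only mirrors the pattern `(a)<-[:locked]-(o)-[:in]->(tx)-[:out]->(o2)-[:locked]->(a2)` with `count(tx)` as the weight.

```python
from collections import Counter


def address_links(locked, spent_in, created_by):
    """
    locked:     output -> address it is locked to
    spent_in:   output -> transaction that consumes it
    created_by: transaction -> list of outputs it creates
    Returns a Counter of (source address, target address) -> path count.
    """
    links = Counter()
    for output, tx in spent_in.items():
        source = locked.get(output)
        for next_output in created_by.get(tx, []):
            target = locked.get(next_output)
            if source is not None and target is not None:
                links[(source, target)] += 1
    return links


# Assumed toy chain: output o1 (address a1) is spent in tx1, which creates o2..o4.
locked = {"o1": "a1", "o2": "a2", "o3": "a3", "o4": "a2"}
spent_in = {"o1": "tx1"}
created_by = {"tx1": ["o2", "o3", "o4"]}
links = address_links(locked, spent_in, created_by)
print(dict(links))  # {('a1', 'a2'): 2, ('a1', 'a3'): 1}
```

The resulting pairs are what a connected-components pass then groups, which is why one very large set dominates the result table: addresses that transact with each other, directly or through a chain, collapse into the same component.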
Timing
First release July 2017
• neo4j.com/blog/efficient-graph-algorithms-neo4j
Second release Sept/Oct 2017
• huge graphs, additional algorithms, bugfixes
More examples
• Launch & Perf:
• Docs: neo4j-contrib.github.io/neo4j-graph-algorithms
• Tomaz Bratanic: tbgraph.wordpress.com (docs, social, tube, GoT)
• Community Graph: github.com/community-graph
• Twitter Analytics: neo4j.com/blog/graph-algorithms-make-election-data-great-again
Reading Material
• Thanks to Amy Hodler
• 13 Top Resources on Graph Theory
• neo4j.com/blog/top-13-resources-graph-theory-algorithms/
We need your feedback & use-cases!
neo4j.com/slack -> #neo4j-graph-algorithms
github.com/neo4j-contrib/neo4j-graph-algorithms
michael@neo4j.com
Please get in touch!!
The Sky is the Limit
Questions?
Don't forget! Subscribe & Like
youtube.com/c/neo4j
