Neo4j GDS Best Practices
Graph Data Science Workflow
Objective
Know how to put together the right workflow
for your project, and how to combine
algorithms effectively.
GDS in production
Emphasis on data modeling:
(1) Define the problem you’re trying to answer:
Making recommendations? Finding anomalies?
(2) Match the problem to the correct set of algorithms:
recommendation = similarity, anomalies = centrality
(3) Modify your data model for the algorithms you want to use:
- monopartite or bipartite
- modify labels and relationships to use native graphs
- consider weights and seeding
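As a rough illustration of the monopartite vs. bipartite choice, the sketch below collapses a bipartite customer-item graph into a weighted item-item graph. It is plain Python, not GDS; the purchase data and the function name `project_to_monopartite` are made up for the example, and the weight is assumed to be the number of shared customers.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bipartite data: (customer, item) purchase relationships.
purchases = [
    ("alice", "bread"), ("alice", "milk"),
    ("bob", "bread"), ("bob", "milk"), ("bob", "eggs"),
    ("carol", "milk"), ("carol", "eggs"),
]


def project_to_monopartite(edges):
    """Collapse customer-item edges into weighted item-item edges.

    The weight of an item pair is the number of customers who bought both.
    """
    items_by_customer = defaultdict(set)
    for customer, item in edges:
        items_by_customer[customer].add(item)

    weights = defaultdict(int)
    for items in items_by_customer.values():
        # Sorting keeps each pair in one canonical order.
        for a, b in combinations(sorted(items), 2):
            weights[(a, b)] += 1
    return dict(weights)


co_purchases = project_to_monopartite(purchases)
```

The resulting item-item weights are the kind of relationship property a weighted algorithm could then consume.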
Configuration & best practices:
(1) Do I have enough memory to run this?
(2) How long will this take to run?
(3) What do I need to change?
Loading the analytics graph:
● Loading is unavoidable, one of the slowest steps, and takes up the most memory, so minimize the number of graphs you load
● Can the data for all the algorithms be loaded into a single graph?
Running your algorithms:
- .stats returns summary statistics about the algorithm’s results without writing anything to the database. Run this first to check that the calculations make sense!
- .stream returns all the results as a stream. Use this if you’re extracting them for use elsewhere (e.g. Python).
- .mutate writes to the in-memory graph. When chaining algorithms, the output of one algorithm is written there before the next one in the sequence runs.
- .write writes to the Neo4j database. This is the slowest mode, so only run it once you know your algorithms make sense!
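The difference between the four modes can be mimicked with a toy model. This is a sketch only: it does not call GDS, and the two dictionaries are stand-ins (assumptions) for the in-memory graph and the Neo4j database. The function name `run_degree` and the result key `nodes_updated` are invented for illustration.

```python
from statistics import mean

# Toy stand-ins (assumptions): one dict plays the in-memory graph,
# the other plays the Neo4j database. Neither is a GDS object.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
in_memory_graph = {node: {} for node in "abcd"}
database = {node: {} for node in "abcd"}


def degree(edges):
    """Count how many relationships touch each node."""
    scores = {}
    for u, v in edges:
        scores[u] = scores.get(u, 0) + 1
        scores[v] = scores.get(v, 0) + 1
    return scores


def run_degree(mode, prop="degree"):
    """Run the toy algorithm in one of the four modes."""
    scores = degree(EDGES)
    if mode == "stats":    # summary only, nothing is stored
        values = list(scores.values())
        return {"min": min(values), "max": max(values), "mean": mean(values)}
    if mode == "stream":   # every result row, nothing is stored
        return sorted(scores.items())
    if mode not in ("mutate", "write"):
        raise ValueError(f"unknown mode: {mode}")
    # mutate stores on the in-memory graph, write stores in the database
    target = in_memory_graph if mode == "mutate" else database
    for node, score in scores.items():
        target[node][prop] = score
    return {"nodes_updated": len(scores)}


summary = run_degree("stats")   # sanity-check first
run_degree("mutate")            # then keep results in memory for the next step
```

After these two calls the `database` dictionary is still untouched, which is the point of checking with stats and chaining with mutate before committing to a write.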
Don’t forget this step:
CALL gds.graph.drop('Similarity-Graph');
CALL gds.graph.drop('Monopartite-Graph');
Double check that you’ve got all of them:
CALL gds.graph.list();
Time to data science:
GDS is general purpose!
Data Pre-processing:
1. Identifying and removing super nodes: degree centrality
2. Identifying subgraphs: weakly connected components
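The two pre-processing steps can be sketched in plain Python. This is not the GDS implementation: degree is counted directly from an edge list, components are found with a small union-find, and the graph, the `max_degree` threshold, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical graph: "hub" touches every other node.
edges = [("hub", n) for n in "abcde"] + [("a", "b"), ("c", "d")]


def degree_centrality(edges):
    """Number of relationships touching each node."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def drop_super_nodes(edges, max_degree):
    """Remove nodes whose degree exceeds max_degree, plus their edges."""
    degree = degree_centrality(edges)
    keep = {n for n, d in degree.items() if d <= max_degree}
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return sorted(keep), kept_edges


def weakly_connected_components(nodes, edges):
    """Group nodes that are connected when edge direction is ignored."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(group) for group in groups.values())


degree = degree_centrality(edges)
nodes, kept_edges = drop_super_nodes(edges, max_degree=3)
components = weakly_connected_components(nodes, kept_edges)
```

With the hub in place every node sits in a single component; once it is removed, the graph falls apart into three separate subgraphs that can be analysed independently.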
Common Algorithms & Combinations
Community Detection + ???? = Profit
Community detection algorithms break up your graph into smaller
subgraphs based on how the nodes are connected. Use community detection
to narrow down problems on large graphs and focus on the important parts.
- Speed up calculations: Community detection + node similarity
- Focus on the important stuff: Community detection + centrality
- Aggregate your graph: Treat communities as nodes and run centrality,
similarity, etc.
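Two of these combinations can be sketched in plain Python. The community assignment below is hypothetical (imagine it came from a community detection run), and the helper names are invented for the example; nothing here is GDS code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical community assignment and edge list.
community_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "C"}
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (2, 5), (5, 6)]


def candidate_pairs(community_of):
    """Node pairs worth comparing: only those inside the same community."""
    members = defaultdict(list)
    for node, community in community_of.items():
        members[community].append(node)
    return [pair
            for group in members.values()
            for pair in combinations(sorted(group), 2)]


def aggregate_by_community(edges, community_of):
    """Treat communities as nodes; weight = number of crossing edges."""
    weights = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu != cv:
            weights[tuple(sorted((cu, cv)))] += 1
    return dict(weights)


pairs = candidate_pairs(community_of)
community_graph = aggregate_by_community(edges, community_of)
```

Restricting a similarity run to `pairs` means 4 comparisons instead of the 15 possible pairs among six nodes, and `community_graph` is a much smaller graph on which centrality or similarity can be run.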
What else can we use GDS for?
Fraud Detection
- Weakly Connected Components: first-party fraud
- Louvain: fraud rings
- PageRank, Degree Centrality: anomalies
Disambiguation
- Weakly Connected Components: common identifiers
- Label Propagation: overlapping relationships
- Node Similarity
Recommendations
- Louvain: interacting communities
- PageRank, Betweenness, Closeness Centrality: important users
- Node Similarity
And much more!
Transactional graph:
What kind of questions can we answer?
Examples from … Retail
Customer segmentation:
Identify customers who buy similar items (Node Similarity), then use
Louvain to identify clusters of consumers with similar behaviour.
Item recommendations:
Recommend items that are frequently bought by the same customers, or in
the same transaction, with Node Similarity.
Find items that influence co-purchases using PageRank, or fast-moving
items with Closeness Centrality.
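A minimal sketch of the item recommendation idea, assuming the Jaccard coefficient as the similarity measure: two items are similar when largely the same customers bought them. The data and the function names are hypothetical, and this is plain Python rather than a GDS call.

```python
# Hypothetical data: which customers bought each item.
buyers = {
    "bread": {"alice", "bob"},
    "milk": {"alice", "bob", "carol"},
    "eggs": {"bob", "carol"},
    "tea": {"dave"},
}


def jaccard(a, b):
    """Size of the overlap divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(item, buyers, top_k=2):
    """Items ranked by how much their buyers overlap with `item`."""
    scores = [(other, jaccard(buyers[item], buyers[other]))
              for other in buyers if other != item]
    scores = [(other, score) for other, score in scores if score > 0]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top_k]


recommendations = similar_items("bread", buyers)
```

For "bread" this ranks "milk" ahead of "eggs" and drops "tea", which shares no buyers.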
Examples from … Marketing
Web Traffic Graph:
How can we tell who’s who?
Disambiguation:
Identify subgraphs of users with co-occurring identifiers using Weakly
Connected Components.
User Behavior:
Identify communities of unique users that interact with the same
websites using Label Propagation.
Identify which websites drive traffic using PageRank.
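To make the PageRank use case concrete, here is a textbook power-iteration sketch over a tiny, made-up web traffic graph. It is not the GDS implementation, so absolute scores will differ from what GDS reports; the site names, the damping factor of 0.85, and the fixed 50 iterations are assumptions for the example.

```python
# Hypothetical click-through links: (from_site, to_site).
links = [("blog", "shop"), ("news", "shop"), ("forum", "shop"), ("shop", "blog")]


def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration; scores sum to 1."""
    nodes = sorted({site for link in links for site in link})
    out = {n: [] for n in nodes}
    for src, dst in links:
        out[src].append(dst)

    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new_rank = {n: (1 - damping + damping * dangling) / len(nodes)
                    for n in nodes}
        for src in nodes:
            for dst in out[src]:
                new_rank[dst] += damping * rank[src] / len(out[src])
        rank = new_rank
    return rank


scores = pagerank(links)
top_site = max(scores, key=scores.get)
```

Here "shop" comes out on top because three sites link to it, and "blog" follows because it receives everything "shop" passes on.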
Knowledge graph representing genes, chemicals, diseases:
Examples from … Life Sciences
What questions can we answer?
- PageRank & Betweenness to identify essential regulatory genes or drug targets
- Louvain to identify protein regulatory networks
- Shortest path to link drug targets to possible outcomes or side effects
- Node Similarity to find structurally similar chemicals
- Link Prediction to estimate likelihood of interactions
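The shortest-path idea can be sketched with a breadth-first search over an unweighted, undirected graph. The node names below are placeholders, not real genes, drugs, or effects, and this is plain Python rather than a GDS path algorithm.

```python
from collections import deque

# Hypothetical knowledge graph edges (placeholder names only).
edges = [
    ("drug_X", "gene_A"), ("gene_A", "gene_B"), ("gene_B", "effect_Y"),
    ("drug_X", "gene_C"), ("gene_C", "effect_Y"),
]


def shortest_path(edges, start, goal):
    """Fewest-hops path from start to goal, or None if unreachable."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


path = shortest_path(edges, "drug_X", "effect_Y")
```

The returned path names the intermediate nodes, which is what makes a link between a drug target and an outcome explainable.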
What are your use cases? Let’s brainstorm!
Questions?

Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
