Neo4j, Inc. All rights reserved 2021
Creating a Knowledge Graph with Neo4j: A Simple Machine Learning Approach
Clair J. Sullivan, PhD
Data Science Advocate
https://medium.com/@cj2001
@CJLovesData1
All materials for this demonstration are available on the workshop GitHub repo:
https://github.com/cj2001/nodes2021_kg_workshop
(I will put this link up several times!)
To run today’s code:
1. Jupyter or Google Colab
◦ We will have some dependencies to manage in either
◦ If you are bringing your own Jupyter, you probably want to create a virtual
environment for this workshop
2. Neo4j Sandbox
◦ https://dev.neo4j.com/sandbox
We can either populate the database manually, or I will show how to download
a pre-populated one...
By the end of this workshop you will be able to...
• Get some text and extract relevant information
• Create a knowledge graph
• Vectorize the knowledge graph (create graph embeddings)
• Apply data science and machine learning to the knowledge graph
Tools involved: Natural Language Processing (NLP); Graph Data Science Library + Basic ML
Two Key Concepts
1. There is no proverbial “silver bullet” with Natural Language Processing (NLP)
2. The quality of what you get out of a knowledge graph depends on the quality of what you put into it
https://yashuseth.blog/2019/10/08/introduction-question-answering-knowledge-graphs-kgqa/
Introduction to knowledge graphs
• “Things not strings”
• What knowledge graphs are useful for
◦ Search
◦ Question answering
◦ Recommendation engines
• Can be generated in many different ways
◦ Co-occurrence
◦ Resource Description Framework (RDF)
◦ Subject-Verb-Object (SVO)
Word co-occurrence
https://covid19biblio.com/2020/04/28/keyword-co-occurrence-network-graph-for-the-overall-research-field-on-covid-19-up-to-april-27th-2020/
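As a rough illustration of the co-occurrence approach, the sketch below counts how often two words appear in the same sentence and treats each count as a weighted edge. The toy sentences, the whitespace tokenizer, and the `cooccurrence_edges` helper are assumptions made for this example; they are not taken from the workshop code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count unordered word pairs that appear in the same sentence."""
    edges = Counter()
    for sentence in sentences:
        # Naive tokenizer (an assumption): lowercase and split on whitespace.
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            edges[(a, b)] += 1
    return edges


# Toy corpus, invented for illustration only.
sentences = [
    "covid vaccine trial",
    "covid vaccine rollout",
    "vaccine trial results",
]
edges = cooccurrence_edges(sentences)
```

Each key in `edges` is a candidate relationship between two word nodes, and the count can serve as the relationship weight.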
RDF triples
https://en.wikipedia.org/wiki/Resource_Description_Framework#Examples
SVO triples
NLP considerations for knowledge graph creation
• Named Entity Recognition (NER)
• SVO / SPO (Subject-Predicate-Object) triples
◦ ...but verbs can be difficult to reliably detect via NLP!
• Very language dependent
• Very topic-area dependent
Barack Hussein Obama II is an American politician and
attorney who served as the 44th president of the United
States from 2009 to 2017.
Text        Lemma       Tag  POS    DEP       is_stop
Barack      Barack      NNP  PROPN  compound  FALSE
Hussein     Hussein     NNP  PROPN  compound  FALSE
Obama       Obama       NNP  PROPN  compound  FALSE
II          II          NNP  PROPN  nsubj     FALSE
is          be          VBZ  AUX    ROOT      TRUE
an          an          DT   DET    det       TRUE
American    american    JJ   ADJ    amod      FALSE
politician  politician  NN   NOUN   attr      FALSE
and         and         CC   CCONJ  cc        TRUE
attorney    attorney    NN   NOUN   conj      FALSE
who         who         WP   PRON   nsubj     TRUE
served      serve       VBD  VERB   relcl     FALSE
as          as          IN   ADP    prep      TRUE
the         the         DT   DET    det       TRUE
44th        44th        JJ   ADJ    amod      FALSE
president   president   NN   NOUN   pobj      FALSE
of          of          IN   ADP    prep      TRUE
the         the         DT   DET    det       TRUE
United      United      NNP  PROPN  compound  FALSE
States      States      NNP  PROPN  pobj      FALSE
from        from        IN   ADP    prep      TRUE
2009        2009        CD   NUM    pobj      FALSE
to          to          IN   ADP    prep      TRUE
2017        2017        CD   NUM    pobj      FALSE
.           .           .    PUNCT  punct     FALSE
https://github.com/explosion/spaCy/blob/master/spacy/glossary.py
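One way to read the DEP column: runs of `compound` tokens usually belong to the noun that follows them, so joining them gives a multi-word entity name. The sketch below applies that heuristic to the first rows of the table, hard-coded as `(text, dep)` pairs. The `merge_compounds` helper and the heuristic itself are assumptions for illustration; with spaCy available, `token.head` or `doc.noun_chunks` would likely be the more reliable route.

```python
# (text, dep) pairs copied from the first rows of the table above.
tokens = [
    ("Barack", "compound"), ("Hussein", "compound"), ("Obama", "compound"),
    ("II", "nsubj"), ("is", "ROOT"), ("an", "det"), ("American", "amod"),
    ("politician", "attr"), ("and", "cc"), ("attorney", "conj"),
]


def merge_compounds(tokens):
    """Join each run of `compound` tokens with the token that closes the run."""
    phrases, buffer = [], []
    for text, dep in tokens:
        buffer.append(text)
        if dep != "compound":
            # The run ends here; keep the closing token's dependency label.
            phrases.append((" ".join(buffer), dep))
            buffer = []
    if buffer:
        # Trailing compounds with no closing token keep the compound label.
        phrases.append((" ".join(buffer), "compound"))
    return phrases


phrases = merge_compounds(tokens)
# The grammatical subject becomes a candidate node name for the graph.
subject = next(text for text, dep in phrases if dep == "nsubj")
```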
An introduction to the tools we will use today
• spaCy
• Wikipedia Python package
• Google Knowledge Graph
• Pywikibot
• Neo4j
◦ Awesome Procedures on Cypher (APOC)
◦ Graph Data Science (GDS) Library
◦ Cypher
(Optional) Clone the GitHub repository at
https://github.com/cj2001/nodes2021_kg_workshop
Method 1: The NLP-Only Approach
Some ways we could get this done: NLP only approach
Advantage: limitless verbs
Drawback: entity disambiguation
word2vec
https://www.kdnuggets.com/2019/01/burkov-self-supervised-learning-word-embeddings.html
https://medium.com/swlh/word2vec-in-practice-for-natural-language-processing-a179b3286a21
Overview of workflow
NLP workflow
Create a Google Knowledge Graph API key
https://developers.google.com/knowledge-graph/how-tos/authorizing
Enhance the existing data with Google Knowledge Graph
{...
  "@type": "ItemList",
  "itemListElement": [
    {
      "@type": "EntitySearchResult",
      "result": {
        "@id": "kg:/m/0dl567",
        "name": "Taylor Swift",
        "@type": [
          "Thing",
          "Person"
        ],
        ...
        "detailedDescription": {
          "articleBody": "Taylor Alison Swift is an American singer-songwriter and actress. Raised in Wyomissing, Pennsylvania, she moved to Nashville, Tennessee, at the age of 14 to pursue a career in country music. ",
          "url": "http://en.wikipedia.org/wiki/Taylor_Swift",
          ...
}
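A response shaped like the Taylor Swift excerpt could be flattened into node properties roughly as follows. This is a minimal sketch: the payload is a trimmed, hand-completed copy that uses only the fields visible in the excerpt, and the `extract_entities` helper and its output keys are hypothetical names chosen for this example.

```python
import json

# Trimmed payload modeled on the excerpt; only fields shown there are used.
raw = """
{
  "@type": "ItemList",
  "itemListElement": [
    {
      "@type": "EntitySearchResult",
      "result": {
        "@id": "kg:/m/0dl567",
        "name": "Taylor Swift",
        "@type": ["Thing", "Person"],
        "detailedDescription": {
          "articleBody": "Taylor Alison Swift is an American singer-songwriter and actress.",
          "url": "http://en.wikipedia.org/wiki/Taylor_Swift"
        }
      }
    }
  ]
}
"""


def extract_entities(payload):
    """Flatten each search result into a dict of node properties."""
    entities = []
    for element in payload.get("itemListElement", []):
        result = element.get("result", {})
        description = result.get("detailedDescription", {})
        entities.append({
            "id": result.get("@id"),
            "name": result.get("name"),
            "types": result.get("@type", []),
            "description": description.get("articleBody"),
            "url": description.get("url"),
        })
    return entities


entities = extract_entities(json.loads(raw))
```

Using `.get` with defaults keeps the sketch tolerant of results that lack a `detailedDescription`.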
Detailed knowledge graph data model
Method 2: The NLP Lite Approach
Some ways we could get this done: NLP “lite”
Advantage: entity disambiguation
Drawback: must specify which verbs you are interested in
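To make the drawback concrete: in a "lite" approach the verbs come from a fixed list rather than from the text. The sketch below assumes the source is Wikidata, where each property ID plays the role of a verb, and keeps only claims whose property is whitelisted. The claims are hand-written stand-ins for API output, and the relationship names in `VERBS` are illustrative choices, not a workshop convention.

```python
# Whitelist of Wikidata property IDs to keep, mapped to relationship types.
VERBS = {
    "P31": "INSTANCE_OF",
    "P27": "COUNTRY_OF_CITIZENSHIP",
    "P106": "OCCUPATION",
}

# Hand-written (subject, property, object) claims standing in for API output.
claims = [
    ("Barack Obama", "P31", "human"),
    ("Barack Obama", "P27", "United States of America"),
    ("Barack Obama", "P106", "politician"),
    ("Barack Obama", "P569", "1961-08-04"),  # date of birth: not whitelisted
]


def to_triples(claims, verbs):
    """Keep only claims whose property is whitelisted; rename it to a verb."""
    return [(s, verbs[p], o) for s, p, o in claims if p in verbs]


triples = to_triples(claims, VERBS)
```

Anything outside the whitelist is silently dropped, which is exactly the trade-off the slide names: clean entities, but only the verbs you asked for.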
Create a Pywikibot token
https://heardlibrary.github.io/digital-scholarship/host/wikidata/bot/
Machine Learning on Graphs
Why machine learning with (knowledge) graphs?
• Traditional ML uses a relational database-type model
◦ All data points are independent of each other
◦ Example: churn prediction based on user behavior
• Graphs (and graph databases) treat relationships as a “first-class citizen”
◦ Models can include homophily
◦ Example: churn prediction includes the churn of neighbors within the graph
◦ Models can also include the same data as the traditional, independent data point models
• Example: making a better recommendation engine
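The churn example can be made concrete with one graph-derived feature: the fraction of a user's neighbors who have already churned. This is a minimal sketch on an invented toy graph; the user names, labels, and the `churned_neighbor_fraction` helper are all assumptions for illustration.

```python
# Toy undirected graph as an adjacency dict; all names and labels are invented.
neighbors = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat", "dan"],
    "cat": ["ann", "bob"],
    "dan": ["bob"],
}
churned = {"ann": True, "bob": False, "cat": True, "dan": False}


def churned_neighbor_fraction(node, neighbors, churned):
    """Share of a node's neighbors that have already churned."""
    adjacent = neighbors.get(node, [])
    if not adjacent:
        return 0.0
    return sum(churned[n] for n in adjacent) / len(adjacent)


# One graph-derived feature per user, to sit alongside the usual tabular ones.
features = {n: churned_neighbor_fraction(n, neighbors, churned) for n in neighbors}
```

A traditional model would see only each user's own behavior; adding this column lets the same model pick up on homophily.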
word2vec
https://www.geeksforgeeks.org/python-word-embedding-using-word2vec/
Note: word2vec typically creates one vector per word. The spaCy implementation of vectorization takes a document (sentence) and averages the word vectors across the sentence.
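The averaging described in the note can be sketched in a few lines. The three-dimensional vectors here are made up, and `document_vector` is a hypothetical helper, not spaCy's API; it only mirrors the idea of averaging per-word vectors into one document vector.

```python
# Tiny made-up 3-dimensional word vectors; real models use hundreds of dimensions.
word_vectors = {
    "graphs": [1.0, 0.0, 2.0],
    "are": [0.0, 1.0, 0.0],
    "fun": [2.0, 2.0, 1.0],
}


def document_vector(words, word_vectors):
    """Average the vectors of the words that have one; skip unknown words."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return []
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


doc_vec = document_vector("graphs are fun".split(), word_vectors)
```

One consequence of averaging is that word order is lost: "graphs are fun" and "fun are graphs" get the same vector.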
node2vec
https://snap.stanford.edu/node2vec/
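The core idea of node2vec is to sample biased random walks over the graph and then feed those walks to a word2vec-style model, treating nodes as words. The sketch below covers only the walk-sampling half, on an invented toy graph. The `biased_walk` helper is an assumption for illustration: `p` controls how readily the walk returns to the previous node and `q` how readily it moves away from it. In practice a library would handle both halves.

```python
import random


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """One node2vec-style walk: p controls returning, q controls exploring."""
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        candidates = sorted(graph[current])
        if not candidates:
            break  # dead end: no neighbors to move to
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(candidates))
            continue
        previous = walk[-2]
        weights = []
        for nxt in candidates:
            if nxt == previous:
                weights.append(1.0 / p)  # step back to where we came from
            elif nxt in graph[previous]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(candidates, weights=weights, k=1)[0])
    return walk


# Toy undirected graph, invented for illustration.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
walks = [biased_walk(graph, node, length=5, rng=random.Random(42)) for node in sorted(graph)]
```

Each walk plays the role of a sentence; nodes that tend to appear in the same walks end up with similar embeddings.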
node2vec
Embedding dimension: 10
Node similarity via embeddings
Embedding dimension: 300
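Once every node has an embedding, similarity between nodes reduces to comparing vectors. The slide does not say which measure was used; cosine similarity is one common choice and is what this sketch assumes. The three-dimensional embeddings and both helper functions are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional embeddings; real embeddings have many more dimensions.
embeddings = {
    "node_a": [1.0, 0.0, 1.0],
    "node_b": [2.0, 0.0, 2.0],
    "node_c": [0.0, 1.0, 0.0],
}


def most_similar(name, embeddings):
    """Rank the other nodes by cosine similarity to `name`, best first."""
    scores = [
        (other, cosine_similarity(embeddings[name], vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("node_a", embeddings)
```

Because cosine similarity ignores vector length, `node_b` ranks as a near-perfect match for `node_a` even though its values are twice as large.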
Visualizing embeddings with t-SNE
Where to go from here?
Two Key Concepts
1. There is no proverbial “silver bullet” with Natural Language Processing (NLP)
2. The quality of what you get out of a knowledge graph depends on the quality of what you put into it
What could we do from here?
• Add nodes to the graph!
• Various embedding optimization techniques
• Add data for creating embeddings
◦ Ex: Word vectors from text descriptions
• Different embedding/modeling techniques
◦ GraphSAGE
◦ GNN
Problems we could solve
• Community/cluster detection
• Node classification, link prediction
• Graph-to-graph classification
• Unstructured text, NLP
• Question answering systems
Thank you!
https://medium.com/@cj2001
@CJLovesData1

Training Week: Create a Knowledge Graph: A Simple ML Approach
