12/15/21 Heiko Paulheim 1
New Adventures in RDF2vec
Heiko Paulheim
University of Mannheim
Brief Introduction
[Figure: career timeline with the years 2006, 2008, 2011, 2013, 2014, 2017; phases: Pre-PhD years, PhD years, PostDoc years, Assistant Prof., Full Prof.; projects: SDType, rdf2vec, ReNewRS, Kare§KoKI, MELT]
Knowledge Graphs: At a Glance
• Graph shaped knowledge representation
– nodes: entities
– edges: relations
[Figure: example knowledge graph with the nodes Heiko Paulheim, DWS Group, University of Mannheim, Mannheim, Baden-Württemberg, and Germany; edge labels: employer, affiliation, part of, residence, state, part of]
Knowledge Graphs in Organizations
• Knowledge Graphs are used…
• …in companies and organizations
– collect, organize, and integrate knowledge
– link isolated information sources
– make information searchable and findable
Masuch et al., 2016
Public Knowledge Graphs
• Knowledge Graphs are used…
• …as (free), public resources
– collect common knowledge
– general purpose, not task specific
– make it easy to build knowledge-intensive applications
Knowledge Graphs: Out of the Dark
[Figure: timeline of public knowledge graphs, labeled Google’s Announcement, DBpedia, YAGO, ResearchCyc, Wikidata, Freebase, NELL]
Usage of Public Knowledge Graphs
“OK, Google, when will the final season of Money Heist be on Netflix?”
“The fifth season of Money Heist will be released on September 3rd.”
Usage of Public Knowledge Graphs
“OK, Google, when will the final season of Money Heist be on Netflix?”
[Figure: knowledge graph excerpt with two has part edges and two release date edges pointing to the dates 2020-04-03 and 2021-09-03]
Usage of Public Knowledge Graphs
“Are there any other series by the same creator?”
[Figure: the knowledge graph excerpt with has part and release date edges (2020-04-03, 2021-09-03), extended by creator and cast edges]
Use Cases for Knowledge Graphs
• Background Knowledge
– e.g., company data (address, CEO, branch, …)
→ SAP CRM (BSc thesis 2019)
– e.g., geographic regions (demographics)
→ for example, sales data prediction
– data interpretation (e.g., Excel tables, business models)
→ PhD thesis under supervision
• Data Integration
– unified view of different data sources
– relating business entities in different systems
– cross-source data visualization and analytics
Knowledge Graphs in Data Science
• Typical cases:
– predictive modeling, information retrieval, recommendation, …
• For all of those, there are sophisticated implementations
– but...
Wanted: A Bridge between Both Worlds
• Data Science tools for prediction etc.
– Python, Weka, R, RapidMiner, …
– Algorithms that work on vectors, not graphs
• Bridges built over the past years:
– FeGeLOD (Weka, 2012), RapidMiner LOD Extension (2015), Python KG Extension (2021)
Wanted: A Bridge between Both Worlds
• Transformation strategies (aka propositionalization)
– e.g., types: type_horror_movie=true
– e.g., data values: year=2011
– e.g., aggregates: nominations=7
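The three transformation strategies above can be sketched in a few lines. This is a minimal illustration, not the implementation of any of the tools named in this talk; the triples, entity names, and the naming scheme for aggregate features are assumptions made for the example.

```python
from collections import defaultdict

# Toy triples; entities, predicates, and values are illustrative assumptions.
TRIPLES = [
    ("Movie_A", "type", "horror_movie"),
    ("Movie_A", "year", 2011),
    ("Movie_A", "nomination", "Award_1"),
    ("Movie_A", "nomination", "Award_2"),
    ("Movie_B", "type", "comedy_movie"),
    ("Movie_B", "year", 1999),
]

def propositionalize(triples, entity):
    """Turn the outgoing edges of one entity into a flat feature dict."""
    features = {}
    counts = defaultdict(int)
    for s, p, o in triples:
        if s != entity:
            continue
        if p == "type":
            features[f"type_{o}"] = True   # binary type feature
        elif isinstance(o, (int, float)):
            features[p] = o                # literal data value
        else:
            counts[p] += 1                 # aggregate: count edges per relation
    for p, n in counts.items():
        features[f"{p}s"] = n
    return features

features_a = propositionalize(TRIPLES, "Movie_A")
print(features_a)
```

The resulting dict can be fed into any vector-based learner after the usual encoding of missing values.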
Wanted: A Bridge between Both Worlds
• Observations with simple propositionalization strategies
– Even simple features (e.g., add all numbers and types) can help on many problems
– More sophisticated features often bring additional improvements
• Combinations of relations and individuals
– e.g., movies directed by Steven Spielberg
• Combinations of relations and types
– e.g., movies directed by Oscar-winning directors
• …
– But
• The search space is enormous!
• Generate first, filter later does not scale well
Wanted: A Bridge between Both Worlds
• Excursion: word embeddings
– word2vec proposed by Mikolov et al. (2013)
– predict a word from its context or vice versa
• Idea: similar words appear in similar contexts, like
– Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976
– Google was officially founded as a company in September 1998
– usually trained on large text corpora
• projection layer: embedding vectors
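To make "predict a word from its context or vice versa" concrete, the following sketch lists the (target, context) pairs a skip-gram model would be trained on. The window size of 2 and the function name are assumptions for illustration; the actual word2vec training (projection layer, negative sampling) is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every token and its neighbors within the window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "Google was officially founded as a company".split()
pairs = skipgram_pairs(sentence, window=2)
print(pairs[:4])
```

Words that occur with the same context words (here: "founded") produce similar training pairs, which is why they end up close in the embedding space.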
From Word Embeddings to Graph Embeddings
• Basic idea:
– extract random walks from an RDF graph:
Mulholland Dr. → director → David Lynch → nationality → US
– feed walks into word2vec algorithm
• Order of magnitude (e.g., DBpedia)
– ~6M entities (“words”)
– start up to 500 random walks per entity, length up to 8
→ corpus of >20B tokens
• Result:
– node embeddings
– most often outperform other propositionalization techniques
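A minimal sketch of the walk extraction step, assuming a toy adjacency-list graph built from the Mulholland Dr. example. The function name and the `max_hops` parameter are hypothetical. The last lines redo the order-of-magnitude estimate under the assumption that "length 8" means 8 tokens per walk.

```python
import random

# Toy graph as adjacency list: subject -> [(predicate, object), ...].
GRAPH = {
    "Mulholland_Dr.": [("director", "David_Lynch")],
    "David_Lynch": [("nationality", "US")],
    "US": [],
}

def random_walk(graph, start, max_hops, rng):
    """Follow up to max_hops random outgoing edges, emitting predicates and entities."""
    walk = [start]
    node = start
    for _ in range(max_hops):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: the walk stops early
        predicate, node = rng.choice(edges)
        walk += [predicate, node]
    return walk

rng = random.Random(42)
walk = random_walk(GRAPH, "Mulholland_Dr.", max_hops=4, rng=rng)
print(" ".join(walk))  # one "sentence" for word2vec

# Rough upper bound on the corpus size: entities x walks per entity x tokens per walk.
max_tokens = 6_000_000 * 500 * 8
print(max_tokens)
```

Each walk is treated as one sentence, so any word2vec implementation can be trained on the list of walks without modification.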
A First Glance at RDF2vec Embeddings
• Observation: close projection of similar entities
Random vs. non-random
• Maybe random walks are not such a good idea
– They may give too much weight to less-known entities and facts
• Strategies:
– Prefer edges with more frequent predicates
– Prefer nodes with higher indegree
– Prefer nodes with higher PageRank
– …
– They may cover less-known entities and facts too little
• Strategies:
– The opposite of all of the above strategies
• External signals (e.g., human notions of importance)
– generally work better than graph-internal signals
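One of the graph-internal strategies, preferring edges with more frequent predicates, can be sketched as a weighted choice of the next edge. The triples and function names are assumptions for illustration and do not reproduce the weighting schemes of the cited papers. Setting `prefer_frequent=False` gives the opposite strategy.

```python
import random
from collections import Counter

# Toy triples; names are hypothetical.
TRIPLES = [
    ("A", "knows", "B"),
    ("A", "knows", "C"),
    ("B", "knows", "C"),
    ("A", "rareRelation", "D"),
]

predicate_freq = Counter(p for _, p, _ in TRIPLES)

def edge_weights(node, prefer_frequent=True):
    """Weight each outgoing edge of node by its predicate frequency (or its inverse)."""
    edges = [(p, o) for s, p, o in TRIPLES if s == node]
    weights = [predicate_freq[p] if prefer_frequent else 1 / predicate_freq[p]
               for p, _ in edges]
    return edges, weights

def biased_step(node, rng, prefer_frequent=True):
    """Pick the next (predicate, object) pair with probability proportional to its weight."""
    edges, weights = edge_weights(node, prefer_frequent)
    return rng.choices(edges, weights=weights, k=1)[0]

edges, weights = edge_weights("A")
inverse_weights = edge_weights("A", prefer_frequent=False)[1]
step = biased_step("A", random.Random(0))
print(edges, weights, step)
```

Indegree or PageRank can be plugged in the same way by weighting on the target node instead of the predicate.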
Cochez et al. (2017): Biased Graph Walks for RDF Graph Embeddings
Al Taweel and Paulheim (2020): Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings
Local Embeddings
• Recap: order of magnitude (e.g., DBpedia)
– ~6M entities (“words”)
– start up to 500 random walks per entity, length up to 8
→ corpus of >20B tokens
– “Train once, reuse often”
• In some cases, only a small subset (of 6M) is of interest
– RDF2vec light: “train when needed”
– Runtime: minutes instead of days
Portisch et al. (2020): RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings
RDF2vec: Example Applications
• Data Model Matching with WebIsA and RDF2vec
Portisch et al. (2019): Evaluating ontology matchers on real-world financial services data models.
RDF2vec: Example Applications
• Entity disambiguation: linking texts to a knowledge graph
Türker et al. (2019): Knowledge-Based Short Text Categorization Using Entity and Category Embedding
RDF2vec: Example Applications
• Finding related research papers on COVID-19
Steenwinckel et al. (2020): Facilitating COVID-19 Meta-analysis Through a Literature Knowledge Graph
RDF2vec: Example Applications
• Table search by keyword
Zhang and Balog (2018): Ad Hoc Table Retrieval using Semantic Similarity.
RDF2vec: Example Applications
• Predicting biological interactions
Sousa et al. (2021): Supervised Semantic Similarity.
RDF2vec: Example Applications
• Zero-Shot Image Classification
Hascoet et al. (2017): Semantic Web and Zero-Shot Learning of Large Scale Visual Classes.
Embeddings for Link Prediction
• RDF2vec example
– similar instances form clusters, direction of relation is ~stable
– link prediction by analogy reasoning (Japan – Tokyo ≈ China – Beijing)
Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
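Link prediction by analogy can be illustrated with vector arithmetic: if the offset from a country to its capital is roughly constant, adding that offset to another country should land near its capital. The 2-D vectors below are hand-made assumptions; real RDF2vec vectors are learned and have many more dimensions.

```python
import math

# Hand-made 2-D vectors, purely illustrative.
VECTORS = {
    "Japan": (1.0, 1.0), "Tokyo": (1.0, 3.0),
    "China": (4.0, 1.5), "Beijing": (4.0, 3.5),
    "Germany": (7.0, 1.0), "Berlin": (7.1, 3.0),
}

def predict_by_analogy(a, b, c):
    """Return the entity closest to c + (b - a), excluding the three inputs."""
    target = tuple(vc + vb - va
                   for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [e for e in VECTORS if e not in (a, b, c)]
    return min(candidates, key=lambda e: math.dist(VECTORS[e], target))

prediction = predict_by_analogy("Japan", "Tokyo", "China")
print(prediction)
```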
Embeddings for Link Prediction
• In RDF2vec, relation preservation is a by-product
• TransE (and its descendants): direct modeling
– Formulates RDF embedding as an optimization problem
– Find a mapping of entities and relations to $\mathbb{R}^n$ so that
• across all triples <s,p,o>, $\sum_{\langle s,p,o \rangle} \lVert s + p - o \rVert$ is minimized
• try to obtain a smaller error for existing triples than for non-existing ones
Bordes et al: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013.
Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
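The TransE objective can be sketched as a translation error per triple plus a margin-based comparison between an existing and a non-existing triple. The embeddings below are set by hand as an assumption for illustration; in TransE they are learned by gradient descent, and the margin value of 1.0 is likewise an assumed hyperparameter.

```python
import math

# Hand-set 2-D embeddings for illustration only.
ENTITY = {"Japan": (0.0, 0.0), "Tokyo": (1.0, 1.0), "Beijing": (3.0, 0.5)}
RELATION = {"capital": (1.0, 1.0)}

def transe_error(s, p, o):
    """Translation error ||s + p - o|| (Euclidean norm) of a triple."""
    return math.hypot(*(es + rp - eo
                        for es, rp, eo in zip(ENTITY[s], RELATION[p], ENTITY[o])))

def margin_loss(positive, negative, margin=1.0):
    """Hinge loss: zero once the existing triple beats the corrupted one by the margin."""
    return max(0.0, margin + transe_error(*positive) - transe_error(*negative))

err_true = transe_error("Japan", "capital", "Tokyo")
err_false = transe_error("Japan", "capital", "Beijing")
loss = margin_loss(("Japan", "capital", "Tokyo"), ("Japan", "capital", "Beijing"))
print(err_true, err_false, loss)
```

Link prediction then amounts to ranking candidate objects by their translation error for a given subject and relation.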
Link Prediction vs. Node Embedding
• Hypothesis:
– Embeddings for link prediction also cluster similar entities
– Node embeddings can also be used for link prediction
Portisch et al. (to appear): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
Similarity vs. Relatedness
• Closest 10 entities to Angela Merkel in different vector spaces
Portisch et al. (to appear): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
Similarity vs. Relatedness
• (s-)RDF2vec allows an explicit trade-off with different walk strategies
[Figure: knowledge graph with the nodes Adler Mannheim, SAP Arena, Reiss-Engelhorn-Museum, Mannheim, Baden-Württemberg, and Germany; edge labels: city, stadium, location, federal state, country]
Walk generation:
“Classic” RDF2vec walks
Adler_Mannheim → city → Mannheim → country → Germany
Adler_Mannheim → stadium → SAP_Arena → location → Mannheim
SAP_Arena → location → Mannheim → country → Germany
...
s-RDF2vec walks
city → Mannheim → country
stadium → SAP_Arena → location
location → Mannheim → country
...
[Figure: processing pipeline with the elements RDF2vec “union walks”, RDF2vec “classic”, RDF2vec “edge”, concatenated vector, global PCA, test cases, concatenated vector (task-specific subset), weights w1 and w2, (weighted) local PCA]
Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
Similarity vs. Relatedness
• s-RDF2vec
– using different walk strategies
– combining different vector spaces (weighted combinations are possible)
• 10 closest neighbors to Mannheim:
Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
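The two ingredients named here can be sketched as follows. The edge-walk rule (drop the first and last entity of a classic walk) is read off the example walks on the previous slide and may not match the paper's definition exactly; the function names and the way the weights w1 and w2 scale the two vector spaces are assumptions. The PCA step is omitted.

```python
def edge_walk(walk):
    """Drop the first and last entity of a classic walk, as in the slide's example walks."""
    return walk[1:-1]

def weighted_concat(vec_a, vec_b, w1=1.0, w2=1.0):
    """Concatenate two vectors of the same entity after scaling each vector space."""
    return [w1 * x for x in vec_a] + [w2 * x for x in vec_b]

classic = ["Adler_Mannheim", "city", "Mannheim", "country", "Germany"]
print(" → ".join(edge_walk(classic)))

# Toy 2-D vectors of one entity in two vector spaces (values are made up).
combined = weighted_concat([0.2, 0.4], [1.0, -1.0], w1=1.0, w2=0.5)
print(combined)
```

Shifting weight between the two vector spaces is one way to move the neighborhood of an entity between relatedness and similarity.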
Similarity vs. Relatedness
• Recap word embeddings:
– Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976
– Google was officially founded as a company in September 1998
• Graph walks:
– Hamburg → country → Germany → leader → Angela_Merkel
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
– Hamburg → leader → Peter_Tschentscher → residence → Hamburg
[Figure: knowledge graph with the nodes Hamburg, Germany, Angela_Merkel, and Peter_Tschentscher; edge labels: country, leader, birthPlace, residence]
Similarity vs. Relatedness
• Surrounding entities indicate relatedness
– Hamburg → country → Germany → leader → Angela_Merkel
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
• Same entities in similar positions indicate similarity
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
– Hamburg → leader → Peter_Tschentscher → residence → Hamburg
• Someone is a leader vs. something has a leader
• Solution approach: use embedding approach that respects positions
– CWINDOW / Structured Skip-ngram
Portisch and Paulheim (2021): Putting RDF2vec in Order.
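Why positions matter can be shown by collecting context tokens with and without their relative offset. This is a simplified stand-in for what position-aware word2vec variants exploit, not an implementation of CWINDOW or structured skip-gram; the function name and window size are assumptions.

```python
def contexts(walk, target, window=2, positional=False):
    """Collect context tokens of target; optionally tag each with its relative offset."""
    found = set()
    for i, token in enumerate(walk):
        if token != target:
            continue
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                found.add((walk[j], j - i) if positional else walk[j])
    return found

walk_1 = ["Germany", "leader", "Angela_Merkel", "birthPlace", "Hamburg"]
walk_2 = ["Hamburg", "leader", "Peter_Tschentscher", "residence", "Hamburg"]

# Without positions, Germany and Angela_Merkel share the context "leader".
plain_shared = contexts(walk_1, "Germany") & contexts(walk_1, "Angela_Merkel")

# With positions, "has a leader" (+1) and "is a leader" (-1) no longer match.
positional_shared = (contexts(walk_1, "Germany", positional=True)
                     & contexts(walk_1, "Angela_Merkel", positional=True))

# The two leaders still share positioned contexts.
leaders_shared = (contexts(walk_1, "Angela_Merkel", positional=True)
                  & contexts(walk_2, "Peter_Tschentscher", positional=True))
print(plain_shared, positional_shared, leaders_shared)
```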
Similarity vs. Relatedness
• Why bother?
– Use case: table interpretation (a special case of entity disambiguation)
[Figure: table interpretation example contrasting related and similar entities]
Back to Interpretability
• Hot topic: Explainable AI
– Knowledge Graphs are a favorable ingredient
– Human/machine interpretable knowledge → explainable systems
• However:
– Embeddings replace interpretable axioms
with numeric vectors over non-interpretable dimensions
– Where did the semantics go?
Paulheim (2018): Make Embeddings Semantic Again!
The 2009 Semantic Web Layer Cake
The 2018 Semantic Web Layer Cake
Embeddings
Towards Semantic Vector Space Embeddings
[Figure: vector space plot with the labels cartoon and superhero]
Paulheim (2018): Make Embeddings Semantic Again!
Towards Semantic Vector Space Embeddings
• Approach 1: learn interpretation function
• Each dimension of the embedding model is a target for a separate learning problem
• Learn a function to explain the dimension
• E.g.: $y \approx -|\exists \mathit{character}.\mathit{Superhero}|$
• Just an approximation used for explanations and justifications
Towards Semantic Vector Space Embeddings
• Approach 2: learn inherently interpretable embeddings
• Step 1: learn typical patterns that exist in a knowledge graph
– e.g., graph pattern learning
– e.g., Horn clauses
• Step 2a: use those patterns as embedding dimensions
– probably not low dimensional
• Step 2b: compact the space
– e.g., use dimensions for mutually exclusive patterns
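Step 2a can be sketched as one embedding dimension per pattern. The patterns here are written by hand as an assumption; in the approach outlined above they would be learned from the graph (Step 1). All entity and pattern names are hypothetical, and the compaction of Step 2b is not shown.

```python
# Toy triples; all names are hypothetical.
TRIPLES = {
    ("Batman", "type", "Superhero"),
    ("Batman", "appearsIn", "Comic_1"),
    ("Mickey_Mouse", "type", "CartoonCharacter"),
}

# Hand-written patterns standing in for learned ones; each maps an entity to True/False.
PATTERNS = {
    "is_superhero": lambda e: (e, "type", "Superhero") in TRIPLES,
    "is_cartoon_character": lambda e: (e, "type", "CartoonCharacter") in TRIPLES,
    "appears_in_something": lambda e: any(s == e and p == "appearsIn"
                                          for s, p, _ in TRIPLES),
}

def pattern_embedding(entity):
    """One dimension per pattern: 1.0 if the entity matches it, else 0.0."""
    return [1.0 if matches(entity) else 0.0 for matches in PATTERNS.values()]

batman = pattern_embedding("Batman")
mickey = pattern_embedding("Mickey_Mouse")
print(list(PATTERNS), batman, mickey)
```

Every dimension keeps a readable name, which is the property that learned dense embeddings lack.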
Towards Semantic Vector Space Embeddings
• Different angle: learn interpretation for similarity function
– ~similar type
– ~same country
– ~connected to same entity
Summary
• Knowledge Graphs are a versatile ingredient for AI
– Integrated view on data
– Large-scale free source of background knowledge
• Knowledge Graph Embeddings
– Effective processing of large-scale knowledge sources
– Encoding of similarity and/or relatedness
• RDF2vec: explicit trade-off is possible!
– Additional insights that are not explicit in the graph
• aka latent semantics
More on RDF2vec
• Collection of
– Implementations
– Pre-trained models
– >40 use cases in various domains
Thank you!
http://www.heikopaulheim.com
@heikopaulheim