JKU 2021
Knowledge Matters!
The Role of Knowledge Graphs
in Modern AI Systems
16.11.21
Heiko Paulheim 1
AI Ingredients
OK, Google, when will the final season of Money Heist be on Netflix?
The fifth season of Money Heist will be released on September 3rd and December 3rd.
AI Ingredients
Are there any other series
by the same creator?
Álex Pina has also created
White Lines, The Pier, and Locked Up.
AI Ingredients
● What does an AI system like Google Assistant need?
– Speech recognition, interpretation, and synthesis
– A knowledge base
– Logical reasoning
– …
● ...there are many other ingredients to AI
– e.g., machine learning, computer vision, ...
AI Ingredients
● Four components of AI required to pass a Turing test [1]:
– Natural language processing
– Knowledge representation
– Automated reasoning
– Machine learning
[1] Russell, Norvig: Artificial Intelligence: A Modern Approach
It’s an Unequal Field
[1] Google Trends, 2021
Human Intelligence Ingredients
● System 1 (think: 2+2)
– Fast
– Intuitive
– Unconscious
– Prone to biases
● System 2 (think: 342+735)
– Slow
– Explicit
– Conscious
– Tedious (hence: lazy)
[1] Kahneman: Thinking, Fast and Slow
Fast and Slow AI
● Kahneman for AI [1]
– System 1: ML, statistics, heuristics
– System 2: Explicit reasoning, knowledge representation, explanations
● Neuro-symbolic or hybrid AI uses both components
[1] Booch et al. (AAAI 2021): Thinking Fast and Slow in AI
Knowledge Graphs for AI
[Figure: knowledge graph excerpt with "has part" and "release date" edges (2020-04-03, 2021-09-03), shown next to the query "OK, Google, when will the final season of Money Heist be on Netflix?"]
Knowledge Graphs for AI
[Figure: knowledge graph excerpt with "has part", "release date" (2020-04-03, 2021-09-03), "creator", and "cast" edges, shown next to the query "Are there any other series by the same creator?"]
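Both queries on these slides can be read as lookups over subject-predicate-object triples. The following is a minimal Python sketch under that reading, not a description of how Google Assistant works: the entity identifiers, the helper functions, and the assignment of the two release dates to specific parts are assumptions made for illustration.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# Identifiers are illustrative assumptions, not real KG URIs.
TRIPLES = [
    ("Money_Heist", "has part", "Money_Heist_Part_4"),
    ("Money_Heist", "has part", "Money_Heist_Part_5"),
    ("Money_Heist_Part_4", "release date", "2020-04-03"),
    ("Money_Heist_Part_5", "release date", "2021-09-03"),
    ("Money_Heist", "creator", "Álex_Pina"),
    ("White_Lines", "creator", "Álex_Pina"),
    ("The_Pier", "creator", "Álex_Pina"),
    ("Locked_Up", "creator", "Álex_Pina"),
]


def objects(subject, predicate):
    """Return all objects o with (subject, predicate, o) in the graph."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return all subjects s with (s, predicate, obj) in the graph."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def latest_release(series):
    """Latest release date among the parts of a series.

    ISO 8601 dates sort correctly as plain strings.
    """
    dates = [d for part in objects(series, "has part")
             for d in objects(part, "release date")]
    return max(dates)


def other_series_by_same_creator(series):
    """Series that share a creator with the given one, excluding itself."""
    result = []
    for creator in objects(series, "creator"):
        for other in subjects("creator", creator):
            if other != series and other not in result:
                result.append(other)
    return result
```

The first question is a two-hop traversal (series, part, release date); the second follows the creator edge forward and then backward.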
AIs on the Shoulders of Giants
● Current knowledge graphs [1]
– Open data
– Millions of entities
– Billions of facts
● Facilitate AI access to
– Large-scale factual knowledge (note: not common-sense knowledge)
– e.g., for explanations
[1] Heist et al. (2021): Knowledge Graphs on the Web – An Overview
Knowledge What?
• Knowledge Graphs on the Web
• Everybody talks about them, but what is a Knowledge Graph?
Journal paper review (Natasha Noy, Google, June 2015):
“Please define what a knowledge graph is – and what it is not.”
Knowledge Graphs for AI
● Approaches since the 80s
– Cyc (and OpenCyc)
– DBpedia & YAGO
– Wikidata
– Linked Open Data Cloud
Knowledge What?
• Working definition [1]: a Knowledge Graph
– mainly describes instances and their relations in the world
• Unlike an ontology
• Unlike, e.g., WordNet
– Defines possible classes and relations in a schema or ontology
• i.e., we know the types of things that are in our graphs
– Has a flexible schema
• Unlike a relational database
– Covers various domains
• Unlike, e.g., Geonames
[1] Paulheim (2017): Knowledge Graph Refinement – A Survey of Approaches and Evaluation Methods
Knowledge What?
● Google uses the knowledge graph...
– for augmenting and improving search results
– for integrating data from various sources
● Some numbers [1]
– >5 billion entities
– >500 billion facts (i.e., edges)
[1] https://blog.google/products/search/about-knowledge-graph-and-knowledge-panels/
A Bit of History
• Cyc (started by Douglas Lenat in 1984)
– Encyclopedic collection of knowledge
– Initial estimate: 350 person years and 250,000 rules should suffice to collect the essence of the world’s knowledge
• The present (as of June 2017)
– ~1,000 person years, $120M total development cost
– 21M axioms and rules
A Bit of Business
● Does that scale?
– A few back-of-the-envelope calculations [1]
● Cyc contains...
– 21M statements and rules (roughly: "edges")
– $120M development costs
→ $5.71 per statement
● Google’s Knowledge Graph
– 500 billion statements
– → ~$2.86 trillion at $5.71 per statement
● (that’s ~15 times Google’s net revenue in 2020)
[1] Paulheim (2018): How much is a Triple? Estimating the Cost of Knowledge Graph Creation.
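The extrapolation above is plain arithmetic and can be reproduced in a few lines. The sketch below uses only the figures quoted on the slide; the function and variable names are illustrative, and it assumes (as the slide does) that a Cyc statement and a Google Knowledge Graph fact cost the same to create.

```python
# Back-of-the-envelope extrapolation using the figures quoted on the slide.
CYC_COST_USD = 120e6          # total development cost of Cyc
CYC_STATEMENTS = 21e6         # axioms and rules, treated as "edges"
GOOGLE_KG_STATEMENTS = 500e9  # facts reported for Google's Knowledge Graph


def cost_per_statement(total_cost, n_statements):
    """Average cost of one statement in US dollars."""
    return total_cost / n_statements


def extrapolated_cost(n_statements, unit_cost):
    """Cost of building n_statements at a fixed per-statement cost."""
    return n_statements * unit_cost


unit = cost_per_statement(CYC_COST_USD, CYC_STATEMENTS)
total = extrapolated_cost(GOOGLE_KG_STATEMENTS, unit)
print(f"${unit:.2f} per statement, ${total / 1e12:.2f} trillion in total")
```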
Crowdsourcing Knowledge Graphs
● Freebase (launched 2007)
– Collaborative editing (like Wikipedia)
– Acquired by Google in 2010
– Shut down in 2016
● Wikidata (launched 2012)
– Free, collaborative
– Collects data from different sources
– Today: one of the largest publicly available,
free knowledge graphs
The Business Side of Crowdsourcing Knowledge Graphs
● Freebase: created by laymen
– Assumption: adding a statement to Freebase
equals adding a sentence to Wikipedia
• English Wikipedia up to April 2011: 41M working hours [1]
• size in April 2011: 3.6M pages, avg. 36.4 sentences each
• Using US minimum wage: $2.25 per sentence
→ $2.25 per statement
● Total cost of creating Freebase: $6.75B
– Acquired by Google for $60-$300M
[1] Geiger, Halfaker (2013): Using edit sessions to measure participation in Wikipedia
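The per-sentence figure can be retraced from the inputs above. This sketch assumes the US federal minimum wage of $7.25 per hour, which the slide does not state explicitly; with these rounded inputs the result lands within a few cents of $2.25. It also derives the number of Freebase statements implied by the quoted total and unit cost.

```python
# Figures from the slide, plus one labeled assumption.
WORKING_HOURS = 41e6           # English Wikipedia up to April 2011
PAGES = 3.6e6                  # size in April 2011
SENTENCES_PER_PAGE = 36.4      # average sentences per page
US_MIN_WAGE_USD = 7.25         # assumption: US federal minimum wage per hour

FREEBASE_TOTAL_USD = 6.75e9    # total cost quoted on the slide
COST_PER_STATEMENT_USD = 2.25  # per-statement cost quoted on the slide

# Labor cost of all of Wikipedia divided by its number of sentences.
sentences = PAGES * SENTENCES_PER_PAGE
cost_per_sentence = WORKING_HOURS * US_MIN_WAGE_USD / sentences

# Number of statements implied by the slide's total and unit cost.
implied_statements = FREEBASE_TOTAL_USD / COST_PER_STATEMENT_USD

print(f"${cost_per_sentence:.2f} per sentence")
print(f"{implied_statements / 1e9:.1f} billion statements implied")
```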
Towards Automatic Knowledge Graph Construction
● Modern AI needs Massive Amounts of Knowledge
● Manual/crowdsourced creation
– Costly
– Does not work at scale
OK, Google, when will the final
season of Money Heist be on Netflix?
Creating Knowledge Graphs from Wikipedia
● Why start from scratch?
– If we already have (semi-)structured knowledge
at our fingertips
● Structured knowledge in Wikipedia
– Infoboxes (cf. Google’s Knowledge Panels)
– Categories
Turning Wikipedia into a Knowledge Graph
● First Observation:
– Many Wikipedia pages are about an entity
– For example: people, places, organizations, works…
Turning Wikipedia into a Knowledge Graph
● Further Observations:
– Articles are interlinked
– Some links have explicit meaning
– There are also numbers and dates
Turning Wikipedia into a Knowledge Graph
● Putting the Pieces Together
[Figure: graph with nodes Nine_Inch_Nails, The_Downward_Spiral, Trent_Reznor, and the date 1994-03-08, connected by edges labeled "artist", "released", "member", and "producer"]
Knowledge Graphs based on Wikipedia
● DBpedia: launched 2007
– Mapping infoboxes to node classes (e.g., "Person", "Album")
– Mapping infobox keys to edge labels (e.g., "artist", "member")
– Crowd-sourced mappings
● YAGO: launched 2008
– Using article categories in Wikipedia as classes
– Mapping infobox keys to edge labels
– Expert-created mappings
– Also contains temporal facts
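As a rough illustration of the mapping idea, the sketch below turns one infobox into triples via two lookup tables. The infobox contents, the mapping tables, and the function name are hypothetical; the actual DBpedia and YAGO extraction frameworks are considerably more involved.

```python
# Hypothetical infobox and mappings; keys and labels are illustrative only.
INFOBOX = {
    "template": "Infobox album",
    "page": "The_Downward_Spiral",
    "fields": {
        "artist": "Nine_Inch_Nails",
        "released": "1994-03-08",
        "producer": "Trent_Reznor",
        "cover": "tds.jpg",  # no mapping defined, so this field is skipped
    },
}

# Infobox template -> node class
CLASS_MAPPING = {"Infobox album": "Album", "Infobox person": "Person"}
# Infobox key -> edge label
PROPERTY_MAPPING = {"artist": "artist", "released": "released",
                    "producer": "producer"}


def infobox_to_triples(infobox):
    """Turn one infobox into (subject, predicate, object) triples.

    Fields without a mapping are skipped rather than guessed.
    """
    subject = infobox["page"]
    triples = []
    node_class = CLASS_MAPPING.get(infobox["template"])
    if node_class is not None:
        triples.append((subject, "type", node_class))
    for key, value in infobox["fields"].items():
        label = PROPERTY_MAPPING.get(key)
        if label is not None:
            triples.append((subject, label, value))
    return triples
```

Keeping the mappings in plain tables separate from the extraction code is what makes it possible to crowd-source or expert-curate them independently.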
Again: A Bit of Business
● DBpedia: 4.9M LOC, 2.2M LOC for mappings
– software project development: ~37 LOC per hour
(Devanbu et al., 1996)
– we use German PhD salaries as a cost estimate
→ 1.85c per statement
● We save by a factor of >100!
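To see where the factor comes from, the per-statement estimates from this talk can be put side by side. A small sketch, assuming the 1.85c figure denotes US cents and is directly comparable to the dollar figures quoted for Cyc and Freebase:

```python
# Per-statement cost estimates quoted in the talk, in US dollars.
COST_PER_STATEMENT = {
    "Cyc": 5.71,        # expert-curated
    "Freebase": 2.25,   # crowdsourced
    "DBpedia": 0.0185,  # extracted from Wikipedia (1.85 cents, assumed US cents)
}


def savings_factor(manual, automatic):
    """How many times cheaper the automatic approach is per statement."""
    return COST_PER_STATEMENT[manual] / COST_PER_STATEMENT[automatic]


for name in ("Cyc", "Freebase"):
    print(f"{name} vs. DBpedia: factor {savings_factor(name, 'DBpedia'):.0f}")
```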
How Big is Big Enough?
● DBpedia and YAGO
– Constrained by the size (i.e., number of entries)
of Wikipedia
– Currently ~6M
● Commonly used recommender system
benchmarks have a coverage of… [1]
– ...85% for movies
– ...63% for music artists
– ...31% for books
https://grouplens.org/datasets/
[1] Di Noia, et al.: SPRank: Semantic Path-based Ranking for Top-n
Recommendations using Linked Open Data. In: ACM TIST, 2016
Let’s Look Closer...
● Red links and unknown instances
Exploiting More Structure in Wikipedia
● Listings and categories are also structures
● Their entries commonly share…
– a type (e.g., musician, book, …) and/or
– a common relation (e.g., member of the same band, book by the same author, actor playing in the same film)
– … which points, e.g., to the entity that represents the page, or to an entity mentioned somewhere
Exploiting More Structure in Wikipedia
● CaLiGraph [1]
– Extracts entities from listings
– Derives definitions from categories and list titles
● e.g., "Death Metal Bands" → genre = Death_Metal
● 15M entities
– incl. 8M from listings
[1] Heist, Paulheim: Information Extraction from Co-Occurring Similar Entities.
In: The Web Conference, 2021
Beyond Wikipedia
● Regarding DBpedia and YAGO as a black box
– Input: a copy of Wikipedia
– Output: a knowledge graph
● If we have that black box
– Can’t we input any Wiki?
Magic ;-)
Beyond Wikipedia
● There are thousands of wikis
– Plus farms that host thousands themselves
● One of the largest farms: Fandom
Beyond Wikipedia
● Integration of Information from Multiple Wikis
● Challenges:
– Duplicate detection
– Few conventions
– Contradictions
[1] Hertling, Paulheim (2020): DBkWik: Extracting and Integrating Knowledge from
Thousands of Wikis. Knowledge and Information Systems 62(6): 2169-2190
The Story so Far
● We’ve come from AI building blocks:
– Natural language processing
– Knowledge representation
– Automated reasoning
– Machine learning
● How do we put the blocks together?
Using Knowledge Graphs as an Ingredient in AI
● Automated Reasoning
– The combination of reasoning and knowledge graphs has a long tradition
– Think of rules on the knowledge graph
– Example: artists on metal albums are metal artists
<Y artist X>, <Y genre Z> → <X genre Z>
[Figure: graph with nodes Nine_Inch_Nails, The_Downward_Spiral, and Metal, connected by one "artist" edge and two "genre" edges]
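The rule can be applied mechanically by forward chaining over the triples. A minimal sketch, assuming the album is the subject of both the artist and the genre edge, as in the rule's notation; the function name is illustrative:

```python
# Facts from the slide's example graph.
FACTS = {
    ("The_Downward_Spiral", "artist", "Nine_Inch_Nails"),
    ("The_Downward_Spiral", "genre", "Metal"),
}


def apply_genre_rule(facts):
    """Apply <Y artist X>, <Y genre Z> -> <X genre Z> until a fixpoint.

    Returns a new set containing the original and all derived facts.
    """
    facts = set(facts)
    while True:
        derived = {
            (x, "genre", z)
            for (y1, p1, x) in facts if p1 == "artist"
            for (y2, p2, z) in facts if p2 == "genre" and y2 == y1
        }
        if derived <= facts:  # nothing new: fixpoint reached
            return facts
        facts |= derived
```

On the example graph this derives the second genre edge, from Nine_Inch_Nails to Metal.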
Using Knowledge Graphs as an Ingredient in AI
● Knowledge Graphs are graphs
– hence the name ;-)
● Most learning tools are tabular
Using Knowledge Graphs as an Ingredient in AI
● How to create tabular representations of entities in knowledge graphs?
– Easy: data values (e.g., release date)
– Easy: edges with single occurrences (e.g., birth place)
– Complex: edges with multiple occurrences (e.g., starring)
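One common workaround for multi-valued edges is to expand them into one binary column per observed value, which hints at why they are the complex case: the number of columns grows with the number of distinct values. The sketch below illustrates this on a toy graph; the entities, dates, and function name are made up for the example, and this is only one of several possible encodings.

```python
# Toy graph; entities and values are illustrative assumptions.
TRIPLES = [
    ("Film_A", "release date", "2001-05-04"),
    ("Film_A", "starring", "Actor_1"),
    ("Film_A", "starring", "Actor_2"),
    ("Film_B", "release date", "2003-11-21"),
    ("Film_B", "starring", "Actor_2"),
]


def to_table(triples, single_valued, multi_valued):
    """Build one row (dict) per entity.

    Single-valued predicates become one column each; multi-valued
    predicates are expanded into one binary column per observed value.
    """
    entities = sorted({s for s, _, _ in triples})
    values = {p: sorted({o for _, q, o in triples if q == p})
              for p in multi_valued}
    rows = []
    for e in entities:
        row = {"entity": e}
        for p in single_valued:
            matches = [o for s, q, o in triples if s == e and q == p]
            row[p] = matches[0] if matches else None
        for p in multi_valued:
            present = {o for s, q, o in triples if s == e and q == p}
            for v in values[p]:
                row[f"{p}={v}"] = int(v in present)
        rows.append(row)
    return rows


rows = to_table(TRIPLES, ["release date"], ["starring"])
```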
Hybrid AI with Knowledge Graphs
● Graphs to vectors!
– Representation learning aka embeddings
● Approaches (among others)
– Language modeling adaptations (RDF2vec, KGloVe, …)
– Tensor factorization (RESCAL, DistMult, ...)
– Link prediction (TransE and its descendants)
– Graph Neural Networks (e.g., GCN)
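To make one of these approaches concrete: TransE models a relation as a translation in vector space, so that head + relation should land near tail for a true triple. The sketch below only shows the scoring step with hand-picked 2-d vectors; in practice the vectors are learned, and the entity names are reused from the Wikipedia example earlier in the talk purely for illustration.

```python
import math

# Hand-picked 2-d vectors for illustration; real embeddings are learned.
ENTITY = {
    "The_Downward_Spiral": [1.0, 0.0],
    "Nine_Inch_Nails": [1.0, 1.0],
    "Trent_Reznor": [3.0, 2.0],
}
RELATION = {"artist": [0.0, 1.0]}


def transe_distance(head, relation, tail):
    """Euclidean distance ||h + r - t||; smaller means more plausible."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Candidate tails ordered from most to least plausible."""
    candidates = [e for e in ENTITY if e != head]
    return sorted(candidates, key=lambda t: transe_distance(head, relation, t))
```

Ranking candidate tails by this distance is how such a model is used for link prediction.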
Knowledge Graph Embeddings
● A recent hype
– Each node (and edge) in the graph is represented as a point in a vector space
– Similar nodes are close in that space
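"Close" is usually measured with a distance or similarity function such as cosine similarity. A small sketch with made-up 3-d vectors (real embeddings are learned and have far more dimensions); the vectors and function names are assumptions for illustration:

```python
import math

# Made-up vectors: the two series are placed near each other on purpose.
VECTORS = {
    "Money_Heist": [0.9, 0.1, 0.0],
    "White_Lines": [0.8, 0.2, 0.1],
    "Nine_Inch_Nails": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(entity):
    """The other entity whose vector is most similar to the given one."""
    others = [e for e in VECTORS if e != entity]
    return max(others, key=lambda e: cosine(VECTORS[entity], VECTORS[e]))
```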
Knowledge Graph Embeddings
● What do we win?
– Each entity is a numeric vector
– Learning tools can be used easily
● What do we lose?
– Dimensions do not carry meaning anymore
Quo Vadis?
● Knowledge Graphs are also consumable by humans
– (think: explainable AI)
– but vectors are not!
● We are missing an important building block
– in Kahneman’s terms: we forged system 2 into a new system 1 instead
– Holy grail: interpretable embeddings
Summary
● AI Ingredients
– AIs need knowledge
– e.g., conversational agents: need to know about entities in the world
● Knowledge Graphs
– One representation paradigm for such knowledge
– There are plenty of freely available KGs
– Can be used for explainable AI
