Empirical Semantics
modelling knowledge as it is,
not as it should be
Frank van Harmelen
Vrije Universiteit Amsterdam
Creative Commons License
CC BY 3.0:
Allowed to copy, redistribute
remix & transform
But must attribute
Many thanks to all at
KR&R@VU: Wouter Beek, Joe
Raad, Peter Bloem, Stefan
Schlobach, Zhisheng Huang,
and many others over the years
The ‘K’ in ‘Semantic Web’
stands for ‘Knowledge’
Prescriptive Semantics
versus
Descriptive Semantics
Formal Semantics
versus
Empirical Semantics
OWL Semantics fits on one A4
• The world consists of
– Objects (“individuals”)
– Sets of objects (“types”)
– Pairs of objects (“relations”)
• The world can be described by operations on
these sets: 𝑇1 ∪ 𝑇2, 𝑇1 ∩ 𝑇2, 𝑇1 T2
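As a rough illustration of this set-based view (a sketch only: the individuals, the type names and the `treats` relation are made-up examples, not taken from the OWL specification), types can be modelled as Python sets and relations as sets of pairs:

```python
# Toy model of the set-based reading; every name here is an illustrative assumption.
individuals = {"aspirin", "ibuprofen", "headache"}

# Types are sets of individuals.
drug = {"aspirin", "ibuprofen"}
analgesic = {"aspirin"}
symptom = {"headache"}

# Relations are sets of pairs of individuals.
treats = {("aspirin", "headache")}

# New types are described by set operations on existing ones.
drug_or_symptom = drug | symptom   # union of two types
analgesic_drug = drug & analgesic  # intersection of two types

print(sorted(drug_or_symptom))
print(sorted(analgesic_drug))
```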
Empirical Semantics
requires:
observing knowledge
at scale
So we need an
observational
instrument
Empirical Semantics
LOD Laundromat
Beek & Rietveld et al. 2014,
LOD laundromat: a uniform way of
publishing other people's dirty data
http://lodlaundromat.org/pdf/lodlaundry.pdf
HDT
Fernández & Martínez-Prieto &
Gutiérrez, 2013, Binary RDF
representation for publication and
exchange (HDT)
LDF
Verborgh & Vander Sande et al.
2014, Web-Scale Querying through
Linked Data Fragments
LOD-a-lot
1 file
28,362,198,927 unique triples
>650K data documents
LDF queries in real time
Surprisingly efficient
524 GB of disk space
16 GB of RAM
Only 144 secs loading time
Only €305,- hardware cost
Meta-Data for a lot of LOD
http://www.semantic-web-journal.net/content/meta-data-lot-lod-2
http://lod-a-lot.lod.labs.vu.nl/
Insights from
Empirical Semantics:
1. Identity correctness
Joe Raad, Wouter Beek
owl:sameAs is not optional
But in practice
it’s broken under
the formal semantics
Meet our observatory:
http://SameAs.cc
• 559 million owl:sameAs statements
(we created an HDT file in 4 hours on 1 CPU core: 4.5 GB + 2.2 GB index)
• 50 million equivalence classes after inference
(5 hours on 2 CPU cores; only (!) 9.3 GB of disk with RocksDB)
The largest equivalence class has 177,749 entities
and contains:
• Albert Einstein
• all countries of the world
• the empty string
Formal Semantics says:
This is obviously broken…
Refl: ∀𝑥: (𝑥 = 𝑥)
Symm: ∀𝑥, 𝑦: (𝑥 = 𝑦) → (𝑦 = 𝑥)
Trans: ∀𝑥, 𝑦, 𝑧: (𝑥 = 𝑦 ∧ 𝑦 = 𝑧) → (𝑥 = 𝑧)
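To see how these three axioms produce such large classes, here is a minimal sketch that computes equivalence classes from owl:sameAs pairs with a union-find structure. It is an assumption about one reasonable way to do this, not a description of the sameAs.cc pipeline, and the `ex:` identifiers are invented for the example.

```python
from collections import defaultdict


def equivalence_classes(same_as_pairs):
    """Group terms into classes under the reflexive, symmetric, transitive closure."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    classes = defaultdict(set)
    for term in list(parent):
        classes[find(term)].add(term)
    return sorted(classes.values(), key=len, reverse=True)


# Hypothetical links: one questionable statement (einstein sameAs germany)
# is enough to merge a person, a country and everything linked to either.
links = [
    ("ex:einstein", "ex:albert_einstein"),
    ("ex:germany", "ex:deutschland"),
    ("ex:einstein", "ex:germany"),
    ("ex:lake", "ex:meer"),
]
for cls in equivalence_classes(links):
    print(sorted(cls))
```

Because transitivity is applied without exception, a single wrong link is enough to fuse two otherwise correct classes.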
Oldest known knowledge graph
(Psst, this is not a new problem…)
[Figure: diagram with the nodes Father, Son and Holy Spirit]
A modern example: Barack Obama
Community 0
1. dbpedia.org/resource/B_hussein_obama
2. dbpedia.org/resource/Barack_H_Obama,_Jr
3. dbpedia.org/resource/Barak_hussein_obama
4. dbpedia.org/resource/President_Barack
5. dbpedia.org/resource/Senator_Barack_Obama
6. dbpedia.org/resource/Obama
…
99. dbpedia.org/resource/Hussein_Obama
Community 3
1. dbpedia.org/resource/Presidency_of_Barack_Obama
2. dbpedia.org/resource/Barack_Obama_Administration
3. dbpedia.org/resource/Barack_Obama_Cabinet
4. dbpedia.org/resource/Obama_White_House
5. dbpedia.org/resource/Obama_regime
6. dbpedia.org/resource/America_under_Obama
…
52. dbpedia.org/resource/Presidential_transition_of_Barack_Obama
Debugging identity
by community detection
Communities correspond to roles:
- Person
- Senator
- President
- Government
Message from Empirical
Semantics
It’s not the users that got owl:sameAs wrong,
It’s the formal semantics that got reality wrong
Challenge:
What alternative semantic model of equality
would fit the empirically observed usage better?
Insights from
Empirical Semantics:
2. Meaningful names
Steven de Rooij, Peter Bloem, Wouter Beek (ISWC 2016)
http://www.cs.vu.nl/~frankh/postscript/ISWC2016.pdf
Symbols or words?
(or: blasphemy for logicians)
Formal Semantics says:
Symbol names are supposed to be meaningless
[Figure: graph with the nodes Aspirin, headache, analgesic, pain, drug and symptom, and two "treats" edges]
Measure mutual information content
between URL-string and semantics
E(x) = efficient encoding of x
If x ≈ y then E(x+y) ≈ E(x), else E(x+y) ≈ E(x) + E(y)
Mutual information content
M(x,y) = E(x) + E(y) - E(x+y)
Take x = symbol name of x as a string
Take 𝑦1 = types of x (≈ semantics of x)
Take 𝑦2 = properties of x (≈ semantics of x)
Calculate M(x, 𝑦1) and M(x, 𝑦2) for all symbols
in 600k datasets
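The measure M can be approximated with any off-the-shelf compressor. The sketch below uses zlib as a stand-in for the "efficient encoding" E; that choice is an assumption made for illustration and not necessarily the encoder used in the ISWC 2016 study, and the example IRIs are invented.

```python
import zlib


def encoded_size(text: str) -> int:
    """E(x): size in bytes of a zlib-compressed encoding of x."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def mutual_information(x: str, y: str) -> int:
    """M(x, y) = E(x) + E(y) - E(x+y), with '+' read as string concatenation."""
    return encoded_size(x) + encoded_size(y) - encoded_size(x + y)


# Hypothetical symbol name, a string of its types, and an unrelated string.
name = "http://example.org/resource/Amsterdam_University_Hospital"
types = "http://example.org/ontology/University http://example.org/ontology/Hospital"
noise = "qzv jkw 48213 xyp 90517 mmt"

print(mutual_information(name, types))
print(mutual_information(name, noise))
```

A name that shares substrings with its types compresses better together with them than with unrelated text, so the first value comes out larger than the second.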
But URL-strings do encode meaning!
Fraction of datasets with redundancy for types/predicates
at significance level > 0.99
BTW, this is 600,000 data points (RDF docs)
[Plots: one panel for Properties, one for Types]
Message from Empirical
Semantics
Users shouldn’t stop using meaningful names,
Formal semantics should capture their meaning
Challenge:
What alternative semantic models
could capture meaningful names?
Insights from
Empirical Semantics:
3. Meaningful names
for local consistency
Zhisheng Huang (ISWC 2008)
Knowledge will be inconsistent
Because of:
• Homonyms
• Different ontological models
• Migration from legacy data
• Integration of multiple sources
• …
Inconsistency through migration
DICE terminology,
in daily use at Amsterdam Medical Centre
for registration of Intensive Care patients
• Brain ⊑ CentralNervousSystem
• Brain ⊑ BodyPart
• CentralNervousSystem ⊑ NervousSystem
• BodyPart ⊑ ¬NervousSystem
Inconsistency through automated learning
• Reservoir ⊑ Lake
• Lake ⊑ WaterRegion
• Reservoir ⊑ HydrographicStructure
• HydrographicStructure ⊑ Facility
• Disjoint(WaterRegion, Facility)
100% expert agreement
on this disjointness…
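Reading the first four axioms as subclass statements, the clash can be found mechanically. The following is a minimal sketch, not the reasoner used in the original work: it follows the subclass links upwards from each class and reports any class that ends up below both members of a disjoint pair.

```python
SUBCLASS_OF = [
    ("Reservoir", "Lake"),
    ("Lake", "WaterRegion"),
    ("Reservoir", "HydrographicStructure"),
    ("HydrographicStructure", "Facility"),
]
DISJOINT = [("WaterRegion", "Facility")]


def superclasses(cls, subclass_axioms):
    """All classes reachable from cls via subclass links, including cls itself."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_axioms:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def clashing_classes(subclass_axioms, disjoint_pairs):
    """Map each class to the disjoint pairs it is a subclass of on both sides."""
    classes = sorted({c for axiom in subclass_axioms for c in axiom})
    clashes = {}
    for cls in classes:
        supers = superclasses(cls, subclass_axioms)
        hits = [(a, b) for a, b in disjoint_pairs if a in supers and b in supers]
        if hits:
            clashes[cls] = hits
    return clashes


print(clashing_classes(SUBCLASS_OF, DISJOINT))
# → {'Reservoir': [('WaterRegion', 'Facility')]}
```

Only Reservoir is affected: it is both a WaterRegion (via Lake) and a Facility (via HydrographicStructure), which the disjointness axiom forbids.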
Inconsistency through merging
SUMO(1000) + CYC(1.6M) → 6000 inconsistencies…
Local consistency
[Figure: selections s(T,𝜙,0), s(T,𝜙,1), s(T,𝜙,2)]
But… how to define s(T,𝜙,n)?
Symbols as words
[Figure: the terms WaterRegion, basin, Lake, Reservoir, H. structure and Facility]
Google Distance
(symbols as words!)
Reservoir ⊑ Lake
Lake ⊑ WaterRegion
Reservoir ⊑ HydrographicStructure
HydrographicStructure ⊑ Facility
Disjoint(WaterRegion, Facility)
Google Distance for selection function in
local consistency reasoning
(ISWC 2008)
Formal Semantics says:
this isn't supposed to work!
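A sketch of how a word-based distance could drive a selection function. The first function is the standard Normalized Google Distance formula over hit counts. The `select` function, its `radius` parameter and the toy distance in the usage lines are assumptions made for illustration; the slides do not spell out how the ISWC 2008 work defines s(T,𝜙,n).

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts: fx and fy for each term, fxy for both together, n pages in total."""
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


def select(axioms, query_symbols, distance, radius):
    """Hypothetical s(T, phi, n): keep axioms that mention a symbol within radius of the query."""
    return [
        axiom
        for axiom in axioms
        if any(distance(q, s) <= radius for q in query_symbols for s in axiom)
    ]


# Toy usage with a made-up distance: 0 for identical names, 1 otherwise.
def toy_distance(a, b):
    return 0.0 if a == b else 1.0


axioms = [("Reservoir", "Lake"), ("Lake", "WaterRegion")]
print(select(axioms, {"Reservoir"}, toy_distance, 0.5))
print(select(axioms, {"Reservoir"}, toy_distance, 1.0))
```

Widening the radius step by step yields a growing set of axioms, so reasoning can stop at the largest selection that is still consistent.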
Insight from
Empirical Semantics
Users shouldn’t stop using meaningful names,
Formal semantics should capture their meaning
Challenge:
What alternative semantic models
would capture meaningful names?
Challenge for
Empirical Semantics:
4. network structures
for different predicates
Tobias Kuhn, Wouter Beek
http://ceur-ws.org/Vol-1946/paper-05.pdf
skos:exactMatch
foaf:knows
osspr:contains
Geopolitics:hasborderWith
Message from
Empirical Semantics
None of these patterns have any semantic impact
(you can’t even detect them under the traditional semantics)
Challenge:
What alternative semantic models would
take such different patterns into account?
So what…
So what #1 (pragmatic)
• We now have larger KBs than ever before
• We now have the instruments
to observe and analyse these very large KBs
• We can use these insights for better tools:
– query & inference
– publish & maintain
– visualise & explain
– …
So what #2 (pretentious)
My secret hope is that this will help us
to understand the patterns of knowledge:
Not a prescriptive theory of
what knowledge should be,
But a descriptive theory of
what knowledge is actually like
