Linguistic Linked Open Data
LLOD
Challenges, Approaches, Future Work
Sebastian Hellmann
TKE 2016
Sebastian Hellmann - AKSW/KILT Copenhagen TKE 2016
AKSW / KILT in Leipzig
Leipzig has become one of the largest Semantic Web centers
AKSW has 4 subgroups and 45 PhD students http://aksw.org/Team.html
Current position:
- Head of AKSW / KILT research group (8 PhD students)
- Knowledge Integration and Language Technology (KILT) http://aksw.org/Groups/KILT.html
- Project manager for 2 H2020 projects and 1 German research project (BMWi)
- http://freme-project.eu/, http://aligned-project.eu/, http://smartdataweb.de/
- Executive Director of the DBpedia Association http://dbpedia.org
Outline
● The vision behind Linked Data - a technological introduction
● Linguistic Linked Open Data
● Knowledge Modelling vs. Data Encoding
● LIDER
● Challenges and Approaches
Linked Data
Web of Data
WWW vs. GGG - https://en.wikipedia.org/wiki/Giant_Global_Graph
Data on the Web vs. the Web of Data vs. the Semantic Web
RDF - Entity Attribute Value - http://dbpedia.org/resource/Copenhagen
Three ways to publish RDF:
1. Linked Data: resource-level access via HTTP request (next slide)
2. SPARQL: query access via triplestore database
3. Dump: dataset-level access via bulk download
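The first two access methods can be sketched with Python's standard library. This is a minimal sketch, not an official client: it assumes DBpedia's public SPARQL endpoint at `https://dbpedia.org/sparql` and only builds the HTTP requests (passing one to `urllib.request.urlopen` would actually send it). Resource-level access is a plain HTTP request with content negotiation; query access sends a SPARQL query as a URL parameter. A dump, by contrast, is a single bulk file download.

```python
from urllib.parse import urlencode
from urllib.request import Request

RESOURCE = "http://dbpedia.org/resource/Copenhagen"
# Assumption: DBpedia's public SPARQL endpoint; other SPARQL 1.1 endpoints accept the same request shape.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def linked_data_request(uri: str) -> Request:
    """Resource-level access: dereference one URI and ask for RDF (Turtle) instead of HTML."""
    return Request(uri, headers={"Accept": "text/turtle"})


def sparql_request(endpoint: str, subject: str, limit: int = 10) -> Request:
    """Query access: ask for the entity-attribute-value triples of one subject via HTTP GET."""
    query = f"SELECT ?attribute ?value WHERE {{ <{subject}> ?attribute ?value }} LIMIT {limit}"
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    for req in (linked_data_request(RESOURCE), sparql_request(SPARQL_ENDPOINT, RESOURCE)):
        print(req.get_header("Accept"), req.full_url)
```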
Linked Data
The four rules from https://www.w3.org/DesignIssues/LinkedData:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
https://en.wikipedia.org/wiki/Copenhagen vs.
http://dbpedia.org/resource/Copenhagen
Source: https://www.w3.org/DesignIssues/LinkedData.html
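The Copenhagen pair illustrates rules 1 and 2: the Wikipedia URL identifies a web page about Copenhagen, whereas the DBpedia URI is an HTTP name for the city itself. For English, DBpedia derives these names from Wikipedia article titles. The sketch below assumes exactly that convention and deliberately ignores other language editions, redirects and title-encoding edge cases.

```python
from urllib.parse import urlsplit

# Assumption: English Wikipedia articles map to the main DBpedia namespace by article title.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(page_url: str) -> str:
    """Turn the URL of a Wikipedia page *about* a thing into the DBpedia URI *naming* the thing."""
    parts = urlsplit(page_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {page_url}")
    title = parts.path[len("/wiki/"):]
    return DBPEDIA_RESOURCE + title


if __name__ == "__main__":
    print(wikipedia_to_dbpedia("https://en.wikipedia.org/wiki/Copenhagen"))
```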
Open Data != Open Data
Open Access vs Open License
Open Access means accessible like a web page (the license is often unclear)
Open Definition (http://opendefinition.org) by the OKFN:
“Knowledge is open if anyone is free to access, use, modify, and share it —
subject, at most, to measures that preserve provenance and openness.”
http://lod-cloud.net/
How is the Linked Data Cloud built?
- Open Access as the basis
- At least 50 links between things are required for a dataset to be connected to another dataset in the cloud
- http://lov.okfn.org
- http://datahub.io
- Assessing Quantity and Quality of Links Between Linked Data Datasets by Cir[…] Sebastian Hellmann, Kay Müller, and Martin Brümmer, in LDOW 2016: http://ev[…]org/ldow2016/papers/LDOW2016_paper_09.pdf
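A minimal sketch of how the 50-link threshold could be checked. It assumes that each dataset is identified by a URI namespace prefix and that links are counted per direction; the namespaces in the usage example are hypothetical, and the actual LOD cloud tooling may count differently.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
MIN_LINKS = 50  # threshold named on the slide


def dataset_of(uri: str, namespaces: Dict[str, str]) -> Optional[str]:
    """Return the dataset whose namespace prefix the URI falls into, if any."""
    for name, prefix in namespaces.items():
        if uri.startswith(prefix):
            return name
    return None


def count_links(triples: Iterable[Triple], namespaces: Dict[str, str]) -> Counter:
    """Count triples whose subject and object belong to two different datasets."""
    counts: Counter = Counter()
    for subject, _predicate, obj in triples:
        source, target = dataset_of(subject, namespaces), dataset_of(obj, namespaces)
        if source and target and source != target:
            counts[(source, target)] += 1
    return counts


def cloud_edges(counts: Counter) -> Set[Tuple[str, str]]:
    """Dataset pairs with enough links to be drawn as an edge in the cloud diagram."""
    return {pair for pair, n in counts.items() if n >= MIN_LINKS}
```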
Linguistic Linked Open Data
● Movement originated in the context of the Working Group for Open Data in Linguistics (OWLG) at the Open Knowledge Foundation (OKFN)
● "Open" means openly licensed, not just openly accessible
● Join the community mailing list at http://linguistics.okfn.org/
● Current information at http://linguistic-lod.org/, maintained by John McCrae
-> Instructions on how to join the LLOD cloud
January 2011
February 2012: Linked Data in Linguistics. Representing Language Data and Metadata (http://www.springer.com/computer/ai/book/978-3-642-28248-5). Christian Chiarcos, Sebastian Nordhoff, and Sebastian Hellmann (Eds.). Springer, Heidelberg, 2012.
August 2012
September 2012: MLODE
Special Issue on Multilingual Linked Open Data (MLOD). Editors: Sebastian Hellmann, Steven Moran, Martin Brümmer and John McCrae. Semantic Web, vol. 6, no. 4, pp. 315-317, 2015.
January 2013
September 2013: LIDER FP7 EU Project, start: November 2013, duration: 2 years, http://lider-project.eu/
May 2014
November 2014
May 2015
May 2016
Should we all use Linked Data?
When should we use linked data?
How should we use linked data?
When should we not use it?
Knowledge Modeling vs. Data Encoding
Entity Relationship Diagrams and UML
The Metadata Ecosystem of the DataId Ontology, Markus Freudenberg, submitted to MTSR Conf 2016
http://dataid.dbpedia.org
XML encoding variants
A <same> relation should be reflexive, symmetric and transitive, i.e. an equivalence relation: https://en.wikipedia.org/wiki/Equivalence_relation
Apples and oranges
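If an XML vocabulary has a `<same>` element (as in the variants above), stating `a <same> b` and `b <same> c` also commits the data to `b <same> a`, `a <same> c` and `a <same> a`. XML itself does not say so, whereas in RDF/OWL `owl:sameAs` carries exactly these semantics. A small sketch of the three properties and of the closure they imply, treating the stated pairs as a Python set of tuples (the element names are placeholders):

```python
from typing import Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def is_reflexive(rel: Set[Pair], items: Iterable[Hashable]) -> bool:
    return all((x, x) in rel for x in items)


def is_symmetric(rel: Set[Pair]) -> bool:
    return all((y, x) in rel for (x, y) in rel)


def is_transitive(rel: Set[Pair]) -> bool:
    return all((x, z) in rel for (x, y) in rel for (y2, z) in rel if y == y2)


def equivalence_closure(rel: Set[Pair], items: Iterable[Hashable]) -> Set[Pair]:
    """Smallest equivalence relation containing rel: add (x, x), then (y, x), then chained pairs."""
    closure = set(rel) | {(x, x) for x in items}
    closure |= {(y, x) for (x, y) in closure}
    while True:
        # Composing a symmetric relation with itself stays symmetric, so one loop suffices.
        implied = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2} - closure
        if not implied:
            return closure
        closure |= implied
```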
Who can you ask what XML tags and structure
mean and what they are used for?
Internationalization Tag Set (ITS) 2.0
http://www.w3.org/TR/its20/
● W3C Recommendation since 29 October 2013
● Defines how to embed Machine Translation and Localisation annotations, so-called Data Categories, in (X)HTML and XML
● In addition to the human-readable document, two ontologies are referenced that capture the semantics of the standard.
● ITS Ontology as companion
● NLP Interchange Format (NIF) is the recommended format for RDF conversion of ITS 2.0 http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core
Internationalization Tag Set (ITS) 2.0
One of the most efficient and robust ways to annotate HTML in a standardized manner
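As a small illustration of ITS 2.0 annotations in HTML, the sketch below collects the text that the Translate data category marks as not translatable. In HTML, this data category is expressed with the standard `translate` attribute, which descendants inherit. The sketch is an assumption-laden minimum: it covers only this one data category and only local markup (no global rules), and it expects well-formed markup with explicit end tags.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# HTML elements without end tags; they hold no text, so they never go on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class DoNotTranslateExtractor(HTMLParser):
    """Collect text that the HTML `translate` attribute (ITS 2.0 Translate) marks as "no"."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: List[bool] = []  # one entry per open element: is its content protected?
        self.protected: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in VOID_ELEMENTS:
            return
        attributes = dict(attrs)
        if "translate" in attributes:
            # An empty value means "yes"; only "no" protects the content.
            protected = (attributes["translate"] or "").strip().lower() == "no"
        else:
            protected = self._stack[-1] if self._stack else False  # inherited from the parent
        self._stack.append(protected)

    def handle_endtag(self, tag: str) -> None:
        if tag not in VOID_ELEMENTS and self._stack:
            self._stack.pop()

    def handle_data(self, data: str) -> None:
        if self._stack and self._stack[-1] and data.strip():
            self.protected.append(data.strip())
```

Usage: create an extractor, call `feed()` with the HTML and then `close()`, and read `protected`.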
NLP Interchange Format 2.0 (old example)
NIF 2.1 release pending
Join the W3C Community Group: https://www.w3.org/community/ld4lt/
NIF is useful for:
● Adding semantics to NLP tool output and corpora
● Providing and publishing identifiers for text and annotations
NIF is compact and scalable (cf. http://wiki-link.nlp2rdf.org/):
● Google Wikilinks Corpus with 10.6 million webpages and 31.5 million Wikipedia links (about 3 per page) with a zipped size of 180 GB.
● 533 million triples (other formats 7-27% more)
● 79 GB (12 GB gzipped dumps) in Turtle format (original size 180 GB containing HTML markup)
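A sketch of the core NIF idea of making text and annotations addressable: every substring gets an offset-based URI (`#char=begin,end`), and an entity link becomes a triple on that URI. The assumptions are as follows. The document URI `http://example.org/doc1` is hypothetical. Offsets are counted in Unicode code points, as Python strings count them. The ITS 2.0 RDF property `itsrdf:taIdentRef` carries the link. Only a small subset of NIF 2.0 core is used, so check the nif-core ontology before relying on the exact terms.

```python
from typing import List, Tuple

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def char_uri(base: str, begin: int, end: int) -> str:
    """Offset-based identifier for text[begin:end] (RFC 5147 style fragment)."""
    return f"{base}#char={begin},{end}"


def literal(text: str) -> str:
    """Minimal Turtle string literal escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def offset(n: int) -> str:
    return f'"{n}"^^xsd:nonNegativeInteger'


def nif_turtle(base: str, text: str, links: List[Tuple[int, int, str]]) -> str:
    """Describe a document and its entity links (begin, end, entity URI) as NIF Turtle."""
    context = char_uri(base, 0, len(text))
    out = [f"@prefix nif: <{NIF}> .", f"@prefix itsrdf: <{ITSRDF}> .", f"@prefix xsd: <{XSD}> .", "",
           f"<{context}> a nif:Context , nif:RFC5147String ;",
           f"    nif:isString {literal(text)} ;",
           f"    nif:beginIndex {offset(0)} ;",
           f"    nif:endIndex {offset(len(text))} ."]
    for begin, end, entity in links:
        out += ["",
                f"<{char_uri(base, begin, end)}> a nif:Phrase , nif:RFC5147String ;",
                f"    nif:referenceContext <{context}> ;",
                f"    nif:anchorOf {literal(text[begin:end])} ;",
                f"    nif:beginIndex {offset(begin)} ;",
                f"    nif:endIndex {offset(end)} ;",
                f"    itsrdf:taIdentRef <{entity}> ."]
    return "\n".join(out)
```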
LIDER
Towards a linguistic linked data ecosystem
Website: http://lider-project.eu
Guidelines: http://lider-project.eu/?q=guidelines
NIF
LIDER - Deliverable 2.1.2
http://www.lider-project.eu/sites/default/files/D2.1.2-Phase-II.pdf
LIDER Reference Architecture Deliverable 3.1.2.
General:
lemon - developed by
http://www.lider-project.eu/sites/default/files/D3.1.2-v2.0.pdf
Challenges and Work in Progress
Identifier management
- Ideal identifiers are stable, i.e. the meaning behind the URI does not change
- Unrealistic for most use cases
- Easier for individual entities, e.g. persons and organisations
- Non-trivial for terminology
Proposals:
1. Apply software development practices, e.g. versioning and update scripts: http://vocol.org, http://github.org, http://aligned-project.eu
2. ??
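Proposal 1 can be illustrated with a small update script. The sketch assumes that each terminology release publishes a mapping from replaced term URIs to their successors, similar in spirit to Dublin Core's `dct:isReplacedBy`. The URIs and the mapping format are hypothetical. Downstream data is rewritten to current identifiers, and cycles in the mapping are reported as errors. It only covers one-to-one replacements; splits and merges of meanings, the hard case for terminology, need more than a lookup table.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def resolve(uri: str, replaced_by: Dict[str, str]) -> str:
    """Follow 'replaced by' links from a deprecated identifier to its current one."""
    seen = {uri}
    while uri in replaced_by:
        uri = replaced_by[uri]
        if uri in seen:
            raise ValueError(f"replacement cycle involving {uri}")
        seen.add(uri)
    return uri


def update_triples(triples: List[Triple], replaced_by: Dict[str, str]) -> List[Triple]:
    """Update script: rewrite every term of every triple to its current identifier."""
    return [(resolve(s, replaced_by), resolve(p, replaced_by), resolve(o, replaced_by))
            for s, p, o in triples]
```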
Knowledge Fusion
- Linking is mostly done manually
- Linking 200 datasets pairwise means maintaining on the order of 200 × 200 = 40,000 mappings
- Adding datasets one after the other makes the result depend on the merge order
- Ideally we would be able to structure all datasets into clusters before linking
Proposals:
1. Under discussion with Erhard Rahm: The Case for Holistic Data Integration,
ADBIS 2016 Keynote: http://adbis2016.vsb.cz/keynote/ (to appear)
2. Apply software development processes: https://github.com/dbpedia/links
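The counting argument behind the pairwise bullet works as follows. With n datasets there are n(n-1)/2 unordered pairs, i.e. 19,900 for n = 200, or 39,800 if each direction is maintained separately, hence roughly 40,000. Linking each dataset once into shared clusters would need only n link sets. The sketch below is a stand-in for the cluster idea, not Rahm's holistic integration method itself. It groups entities into connected components with a union-find structure, and unlike merging datasets one after the other, the resulting clusters do not depend on the order in which links arrive. The entity names in the usage example are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def pairwise_mappings(n: int) -> int:
    """Unordered dataset pairs that each need their own link set: n(n-1)/2."""
    return n * (n - 1) // 2


class UnionFind:
    """Disjoint sets of entity URIs; linked entities end up in the same cluster."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def clusters(links: Iterable[Tuple[str, str]]) -> Set[frozenset]:
    """Group entities connected by sameAs-style links into clusters (connected components)."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    groups: Dict[Hashable, Set[str]] = {}
    for entity in list(uf.parent):
        groups.setdefault(uf.find(entity), set()).add(entity)
    return {frozenset(group) for group in groups.values()}
```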
The Metadata Challenge
Where to publish metadata for your data?
- Barrier between data and dataset description
- Stale metadata
- Single point of truth missing
- Metadata too heterogeneous
- Download link missing
- No (sufficiently) complete view over the Web of Data is possible, so discovery fails
Proposals:
1. Build an index: http://linghub.lider-project.eu/ (Clarin, LRE Map, Metashare, Datahub)
2. Create a better schema: http://dataid.dbpedia.org and provide benefits for complying with it
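An index such as the one in proposal 1 could flag some of the listed problems automatically. This is a minimal sketch, not the DataID schema. It assumes records keyed by DCAT / Dublin Core terms (`dct:title`, `dct:license`, `dcat:downloadURL`, `dct:modified`). The choice of required fields is our own, and it uses the age of `dct:modified` as a crude proxy for stale metadata with an assumed 365-day threshold.

```python
from datetime import date
from typing import Dict, List

# Assumption: which keys count as required is a design choice for this sketch.
REQUIRED = ("dct:title", "dct:license", "dcat:downloadURL")


def metadata_problems(record: Dict[str, str], today: date, max_age_days: int = 365) -> List[str]:
    """Flag missing fields (e.g. the download link) and possibly stale metadata."""
    problems = [f"missing {key}" for key in REQUIRED if not record.get(key)]
    modified = record.get("dct:modified")
    if not modified:
        problems.append("missing dct:modified")
    elif (today - date.fromisoformat(modified)).days > max_age_days:
        problems.append(f"stale: not updated for more than {max_age_days} days")
    return problems
```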
MMoOn
- LIDER
- Lemon
- ODRL
- Olia
- NIF
- Morphology quite complex
- Specific to the language and to the linguist
- http://mmoon.org
The Metadata Challenge 2
● The RDF triple structure is too simple to hold additional metadata, such as:
○ Scope
○ Validity
○ Confidence
○ Technical metadata, e.g. collection time
Contextualisation is probably already better researched in lexicography than in Semantic Web.
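A plain triple has no slot for such metadata. One common workaround, sketched here under assumptions, is to put each statement into its own named graph and to describe that graph in the default graph (N-Quads output). `prov:generatedAtTime` is a real PROV-O property. The `ex:confidence` and `ex:validFrom` properties and all example URIs are hypothetical. RDF reification would be an alternative. Either way, the price is many extra nodes per statement.

```python
from typing import List

XSD = "http://www.w3.org/2001/XMLSchema#"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/meta#"  # hypothetical vocabulary for confidence and validity


def typed(value: str, datatype: str) -> str:
    """N-Quads typed literal using an XML Schema datatype."""
    return f'"{value}"^^<{XSD}{datatype}>'


def contextualised(s: str, p: str, o: str, graph: str, confidence: float,
                   valid_from: str, collected_at: str) -> List[str]:
    """Put one statement into its own named graph and describe that graph in the default graph."""
    g = f"<{graph}>"
    conf = typed(f"{confidence:.3f}", "decimal")
    return [
        f"<{s}> <{p}> <{o}> {g} .",                                          # the statement itself
        f"{g} <{EX}confidence> {conf} .",                                    # confidence
        f"{g} <{EX}validFrom> {typed(valid_from, 'date')} .",                # validity
        f"{g} <{PROV}generatedAtTime> {typed(collected_at, 'dateTime')} .",  # collection time
    ]
```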
Future work and take-home messages
Data quality and verification
● Data quality can be defined and measured with tools.
● Test-driven Evaluation of Linked Data Quality by Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali J. Zaveri, in Proceedings of the 23rd International Conference on World Wide Web: http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf
● Current standard:
○ https://www.w3.org/TR/shacl/
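The test-driven idea is that each quality requirement becomes an executable test case whose violations are reported. A minimal sketch in the spirit of SHACL's `sh:minCount`/`sh:maxCount` constraints on a single property follows. It is not a SHACL processor, and the entity names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def check_cardinality(triples: Iterable[Triple], focus_nodes: Iterable[str], path: str,
                      min_count: int = 0, max_count: Optional[int] = None) -> List[Tuple[str, str]]:
    """One test case: every focus node needs between min_count and max_count values for path."""
    values: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in triples:
        if p == path:
            values[s].append(o)
    report: List[Tuple[str, str]] = []
    for node in focus_nodes:
        n = len(values[node])
        if n < min_count:
            report.append((node, f"{n} value(s) for {path}, expected at least {min_count}"))
        if max_count is not None and n > max_count:
            report.append((node, f"{n} value(s) for {path}, expected at most {max_count}"))
    return report
```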
Open licenses in research
Flowchart (starting at "Start", with Yes/No branches):
Questions: "Are you willing to publish your data under an open license?" and "Can you make a product out of your data?"
Outcomes: "Congratulations, your paper has been accepted" and "Good luck, we wish you all the best and a high profit"
Entity Linking Verification - new translator job profile
● http://www.freme-project.eu/
● Business Case: Integrating semantic enrichment into multilingual content in translation and localisation
● In the future, translators and lexicographers might be asked to judge entity linking and verify data
Should I invest in publishing linked data?
Long-term data strategy, if:
● you expect many inbound links, and
● persistent IDs as well as long-term hosting and curation are no problem for you
-> yes (data value increases)
One-time thing:
● Interest of externals only in the yellow zone
-> Publish under an open license (let someone else do it)
DBpedia Association
DBpedia+
● Maintain identifier space
● Add open and member data to DBpedia+
● Add data following the LIDER guidelines
● Ability to add your backlinks
DBpedia Community meeting on the 15th of September in Leipzig
Events in 2016
● KEKI 2016 Workshop - Uses of Linguistic Linked Open Data http://keki2016.linguistic-lod.org/ The deadline is 1 July, but might be extended.
● http://2016.semantics.cc
Thank you
hellmann@informatik.uni-leipzig.de