Linked Data in Linguistics
Representing and Connecting Language Data and Language Metadata
              Sebastian Hellmann, Christian Chiarcos, Sebastian Nordhoff


        34th Annual Meeting of the German Linguistic Society (DGfS), AG 2
        Frankfurt/M., Germany, March 7th – 9th, 2012




 If not otherwise noted,
 content is cc-by
Overview


 Technological Background (SH)
 Linked Open Data and Collaborative Research (SH)
 Linked Data for Linguistics (CC)
 Building a Linguistic Linked Open Data Cloud
      Prospects of Linked Data in Linguistics (CC)
      Annotated Corpora (CC)
      Lexical-Semantic Resource (SH)
      Linguistic Databases (SN)
 What to Expect from LDL-2012
March 7th, 2012        Linked Data in Linguistics 2012   2
From Excel to RDF and Linked Data


 A data collection about sailing ships:




                             Source: http://en.wikipedia.org/wiki/File:Bounty_modified_photo.jpg
From Excel to RDF and Linked Data


 Add the Gorch Fock




                  Source: http://en.wikipedia.org/wiki/File:Gorch_Fock_unter_Segeln_Kieler_Foerde_2006.jpg
From Excel to RDF and Linked Data


 Add the auxiliary propulsion of the Gorch Fock




 The field is now irregular




From Excel to RDF and Linked Data


 A first empty field is introduced




From Excel to RDF and Linked Data

 Entity Attribute Value, data represented in triples
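
The step from a sparse table to Entity Attribute Value triples can be sketched as follows. This is a minimal illustration only: the attribute names and values in `rows` are assumptions, not the data shown on the slide.

```python
# Hypothetical table rows; attribute names and values are illustrative only.
rows = [
    {"name": "Bounty", "type": "SailingShip", "propulsion": None},
    {"name": "Gorch Fock", "type": "SailingShip", "propulsion": "Diesel"},
]


def to_triples(rows, key="name"):
    """Turn each non-empty cell into one (entity, attribute, value) triple."""
    triples = []
    for row in rows:
        entity = row[key]
        for attribute, value in row.items():
            # Empty cells simply produce no triple, so no sparsity arises.
            if attribute != key and value is not None:
                triples.append((entity, attribute, value))
    return triples


triples = to_triples(rows)
```

Each triple stands on its own, so adding a new attribute for one ship does not introduce empty fields for the others.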




From Excel to RDF and Linked Data

 XML also avoids sparsity and anomalies, but
  what about:


 1. Automatically infer rows (reduces size)
 2. Check consistency (not validity)
 3. Merge two tables (not only syntactically, but
   semantically)
 4. Enrich with external data (also retrieve updates)
 5. Query


From Excel to RDF and Linked Data

 Description Logics (DL) are a family of formal knowledge
  representation languages
 Fragments of first-order logic
 Inference problems are usually decidable
 Well-researched complexity
 Basis for the Web Ontology Language (OWL)
 Reasoner implementations available


Franz Baader, Ian Horrocks, and Ulrike Sattler. Chapter 3: Description Logics. In Frank van Harmelen,
Vladimir Lifschitz, and Bruce Porter, editors, Handbook of Knowledge Representation. Elsevier, 2007.

From Excel to RDF and Linked Data

 Description Logic inference
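
As a rough sketch of what inferring rows means, the toy function below derives superclass memberships from a subclass hierarchy. It is not a Description Logic reasoner; `my:Ship` and `my:Vehicle` are hypothetical classes introduced only for illustration.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {"my:SailingShip": "my:Ship", "my:Ship": "my:Vehicle"}


def infer_types(asserted_types, subclass_of=SUBCLASS_OF):
    """Return the asserted types plus every superclass reachable from them."""
    inferred = set(asserted_types)
    frontier = list(asserted_types)
    while frontier:
        parent = subclass_of.get(frontier.pop())
        if parent is not None and parent not in inferred:
            inferred.add(parent)
            frontier.append(parent)
    return inferred


types = infer_types({"my:SailingShip"})
```

Only the most specific type needs to be stored; the more general ones can be derived on demand, which is why inference reduces the size of the data.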




From Excel to RDF and Linked Data

 Description Logic constraints




 Possible to detect inconsistencies, e.g., Gorch Fock must
  not be a SailingShip
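
A consistency check of this kind can be sketched as below. The sketch assumes that the conflict comes from a disjointness axiom between two classes; the class `my:MotorShip` and the axiom itself are hypothetical, since the constraints on the slide are not reproduced here.

```python
# Hypothetical axiom: the two classes are declared disjoint.
DISJOINT = [("my:SailingShip", "my:MotorShip")]


def find_inconsistencies(types_by_entity, disjoint=DISJOINT):
    """Return (entity, class_a, class_b) for every violated disjointness axiom."""
    violations = []
    for entity, types in sorted(types_by_entity.items()):
        for class_a, class_b in disjoint:
            if class_a in types and class_b in types:
                violations.append((entity, class_a, class_b))
    return violations


data = {
    "my:Bounty": {"my:SailingShip"},
    "my:Gorch_Fock": {"my:SailingShip", "my:MotorShip"},
}
violations = find_inconsistencies(data)
```

Unlike schema validation, this checks whether the statements can all be true together, not whether the file is well-formed.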
Uniform Resource Identifiers (URIs)

Agree on a common vocabulary and names for entities
On the schema level, coherence of properties and types
 is required for data integration
URIs allow for globally unique identifiers:
                          “Gorch Fock”
                                     vs.
          http://en.wikipedia.org/wiki/Gorch_Fock_(1958)
                                     vs.
         http://dbpedia.org/resource/Gorch_Fock_(1958)
                  dbpedia:Gorch_Fock_(1958)
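
The relation between the prefixed name and the full URI can be sketched as a simple prefix expansion. The `my` namespace below is a hypothetical placeholder; the `dbpedia` namespace is the one used in the example above.

```python
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "my": "http://example.org/my#",  # hypothetical local namespace
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dbpedia:X' into a full URI."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local_name


uri = expand("dbpedia:Gorch_Fock_(1958)")
```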
From Excel to RDF and Linked Data

 Last table before we get more technical



                                                       4 Types
                                                       of Object




From Excel to RDF and Linked Data



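 [Figure: RDF graph. my:Gorch_Fock is linked by owl:sameAs to
  dbpedia:Gorch_Fock_(1958) and by my:owner to my:German_Navy;
  my:German_Navy is linked by owl:sameAs to dbpedia:German_Navy;
  a dbprop:shipLength edge points to the literal "81.2"^^xsd:double;
  further owl:sameAs links lead to other datasets and more data.]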
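
Merging two sources semantically via owl:sameAs can be sketched as follows: every URI is replaced by one representative of its sameAs cluster before the triples are combined. This is a simplified illustration, not a full OWL treatment; attaching dbprop:shipLength to the DBpedia node is an assumption about the figure.

```python
def canonical_names(same_as_links):
    """Map every URI to one representative of its owl:sameAs cluster."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Use the alphabetically smaller URI as the representative.
            parent[max(root_left, root_right)] = min(root_left, root_right)
    return {uri: find(uri) for uri in parent}


def merge(triples, same_as_links):
    """Rewrite all triples to canonical names and drop duplicates."""
    names = canonical_names(same_as_links)
    return {tuple(names.get(term, term) for term in triple) for triple in triples}


same_as = [
    ("my:Gorch_Fock", "dbpedia:Gorch_Fock_(1958)"),
    ("my:German_Navy", "dbpedia:German_Navy"),
]
local = [("my:Gorch_Fock", "my:owner", "my:German_Navy")]
remote = [("dbpedia:Gorch_Fock_(1958)", "dbprop:shipLength", 81.2)]  # assumed subject
merged = merge(local + remote, same_as)
```

After merging, the owner and the ship length are statements about the same node, although they came from different tables.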



RDF and OWL - recap

 RDF – Resource Description Framework
      Entity Attribute Value + URIs
      Triples
      Shared Vocabularies
      Graphs
 OWL – Web Ontology Language
      Based on Description Logic and extends RDF
      OWL-DL Reasoning
      Consistency checks
 Both are W3C standards
Syntax training
 Presenters will probably show you some code over the
   next few days
 On the next slide you will see some syntax examples




Serialization: Turtle and XML
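
As a rough illustration of what a serialization looks like, the sketch below writes triples as N-Triples, the line-based subset of Turtle. It only handles URIs and floating-point literals; the property URI used in the example is an assumption.

```python
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def term(value):
    """Render a URI string or a float literal as an N-Triples term."""
    if isinstance(value, float):
        return f'"{value}"^^<{XSD_DOUBLE}>'
    return f"<{value}>"


def to_ntriples(triples):
    """Write one 'subject predicate object .' statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


doc = to_ntriples([
    (
        "http://dbpedia.org/resource/Gorch_Fock_(1958)",
        "http://dbpedia.org/property/shipLength",  # assumed property URI
        81.2,
    ),
])
```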




SPARQL

 Ability to merge data and query it using the W3C
  standard SPARQL (SPARQL Protocol and Query
  Language)
 SPARQL is the SQL of the Semantic Web
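     PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
     PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
     PREFIX dbprop: <http://dbpedia.org/property/>   # assumed namespace
     PREFIX my:     <http://example.org/my#>         # hypothetical namespace
     SELECT ?ship WHERE {
       ?ship   rdf:type          my:SailingShip .
       ?ship   my:propulsion     ?engine .
       ?engine my:fuelType       my:Diesel .
       ?ship   dbprop:shipLength ?length .
       FILTER (xsd:double(?length) >= 80.0)
     }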
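
How such a query is answered can be sketched with a toy pattern matcher: each triple pattern extends the variable bindings found so far. This is an illustration of the idea only, not how a SPARQL engine is implemented; the engine and fuel triples are hypothetical data.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def query(patterns, triples):
    """Return all bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# Hypothetical data in the shape of the query above.
triples = [
    ("my:Gorch_Fock", "rdf:type", "my:SailingShip"),
    ("my:Gorch_Fock", "my:propulsion", "my:engine1"),
    ("my:engine1", "my:fuelType", "my:Diesel"),
    ("my:Gorch_Fock", "dbprop:shipLength", 81.2),
    ("my:Bounty", "rdf:type", "my:SailingShip"),
]
patterns = [
    ("?ship", "rdf:type", "my:SailingShip"),
    ("?ship", "my:propulsion", "?engine"),
    ("?engine", "my:fuelType", "my:Diesel"),
    ("?ship", "dbprop:shipLength", "?length"),
]
results = [b["?ship"] for b in query(patterns, triples) if b["?length"] >= 80.0]
```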


Linked Data




Linked Open Data cloud

   Image of a table with some data




Source: http://lod-cloud.net
4 Rules of Linked Data



 Use URIs as names for things
 Use HTTP URIs so that people can look up those
  names.
 When someone looks up a URI, provide useful
  information, using the standards (RDF*, SPARQL)
 Include links to other URIs, so that they can discover
   more things.



                                 http://www.w3.org/DesignIssues/LinkedData.html
Linked Data - Content Negotiation

 Different views for different data consumers: Browser




Linked Data - Content Negotiation

 Different views for different data consumers:
   Applications
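
The idea behind content negotiation can be sketched as follows: the server inspects the client's Accept header and returns the matching representation of the same resource. This is a simplified sketch that ignores type wildcards such as `text/*` and other details of HTTP; the file names are hypothetical.

```python
def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending preference."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0
        for parameter in fields[1:]:
            if parameter.startswith("q="):
                q = float(parameter[2:])
        entries.append((media_type, q))
    # sorted() is stable, so equally weighted types keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(header, available):
    """Pick the representation that best matches the Accept header."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return available[media_type]
        if media_type == "*/*":
            return next(iter(available.values()))
    return None


# Hypothetical representations of one resource.
AVAILABLE = {
    "text/html": "ship.html",
    "application/rdf+xml": "ship.rdf",
    "text/turtle": "ship.ttl",
}
browser = negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", AVAILABLE)
application = negotiate("application/rdf+xml", AVAILABLE)
```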




Linked Data

 A dataset is a set of RDF triples that is published,
   maintained or aggregated by a single provider.




 An RDF link is an RDF triple whose subject and object
  are described in different datasets
 A linkset is a collection of such RDF links between two datasets
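
These definitions can be sketched in code. The sketch assumes, as a simplification, that a resource is described in the dataset whose namespace its URI starts with; the `my` namespace is hypothetical.

```python
def dataset_of(uri, namespaces):
    """Return the name of the dataset whose namespace prefixes `uri`, if any."""
    for name, namespace in namespaces.items():
        if isinstance(uri, str) and uri.startswith(namespace):
            return name
    return None


def linkset(triples, namespaces, source, target):
    """Collect the RDF links whose subject is in `source` and object in `target`."""
    return [
        (s, p, o)
        for s, p, o in triples
        if dataset_of(s, namespaces) == source and dataset_of(o, namespaces) == target
    ]


NAMESPACES = {
    "my": "http://example.org/my#",  # hypothetical local dataset
    "dbpedia": "http://dbpedia.org/resource/",
}
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
triples = [
    ("http://example.org/my#Gorch_Fock", SAME_AS,
     "http://dbpedia.org/resource/Gorch_Fock_(1958)"),
    ("http://example.org/my#Gorch_Fock", "http://example.org/my#owner",
     "http://example.org/my#German_Navy"),
]
links = linkset(triples, NAMESPACES, "my", "dbpedia")
```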

Why go for the fifth star?


                                                               Central Contractor
                                                               Registration (CCR)




                                                                           Geonames




        Source: http://webofdata.wordpress.com/2011/05/22/why-we-link/
Open licences allow republishing and reuse




 Motivation for collaboration:
       High potential that invested efforts can be reused, i.e.
         data, links, vocabularies, schemas
       (Effortful) feedback: Users complement data, extend
         vocabularies and contribute changes. VoCamps for
         achieving coherence.

 Source: Chiarcos, Hellmann, Nordhoff, Towards a Linguistic Linked Open Data cloud: The
 Open Linguistics Working Group, Traitement Automatique des Langues, to appear
Example DBpedia
 Data is extracted from Wikipedia
 Wikipedia itself publishes only
 the unstructured data
 A small DBpedia team creates RDF
 A community of stakeholders cleans
 the data and creates links

                           Estimate:
                     10-20% to consolidate
                        community effort



Scalability

 Golden Hammer Anti-Pattern
 Adequacy




Linked Data for Linguistics

 Representation and modelling
 Structural interoperability
 Integrating distributed resources
 Conceptual interoperability
 Dynamic Import




Representation and modelling
                    Structural interoperability




Representation and modelling

 Different linguistic subcommunities have developed
   representation standards, e.g.,
      LMF: Lexical Markup Framework (Francopoulo et al. 2009)
            lexical-semantic resources
      GrAF: Graph Annotation Framework (Ide and Suderman 2007)
            for annotated corpora
      based on labelled directed acyclic graphs (feature structures)
 RDF data model: labelled directed (multi-)graphs
      Uniform formalism for different resource types
      Sublanguages (e.g., RDFS, OWL) make it possible to define
        domain-specific vocabularies

Structural interoperability

 With different language resources represented in RDF,
  we can freely combine their information
      cross-resource queries with RDF query languages (e.g.,
        SPARQL)
      Given a corpus with WordNet sense annotations (e.g., the
        Manually Annotated Sub-Corpus MASC) (Ide et al. 2010)
            “Retrieve all sentences that describe locations”
            i.e., sentences containing a token annotated with a
               WordNet sense that is a hyponym of “location”
 Difficult to realize with GrAF or LMF
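
The example query can be sketched over toy data: a sentence qualifies if one of its tokens carries a sense that is a (transitive) hyponym of "location". The hypernym links and sentences below are hypothetical stand-ins for WordNet and MASC; with real RDF data this would be a SPARQL query over both resources.

```python
# Hypothetical hypernym links (sense -> more general sense).
HYPERNYM = {"city": "location", "capital": "city", "dog": "animal"}


def is_hyponym_of(sense, target, hypernym=HYPERNYM):
    """Follow hypernym links upward from `sense` until `target` or the top."""
    seen = set()
    while sense in hypernym and sense not in seen:
        seen.add(sense)
        sense = hypernym[sense]
        if sense == target:
            return True
    return False


def sentences_about(sentences, target):
    """Return sentences with at least one sense annotation below `target`."""
    return [
        text
        for text, senses in sentences
        if any(is_hyponym_of(sense, target) for sense in senses)
    ]


# Hypothetical corpus: (sentence, sense annotations of its tokens).
corpus = [
    ("Berlin is the capital of Germany.", ["capital"]),
    ("The dog barked.", ["dog"]),
]
hits = sentences_about(corpus, "location")
```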



Integrating distributed resources

 SPARQL supports nested subqueries to run on different
  repositories




 No physical integration of resources in a single data
  base required
      Easy to link to centralized repositories of reference
        terminology, etc.

Conceptual interoperability

 Resources should specify which vocabulary (e.g., for
  annotation) they use and how it is defined
      By reference to community-maintained terminology
       repositories, e.g.,
            GOLD (Farrar and Langendoen 2010)
            ISOcat (Windhouwer and Wright @ LDL-2012)
      Can be used, e.g., for disambiguation
            If a lexeme in a lexicon has a certain morphosyntactic
               categorization, we can retrieve all sentences from a
               corpus with corresponding annotations
                  e.g., land as a noun, but not as a verb
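
The disambiguation example can be sketched as follows: tags from different tagsets are linked to shared reference categories, and the query is phrased against the reference category. The `ref:` identifiers and the corpus are hypothetical; they stand in for concepts from a terminology repository.

```python
# Hypothetical links from tagset-specific tags to shared reference categories.
TAG_TO_CATEGORY = {
    ("penn", "NN"): "ref:Noun",
    ("penn", "VB"): "ref:Verb",
    ("stts", "NN"): "ref:Noun",
    ("stts", "VVINF"): "ref:Verb",
}


def tokens_with_category(tokens, form, category):
    """Select tokens of `form` whose tag maps to the reference `category`."""
    return [
        token
        for token in tokens
        if token["form"] == form
        and TAG_TO_CATEGORY.get((token["tagset"], token["tag"])) == category
    ]


# Hypothetical corpus tokens.
corpus = [
    {"form": "land", "tagset": "penn", "tag": "NN", "sentence": 1},
    {"form": "land", "tagset": "penn", "tag": "VB", "sentence": 2},
]
nouns = tokens_with_category(corpus, "land", "ref:Noun")
```

Because the query names the reference category rather than a tagset-specific tag, the same query works for corpora annotated with different tagsets.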



Dynamic import

 Linking resources implemented with URIs, which can be
   resolved on-the-fly to update and enrich data sets
      For a token in a corpus, additional information can be
        aggregated from different repositories by resolving
        links (retrieving senses from a lexical-semantic
        repository or concepts from a terminology
        repository)
 If the information in the target resource was updated
    since the original annotation was performed, then the
    updates are available at query time
 Inconsistencies can be avoided through versioning

Ecosystem, infrastructure and community

 RDF and related standards are maintained by an active
  and relatively large community
      Different fields of application
            Libraries, GeoData, BioMed, ...
      Established W3C standard and technological
       infrastructure
      Linguistically relevant resources already provided
            lexical-semantic resources (e.g., WordNet)
 RDF facilitates distributed development, re-using data,
  and, indirectly, interdisciplinary cooperation


Building a Linguistic Linked Open Data cloud


                                              In LOD cloud
                                                Lexical Semantic
                                                 resources
                                                Linguistic meta data

                                              Further relevant types
                                              for linguistic research:
                                                Annotated corpora
                                                Input and output of NLP
                                                  tools
                                                Linguistic data bases
                                                Repositories of linguistic
                                                  terminology
Building a Linguistic Linked Open Data cloud



 Each single provider has different incentives to use
  Linked Data and/or RDF
 Concepts of RDF and Linked Data have been brought
  up to solve open problems in different subcommunities
  of linguistics and neighboring fields
 As an illustration, we briefly introduce three examples




Building a Linguistic Linked Open Data cloud

 Annotated corpora
      Underlying problem: structural and conceptual
       interoperability
 Natural Language Processing for the semantic web
      Underlying problem: NLP output represented in
       idiosyncratic formalisms, results to be represented in
       RDF
 Typological data bases
      Underlying problem: globally unique identifiers (not just for
       “languages”, but for dialects, language families, etc.)


Annotated corpora
                  Linked Data and Corpus Interoperability




Linked Data and Corpus Interoperability


 Linked Data can be used to address interoperability
   issues of annotated corpora
 Corpus: collection of texts developed to analyze
   language and to develop tools for this purpose
   => Annotated corpora
   Structural interoperability: interoperable representation form
 Different types of annotations, different communities
   involved, different languages
   => Interoperability challenge
   Conceptual interoperability: reference definitions for
     linguistic categories and features

Structural Interoperability
 Analyses produced by different researchers / NLP tools
 use different representation formalisms
      word annotations ('tokens')
      span annotations ('markables')
      tree-like annotations
      relational annotations

Structural Interoperability
State-of-the-art approaches
Graph-based data model
Represent data in standoff XML
                         (Ide and Suderman 2007, Chiarcos et al. 2008, Eckart et al. @ LDL)




Presentation of Nancy Ide @ LDL 2012




XML standoff

 MASC corpus, GrAF format




Working with XML standoff

How to store, retrieve and query XML standoff data
 efficiently?
      Direct use with XML data bases inefficient                          (Eckart 2008)

      Inline XML                                               (e.g., Dipper et al. 2007)

      Relational DB formats                                    (e.g., Eckart et al. @ LDL)

RDF as another possibility                                (e.g., Chiarcos 2012)


      Databases are optimized for graph querying
      Extensive (open source) infrastructure available
      Conceptual interoperability
      Integration with Linked Data resources
Corpus Interoperability with RDF

Structural Interoperability
      e.g. POWLA - http://purl.org/powla
            lossless transformation to RDF from standoff XML
      Linking to lexical-semantic resources (WordNet)
Conceptual Interoperability
      Cross-Linking to terminology repositories (OLiA,
        GOLD, ISOcat)
      Entity-Linking to metadata (Geodata, LOD cloud)




Natural Language Processing Interchange Format
                                      NIF




NLP Interchange Format (NIF)

 NIF is an RDF/OWL-based format


 Achieve interoperability for:
      Output of NLP tools
      Linguistic data in RDF
      Text documents
      Web of Data (LOD cloud)




A Transparent Formalization of Text for
Machines


                  Opaque to machines




A Transparent Formalization of Text for
Machines
   Universe of discourse is defined as the words
   over the alphabet of Unicode characters


   URI:  http://example.org/sample#offset_0_42
   Text: “The city Berlin is the capital of Germany.”
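
The offset-based URI above can be reproduced with a few lines of code. This is a minimal sketch that follows only the pattern visible in the example; the helper names are hypothetical, and the NIF specification may define further URI components.

```python
def offset_uri(document_uri, begin, end):
    """Build a URI for the substring from character `begin` up to `end`."""
    return f"{document_uri}#offset_{begin}_{end}"


def annotate(document_uri, text, substring):
    """Build the offset URI for the first occurrence of `substring` in `text`."""
    begin = text.index(substring)
    return offset_uri(document_uri, begin, begin + len(substring))


TEXT = "The city Berlin is the capital of Germany."
whole = offset_uri("http://example.org/sample", 0, len(TEXT))
berlin = annotate("http://example.org/sample", TEXT, "Berlin")
```

Here `whole` identifies the complete sentence, and `berlin` identifies the word "Berlin" so that annotations can be attached to it as RDF statements.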




NLP Interchange Format

 Specification for NIF 1.0 (http://nlp2rdf.org/nif-1-0/)
 Different implementations (alpha/beta) are available as
   Open Source (UIMA, GATE ANNIE, Stanford Parser,
   DBpedia Spotlight)
 Mailing list available at http://nlp2rdf.org
 Demo: http://nlp2rdf.lod2.eu/demo.php

 Poster during the poster session
 Thursday 13:00-14:30


Typological databases
                     Glottolog/Langdoc




Glottolog/Langdoc

 Two subprojects
      Glottolog provides identifiers and additional
       information for 100k languoids (languages, dialects,
       families)
            main competitor projects:
                  ISO 639-3/Ethnologue
                  Multitree
      Langdoc provides identifiers and additional
        information for 180k references
            main competitor project: OLAC



Problems to address
      existing identifiers are not granular enough (ISO
        639-3: 7k)
      existing identifiers have unclear reference (multitree
        altc refers to both Micro-Altaic and Macro-Altaic)
      existing identifiers have no verifiable empirical basis
 Solutions
      21k identifiers for main tree
      total of 104k identifiers for all nodes of multitree trees


RDF

 glottolog:12345 gl:sublanguoid glottolog:41202 .
 glottolog:12345 gl:superlanguoid glottolog:94211 .
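
Such triples can be traversed to reconstruct the classification of a languoid. The sketch below assumes that gl:sublanguoid and gl:superlanguoid are inverses of each other, which the slide does not state explicitly.

```python
# The two triples above as (subject, predicate, object) tuples.
TRIPLES = [
    ("glottolog:12345", "gl:sublanguoid", "glottolog:41202"),
    ("glottolog:12345", "gl:superlanguoid", "glottolog:94211"),
]


def ancestors(languoid, triples):
    """Follow links upward and return all superlanguoids, nearest first."""
    parent = {}
    for s, p, o in triples:
        if p == "gl:superlanguoid":
            parent[s] = o
        elif p == "gl:sublanguoid":  # assumed inverse of gl:superlanguoid
            parent[o] = s
    chain = []
    # Stop at the top of the tree or if a link would revisit a node.
    while languoid in parent and parent[languoid] not in chain:
        languoid = parent[languoid]
        chain.append(languoid)
    return chain


lineage = ancestors("glottolog:41202", TRIPLES)
```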




Langdoc

 180k references to literature treating (mostly) lesser-
   known languages
 annotated for language, document type, macro-area
 limited full text indexing
  “give me any grammar or grammar sketch from an Afro-
    Asiatic language spoken in Eurasia where the word
    'dual' occurs in the text”
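
The example request combines metadata filters with a full-text condition. A minimal sketch over toy records is given below; the field names and records are hypothetical and do not reflect the actual Langdoc schema.

```python
# Hypothetical records; field names and values are illustrative only.
REFERENCES = [
    {"id": "langdoc:1", "doctype": "grammar", "family": "Afro-Asiatic",
     "macro_area": "Eurasia", "text": "the dual is marked on nouns"},
    {"id": "langdoc:2", "doctype": "wordlist", "family": "Afro-Asiatic",
     "macro_area": "Eurasia", "text": "dual forms"},
    {"id": "langdoc:3", "doctype": "grammar sketch", "family": "Afro-Asiatic",
     "macro_area": "Africa", "text": "dual pronouns"},
]


def search(references, doctypes, family, macro_area, word):
    """Return ids of references matching all metadata filters and the word."""
    return [
        ref["id"]
        for ref in references
        if ref["doctype"] in doctypes
        and ref["family"] == family
        and ref["macro_area"] == macro_area
        and word in ref["text"].split()
    ]


hits = search(REFERENCES, {"grammar", "grammar sketch"}, "Afro-Asiatic", "Eurasia", "dual")
```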




RDF

 glottolog:12345 gl:immediatelydescribedin langdoc:23456 .




Position of G/L in the LLOD cloud




Availability

 XHTML: http://glottolog.livingsources.org
 RDF: http://glottolog.livingsources.org/sparql




Outlook




From OWLG to DGfS


 The Open Knowledge Foundation Working Group for
   Open Data in Linguistics (OKFN-OWLG) was founded
   in late 2010
 We first established a series of meetings and a mailing
  list
  Build the structure, create momentum
  Two workshops: OKCon 2011 in Berlin, this workshop


 This afternoon: Christian Kreutz presents the OKFN


Building the Linguistic Linked Data Cloud




This workshop

 Exploratory workshop
 Chart domains as to the amount and kind of data which
  can be integrated into the LLOD-cloud
 increase coverage
      more domains
 increase density
      more links between resources
 increase discussion between independent
   subcommunities


Spread the word
 http://linguistics.okfn.org/
 open-linguistics@lists.okfn.org
 Poster at the DGfS-CL session on Thursday
 Start of this workshop, first talk:
                         Declerck et al.
           “Towards Linked Language Data (LLD) for Digital
                            Humanities”




We would like to thank



 MPI
 Springer
 LOD2




The AUB Centre for AI in Media Proposal.docx
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Empathic Computing: Creating Shared Understanding
The Rise and Fall of 3GPP – Time for a Sabbatical?
MIND Revenue Release Quarter 2 2025 Press Release
Unlocking AI with Model Context Protocol (MCP)
Review of recent advances in non-invasive hemoglobin estimation
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Diabetes mellitus diagnosis method based random forest with bat algorithm
Building Integrated photovoltaic BIPV_UPV.pdf
Mobile App Security Testing_ A Comprehensive Guide.pdf
Encapsulation theory and applications.pdf
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11

Introduction to LDL 2012

  • 1. Linked Data in Linguistics Representing and Connecting Language Data and Language Metadata Sebastian Hellmann, Christian Chiarcos, Sebastian Nordhoff 34th Annual Meeting of the German Linguistic Society (DGfS), AG 2 Frankfurt/M., Germany, March 7th – 9th, 2012 If not otherwise noted, content is cc-by
  • 2. Overview Technological Background (SH) Linked Open Data and Collaborative Research (SH) Linked Data for Linguistics (CC) Building a Linguistic Linked Open Data Cloud Prospects of Linked Data in Linguistics (CC) Annotated Corpora (CC) Lexical-Semantic Resource (SH) Linguistic Databases (SN) What to Expect from LDL-2012 March 7th, 2012 Linked Data in Linguistics 2012 2
  • 3. From Excel to RDF and Linked Data
  • 4. From Excel to RDF and Linked Data: A data collection about sailing ships. Source: http://en.wikipedia.org/wiki/File:Bounty_modified_photo.jpg
  • 5. From Excel to RDF and Linked Data: Add the Gorch Fock. Source: http://en.wikipedia.org/wiki/File:Gorch_Fock_unter_Segeln_Kieler_Foerde_2006.jpg
  • 6. From Excel to RDF and Linked Data: Add the auxiliary propulsion of the Gorch Fock. The field is now irregular.
  • 7. From Excel to RDF and Linked Data: A first empty field is introduced.
  • 8. From Excel to RDF and Linked Data: Entity Attribute Value, data represented in triples.
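The step from a sparse table to Entity Attribute Value triples can be sketched as follows. This is only an illustration: the rows, column names and values are hypothetical stand-ins for the ship table on the slides, which is not reproduced in the transcript.

```python
# Hypothetical rows standing in for the ship table; values are illustrative only.
rows = [
    {"name": "Bounty", "type": "SailingShip"},
    {"name": "Gorch_Fock", "type": "SailingShip", "propulsion": "Diesel"},
]


def rows_to_triples(rows, key="name"):
    """Turn every filled cell into one (entity, attribute, value) triple."""
    triples = []
    for row in rows:
        entity = row[key]
        for attribute, value in row.items():
            # Missing cells are simply absent, so no empty fields are needed.
            if attribute != key and value is not None:
                triples.append((entity, attribute, value))
    return triples


triples = rows_to_triples(rows)
for triple in triples:
    print(triple)
```

Because only filled cells become triples, adding the auxiliary propulsion for one ship does not force an empty field onto the others.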
  • 9. From Excel to RDF and Linked Data: XML also does not produce sparsity or anomalies, but what about: 1. Automatically infer rows (reduces size); 2. Check consistency (not validity); 3. Merge two tables (not only syntactically, but semantically); 4. Enrich with external data (also retrieve updates); 5. Query
  • 11. From Excel to RDF and Linked Data: Description Logic (DL) is a family of formal knowledge representation languages: fragments of first-order logic; usually decidable inference problems; well-researched complexity; basis for the Web Ontology Language (OWL); reasoner implementations available. Reference: Franz Baader, Ian Horrocks, and Ulrike Sattler. Chapter 3: Description Logics. In Frank van Harmelen, Vladimir Lifschitz, and Bruce Porter, editors, Handbook of Knowledge Representation. Elsevier, 2007.
  • 12. From Excel to RDF and Linked Data: Description Logic inference
  • 13. From Excel to RDF and Linked Data: Description Logic constraints. It is possible to detect inconsistencies, e.g., that the Gorch Fock must not be a SailingShip.
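A rough sketch of what inference and consistency checking mean here, assuming a toy class hierarchy and one disjointness axiom. The class names `MotorShip`, `Ship` and `Vehicle` and the axioms themselves are assumptions made for illustration, not the ontology used on the slides, and a real OWL reasoner does far more than this.

```python
# Assumed toy axioms (not from the slides): subclass links and one disjointness.
SUBCLASS_OF = {"SailingShip": "Ship", "MotorShip": "Ship", "Ship": "Vehicle"}
DISJOINT = {frozenset({"SailingShip", "MotorShip"})}


def inferred_types(asserted):
    """Add every superclass of each asserted class (subclass closure)."""
    result = set(asserted)
    for cls in asserted:
        while cls in SUBCLASS_OF:
            cls = SUBCLASS_OF[cls]
            result.add(cls)
    return result


def is_consistent(asserted):
    """An individual is inconsistent if it belongs to two disjoint classes."""
    types = inferred_types(asserted)
    return not any(pair <= types for pair in DISJOINT)


print(inferred_types({"SailingShip"}))
print(is_consistent({"SailingShip", "MotorShip"}))
```

Inferred types need not be stored as rows, which is the sense in which inference reduces size; the disjointness check is a consistency check rather than a syntactic validity check.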
  • 15. Uniform Resource Identifiers (URIs): Agree on a common vocabulary and names for entities. On the schema level, coherence of properties and types is required for data integration. URIs allow for globally unique identifiers: “Gorch Fock” vs. http://en.wikipedia.org/wiki/Gorch_Fock_(1958) vs. http://dbpedia.org/resource/Gorch_Fock_(1958), abbreviated as dbpedia:Gorch_Fock_(1958)
  • 16. From Excel to RDF and Linked Data: Last table before we get more technical. 4 types of object.
  • 17. From Excel to RDF and Linked Data (graph diagram): my:Gorch_Fock is linked by owl:sameAs to dbpedia:Gorch_Fock_(1958) and by my:owner to my:German_Navy; my:German_Navy is linked by owl:sameAs to dbpedia:German_Navy; a further owl:sameAs link leads to other datasets, which contribute more data such as dbprop:shipLength "81.2"^^xsd:double.
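The effect of owl:sameAs links when merging datasets can be sketched with a small union-find over the identifiers from the diagram. Attaching dbprop:shipLength to the DBpedia resource is an assumption about the diagram, and treating owl:sameAs as plain identifier merging is a simplification of its OWL semantics.

```python
# Triples loosely following the slide's diagram; the subject of shipLength is assumed.
triples = [
    ("my:Gorch_Fock", "owl:sameAs", "dbpedia:Gorch_Fock_(1958)"),
    ("my:Gorch_Fock", "my:owner", "my:German_Navy"),
    ("my:German_Navy", "owl:sameAs", "dbpedia:German_Navy"),
    ("dbpedia:Gorch_Fock_(1958)", "dbprop:shipLength", "81.2"),
]


def merge_same_as(triples):
    """Rewrite all triples so that owl:sameAs-equivalent names share one identifier."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(o)] = find(s)
    return {(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"}


merged = merge_same_as(triples)
for triple in sorted(merged):
    print(triple)
```

After merging, the ship length from the external dataset and the owner from the local one are attached to the same node.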
  • 18. RDF and OWL - recap: RDF (Resource Description Framework): Entity Attribute Value + URIs; triples; shared vocabularies; graphs. OWL (Web Ontology Language): based on Description Logic and extends RDF; OWL-DL; reasoning; consistency checks. Both are W3C standards.
  • 19. Syntax training: Presenters will probably show you some code during the next days. On the next slide you will see some syntax examples.
  • 20. Serialization: Turtle and XML
  • 21. Serialization: Turtle and XML
  • 22. SPARQL: Ability to merge data and query it using the W3C standard SPARQL (SPARQL Protocol and RDF Query Language). SPARQL is the SQL of the Semantic Web. SELECT ?ship WHERE { ?ship rdf:type my:SailingShip . ?ship my:propulsion ?engine . ?engine my:fuelType my:Diesel . ?ship dbprop:shipLength ?length . FILTER (xsd:double(?length) >= 80.0) }
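To make the graph pattern in the query above concrete, here is a hand-written evaluation of the same conditions over a small in-memory triple set. The data, including `my:engine1` and `my:Small_Boat`, is invented for illustration; an actual setup would hand the query to a SPARQL engine instead.

```python
# Invented sample data; only the property and class names come from the query.
triples = {
    ("my:Gorch_Fock", "rdf:type", "my:SailingShip"),
    ("my:Gorch_Fock", "my:propulsion", "my:engine1"),
    ("my:engine1", "my:fuelType", "my:Diesel"),
    ("my:Gorch_Fock", "dbprop:shipLength", "81.2"),
    ("my:Small_Boat", "rdf:type", "my:SailingShip"),
    ("my:Small_Boat", "dbprop:shipLength", "12.0"),
}


def objects(subject, predicate):
    """All values o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def select_ships():
    """Sailing ships with a diesel engine and a length of at least 80.0."""
    ships = []
    for s, p, o in triples:
        if p != "rdf:type" or o != "my:SailingShip":
            continue
        diesel = any(
            "my:Diesel" in objects(engine, "my:fuelType")
            for engine in objects(s, "my:propulsion")
        )
        long_enough = any(
            float(length) >= 80.0 for length in objects(s, "dbprop:shipLength")
        )
        if diesel and long_enough:
            ships.append(s)
    return sorted(ships)


print(select_ships())
```

Each triple pattern in the WHERE clause corresponds to one lookup here, and shared variables such as ?engine correspond to values passed from one lookup to the next.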
  • 23. Linked Data
  • 24-32. Linked Open Data cloud. Source: http://lod-cloud.net
  • 33. 4 Rules of Linked Data: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). 4. Include links to other URIs, so that they can discover more things. Source: http://www.w3.org/DesignIssues/LinkedData.html
  • 34. Linked Data - Content Negotiation: Different views for different data consumers: Browser
  • 35. Linked Data - Content Negotiation: Different views for different data consumers: Applications
  • 36. Linked Data: A dataset is a set of RDF triples that is published, maintained or aggregated by a single provider. An RDF link is an RDF triple whose subject and object are described in different datasets. A linkset is a collection of such RDF links between two datasets.
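The three definitions above can be illustrated with a short sketch. It assumes that a resource counts as "described in" a dataset when it appears there as a subject, which is a simplification, and the two datasets are invented for the example.

```python
# Two invented datasets; "described in" is approximated as "occurs as a subject".
my_data = [
    ("my:Gorch_Fock", "my:owner", "my:German_Navy"),
    ("my:Gorch_Fock", "owl:sameAs", "dbpedia:Gorch_Fock_(1958)"),
    ("my:German_Navy", "rdfs:label", "German Navy"),
]
dbpedia_data = [
    ("dbpedia:Gorch_Fock_(1958)", "dbprop:shipLength", "81.2"),
]


def described(dataset):
    """Resources that a dataset makes statements about."""
    return {s for s, _, _ in dataset}


def linkset(source, target):
    """RDF links: triples in source whose object is described only in target."""
    src, tgt = described(source), described(target)
    return [(s, p, o) for s, p, o in source if o in tgt and o not in src]


links = linkset(my_data, dbpedia_data)
print(links)
```

The my:owner triple stays inside one dataset and is therefore not an RDF link, while the owl:sameAs triple crosses the dataset boundary and ends up in the linkset.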
  • 38. Why go for the fifth star? Central Contractor Registration (CCR), Geonames. Source: http://webofdata.wordpress.com/2011/05/22/why-we-link/
  • 39. Open licences allow republishing and reuse. Motivation for collaboration: high potential that invested efforts can be reused, i.e. data, links, vocabularies, schemas. (Effortful) feedback: users complement data, extend vocabularies and contribute changes. VoCamps for achieving coherence. Source: Chiarcos, Hellmann, Nordhoff, Towards a Linguistic Linked Open Data cloud: The Open Linguistics Working Group, Traitement Automatique des Langues, to appear.
  • 40. Example DBpedia: Data is extracted from Wikipedia. Wikipedia just publishes the unstructured data. A small DBpedia team creates RDF. A community of stakeholders cleans the data and creates links. Estimate: 10-20% to consolidate community effort.
  • 41. Scalability; Golden Hammer anti-pattern; Adequacy
  • 42. Linked Data for Linguistics
  • 43. Linked Data for Linguistics: Representation and modelling; Structural interoperability; Integrating distributed resources; Conceptual interoperability; Dynamic import
  • 44. Representation and modelling; Structural interoperability
  • 45. Representation and modelling: Different linguistic subcommunities have developed representation standards, e.g., LMF: Lexical Markup Framework (Francopoulo et al. 2009) for lexical-semantic resources; GrAF: Graph Annotation Framework (Ide and Suderman 2007) for annotated corpora. Based on labelled directed acyclic graphs (feature structures). RDF data model: labelled directed (multi-)graphs. Uniform formalism for different resource types. Sublanguages (e.g., RDFS, OWL) allow domain-specific vocabularies to be defined.
  • 46. Structural interoperability: With different language resources represented in RDF, we can combine both sources of information freely: cross-resource queries with RDF query languages (e.g., SPARQL). Given a corpus with WordNet sense annotations (e.g., the Manually Annotated Sub-Corpus MASC; Ide et al. 2010): “Retrieve all sentences that describe locations”, i.e., sentences containing a token annotated with a WordNet sense that is a hyponym of “location”. Difficult to realize with GrAF or LMF.
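The cross-resource query described above might look roughly like this once corpus annotations and the sense hierarchy are available in one place. The sense identifiers, the hypernym table and the token list are all invented placeholders, not actual WordNet or MASC data.

```python
# Invented placeholder data: a tiny hypernym table and three annotated tokens.
HYPERNYM = {
    "wn:city": "wn:region",
    "wn:region": "wn:location",
    "wn:dog": "wn:animal",
}
TOKENS = [
    ("s1", "Berlin", "wn:city"),
    ("s1", "is", None),
    ("s2", "dog", "wn:dog"),
]


def is_hyponym_of(sense, target):
    """Follow hypernym links upwards; a sense also counts as matching itself."""
    while sense is not None:
        if sense == target:
            return True
        sense = HYPERNYM.get(sense)
    return False


def sentences_about(target):
    """Sentence ids containing a token whose sense falls under the target sense."""
    return sorted({sent for sent, _, sense in TOKENS if is_hyponym_of(sense, target)})


print(sentences_about("wn:location"))
```

The corpus side (tokens and sentences) and the lexical side (hypernym links) come from different resources; the query only works because both are addressable in the same data model.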
  • 47. Integrating distributed resources: SPARQL supports nested subqueries to run on different repositories. No physical integration of resources in a single database required. Easy to link to centralized repositories of reference terminology, etc.
  • 48. Conceptual interoperability: Resources should specify which vocabulary (e.g., for annotation) they use and how it is defined, by reference to community-maintained terminology repositories, e.g., GOLD (Farrar and Langendoen 2010), ISOcat (Windhouwer and Wright @ LDL-2012). Can be used, e.g., for disambiguation: if a lexeme in a lexicon has a certain morphosyntactic categorization, we can retrieve all sentences from a corpus with corresponding annotations, e.g., land as a noun, but not as a verb.
  • 49. Dynamic import: Linking resources is implemented with URIs, which can be resolved on-the-fly to update and enrich data sets. For a token in a corpus, additional information can be aggregated from different repositories by resolving links (retrieving senses from a lexical-semantic repository or concepts from a terminology repository). If the information in the target resource was updated since the original annotation was performed, then the updates are available at query time. Inconsistencies can be avoided through versioning.
  • 50. Ecosystem, infrastructure and community: RDF and related standards are maintained by an active and relatively large community. Different fields of application: libraries, GeoData, BioMed, ... Established W3C standard and technological infrastructure. Linguistically relevant resources already provided: lexical-semantic resources (e.g., WordNet). RDF facilitates distributed development, re-using data, and, indirectly, interdisciplinary cooperation.
  • 51. Building a Linguistic Linked Open Data cloud
  • 52. Building a Linguistic Linked Open Data cloud: In LOD cloud: lexical-semantic resources; linguistic metadata. Further relevant types for linguistic research: annotated corpora; input and output of NLP tools; linguistic databases; repositories of linguistic terminology.
  • 53. Building a Linguistic Linked Open Data cloud: Each single provider has different incentives to use Linked Data and/or RDF. Concepts of RDF and Linked Data have been brought up to solve open problems in different subcommunities of linguistics and neighboring fields. As an illustration, we briefly introduce three examples.
  • 54. Building a Linguistic Linked Open Data cloud: Annotated corpora. Underlying problem: structural and conceptual interoperability. Natural Language Processing for the Semantic Web. Underlying problem: NLP output represented in idiosyncratic formalisms, results to be represented in RDF. Typological databases. Underlying problem: globally unique identifiers (not just for “languages”, but for dialects, language families, etc.).
  • 55. Annotated corpora: Linked Data and Corpus Interoperability
  • 56. Linked Data and Corpus Interoperability: Linked Data can be used to address interoperability issues of annotated corpora. Corpus: collection of texts developed to analyze language and to develop tools for this purpose => Annotated corpora. Different types of annotations, different communities involved, different languages => Interoperability challenge.
  • 57. Linked Data and Corpus Interoperability: Linked Data can be used to address interoperability issues of annotated corpora. Corpus: collection of texts developed to analyze language and to develop tools for this purpose => Annotation. Structural interoperability: interoperable representation form. Different types of annotations, different communities involved, different languages. Conceptual interoperability: reference definitions for linguistic categories and features. => Interoperability challenge.
  • 58. Structural Interoperability: Analyses produced by different researchers / NLP tools use different representation formalisms: word annotations ('tokens')
  • 59. Structural Interoperability: Analyses produced by different researchers / NLP tools use different representation formalisms: word annotations ('tokens'); span annotations ('markables')
  • 60. Structural Interoperability: Analyses produced by different researchers / NLP tools use different representation formalisms: word annotations ('tokens'); span annotations ('markables'); tree-like annotations
  • 61. Structural Interoperability: Analyses produced by different researchers / NLP tools use different representation formalisms: relational annotations
  • 62. Structural Interoperability: Analyses produced by different researchers / NLP tools use different representation formalisms
  • 63. Structural Interoperability: State-of-the-art approaches: graph-based data model; represent data in standoff XML (Ide and Suderman 2007, Chiarcos et al. 2008, Eckart et al. @ LDL). Presentation of Nancy Ide @ LDL 2012.
  • 64. XML standoff: MASC corpus, GrAF format
  • 65. Working with XML standoff: How to store, retrieve and query XML standoff data efficiently? Direct use with XML databases is inefficient (Eckart 2008). Inline XML (e.g., Dipper et al. 2007). Relational DB formats (e.g., Eckart et al. @ LDL). RDF as another possibility (e.g., Chiarcos 2012): databases are optimized for graph querying; extensive (open source) infrastructure available; conceptual interoperability; integration with Linked Data resources.
  • 66. Corpus Interoperability with RDF: Structural interoperability, e.g. POWLA (http://purl.org/powla): lossless transformation to RDF from standoff XML; linking to lexical-semantic resources (WordNet). Conceptual interoperability: cross-linking to terminology repositories (OLiA, GOLD, ISOcat); entity linking to metadata (Geodata, LOD cloud).
  • 67. Natural Language Processing Interchange Format: NIF
  • 68. NLP Interchange Format (NIF): NIF is an RDF/OWL-based format. Achieve interoperability for: output of NLP tools; linguistic data in RDF; text documents; Web of Data (LOD cloud).
  • 70. A Transparent Formalization of Text for Machines
  • 71. A Transparent Formalization of Text for Machines: Not transparent for machines
  • 72. A Transparent Formalization of Text for Machines: The universe of discourse is defined as the words over the alphabet of Unicode characters. URI http://example.org/sample#offset_0_42 for the string “The city Berlin is the capital of Germany.”
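A minimal sketch of the offset idea on this slide: a substring of a document is named by appending its begin and end character offsets to the document URI. The helper names `offset_uri` and `resolve` are hypothetical, and the fragment layout follows only what the slide shows (`#offset_0_42`); the NIF specification may define further details that are not modelled here.

```python
def offset_uri(document_uri, begin, end):
    """Name the substring text[begin:end] of a document by its character offsets."""
    return f"{document_uri}#offset_{begin}_{end}"


def resolve(text, uri):
    """Recover the substring that an offset URI of the above shape refers to."""
    fragment = uri.rsplit("#", 1)[1]
    _, begin, end = fragment.split("_")
    return text[int(begin):int(end)]


text = "The city Berlin is the capital of Germany."
whole = offset_uri("http://example.org/sample", 0, len(text))
berlin = offset_uri("http://example.org/sample", 9, 15)

print(whole)
print(resolve(text, berlin))
```

Since every substring gets its own URI, annotations from different NLP tools can be attached to the same piece of text as ordinary RDF statements.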
  • 73. NLP Interchange Format: Specification for NIF 1.0 (http://nlp2rdf.org/nif-1-0/). Different implementations (alpha/beta) are available as open source (UIMA, GATE ANNIE, Stanford Parser, DBpedia Spotlight). Mailing list available (http://nlp2rdf.org). Demo (http://nlp2rdf.lod2.eu/demo.php). Poster during the poster session, Thursday 13:00-14:30.
  • 74. Typological databases: Glottolog/Langdoc
  • 75. Glottolog/Langdoc: Two subprojects. Glottolog provides identifiers and additional information for 100k languoids (languages, dialects, families); main competitor projects: ISO 639-3/Ethnologue, Multitree. Langdoc provides identifiers and additional information for 180k references; main competitor project: OLAC.
  • 76. Problems to address: existing identifiers are not granular enough (ISO 639-3: 7k); existing identifiers have unclear reference (Multitree altc refers to both Micro-Altaic and Macro-Altaic); existing identifiers have no verifiable empirical basis. Solutions: 21k identifiers for main tree; total of 104k identifiers for all nodes of Multitree trees.
  • 77. RDF: glottolog:12345 gl:sublanguoid glottolog:41202 . glottolog:12345 gl:superlanguoid glottolog:94211 .
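As a sketch of how such triples can be used, the following walks up the languoid hierarchy from the two statements on the slide. It assumes that `gl:sublanguoid` points from a languoid to a child and `gl:superlanguoid` to its parent, and that each languoid has a single parent; both are interpretations of the property names rather than documented semantics.

```python
# The two triples from the slide; the reading of the property direction is assumed.
triples = [
    ("glottolog:12345", "gl:sublanguoid", "glottolog:41202"),
    ("glottolog:12345", "gl:superlanguoid", "glottolog:94211"),
]


def parent_map(triples):
    """Map each languoid to its parent, using both property directions."""
    parent = {}
    for s, p, o in triples:
        if p == "gl:superlanguoid":
            parent[s] = o
        elif p == "gl:sublanguoid":
            parent[o] = s
    return parent


def ancestors(node, parent):
    """All languoids above a node, nearest first (assumes a tree without cycles)."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


parents = parent_map(triples)
print(ancestors("glottolog:41202", parents))
```

With identifiers for every node of the tree, statements about dialects, languages and families can all be expressed in the same way.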
  • 78. Langdoc: 180k references to literature treating (mostly) lesser-known languages; annotated for language, document type, macro-area; limited full-text indexing. “give me any grammar or grammar sketch from an Afro-Asiatic language spoken in Eurasia where the word 'dual' occurs in the text”
  • 79. RDF: glottolog:12345 gl:immediatelydescribedin langdoc:23456 .
  • 80. Position of G/L in the LLOD cloud
  • 81. Availability: XHTML: http://glottolog.livingsources.org ; RDF: http://glottolog.livingsources.org/sparql
  • 82. Outlook
  • 83. From OWLG to DGfS: The Open Knowledge Foundation Working Group for Open Data in Linguistics (OKFN-OWLG) was founded in late 2010. We first established a series of meetings and a mailing list. Build the structure, create momentum. Two workshops: OKCon 2011 in Berlin, this workshop. This afternoon: Christian Kreutz presents the OKFN.
  • 84. Building the Linguistic Linked Data Cloud
  • 85. This workshop: Exploratory workshop. Chart domains as to the amount and kind of data which can be integrated into the LLOD cloud. Increase coverage: more domains. Increase density: more links between resources. Increase discussion between independent subcommunities.
  • 86. This workshop
  • 87. Spread the word: http://linguistics.okfn.org/ ; open-linguistics@lists.okfn.org ; poster at DGfS-CL session on Thursday. Start this workshop: first talk: Declerck et al. “Towards Linked Language Data (LLD) for Digital Humanities”
  • 88. We would like to thank MPI, Springer, LOD2
