Creating Knowledge out of Interlinked Data




         NIF – NLP Interchange Format




                                                    Sebastian Hellmann
                                                          AKSW, Universität Leipzig
LOD2 Presentation . 02.09.2010                                          http://lod2.eu

           Problem:
            • Currently NLP software is organized in pipelines
            • Integration is "hard-wired"
               – Every tool needs a separate adapter for every framework
                 (n*m adapters)
            • Single components are difficult to exchange




Open Linguistics@OKCon 30.6.2011                                        http://lod2.eu

           Overview:
            • NLP tools can be integrated via a common output format (a common
              pattern in Enterprise Application Integration)
            • Each tool needs only one wrapper, which reads NIF and produces
              NIF
            • Tools can be combined ad hoc, i.e. there is no pipeline that
              needs to be configured
            • Multi-layer and overlapping annotations are possible
            • Ontologies provide interfaces for each layer and for applications
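
The wrapper idea in this overview can be sketched roughly as follows. This is a minimal illustration, not the NLP2RDF implementation: a plain Python set of (subject, predicate, object) tuples stands in for NIF/RDF, and the namespace `http://example.org/`, the property names, and both wrapper functions are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

# A triple is (subject, predicate, object); a set of triples stands in for NIF/RDF.
Triple = Tuple[str, str, str]
Graph = Set[Triple]
Wrapper = Callable[[Graph], Graph]

EX = "http://example.org/"  # hypothetical namespace, not part of NIF


def tokenizer_wrapper(graph: Graph) -> Graph:
    """Hypothetical wrapper: reads the document text, adds one node per token."""
    out = set(graph)
    for s, p, o in graph:
        if p == EX + "text":
            pos = 0
            for word in o.split():
                start = o.index(word, pos)
                end = start + len(word)
                token = f"{s}#offset_{start}_{end}"
                out.add((token, EX + "anchorOf", word))
                pos = end
    return out


def length_wrapper(graph: Graph) -> Graph:
    """Hypothetical wrapper: annotates every token with its length."""
    out = set(graph)
    for s, p, o in graph:
        if p == EX + "anchorOf":
            out.add((s, EX + "length", str(len(o))))
    return out


def run(graph: Graph, wrappers: Iterable[Wrapper]) -> Graph:
    """Apply wrappers in whatever order the caller picks; no pipeline config."""
    for wrapper in wrappers:
        graph = wrapper(graph)
    return graph


doc = {(EX + "doc1", EX + "text", "Semantic Web")}
result = run(doc, [tokenizer_wrapper, length_wrapper])
```

Because every wrapper consumes and emits the same structure, adding a further tool means writing one more function of type `Wrapper` rather than one adapter per framework.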




      • First Challenge: Representing Strings in RDF
             • How to give a part of a document or text an identifier (URI)?
             • What properties can such URIs have?




LOD2 Event . 06.09.2010                                          http://lod2.eu

                                   Example URIs for annotating "Semantic Web"
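
The URIs shown on the original slide are not recoverable from this transcript. As a rough sketch of what such identifiers could look like, the following Python builds two kinds of URI for the substring "Semantic Web": one based on character offsets and one based on a hash of the string plus its surrounding context. Both recipes, the function names, and the document URI `http://example.org/doc1` are assumptions made for illustration, not the recipes defined by NIF.

```python
import hashlib
from urllib.parse import quote_plus


def offset_uri(doc_uri: str, text: str, begin: int, end: int) -> str:
    """Identify text[begin:end] by its character offsets (hypothetical recipe)."""
    snippet = quote_plus(text[begin:end])
    return f"{doc_uri}#offset_{begin}_{end}_{snippet}"


def context_hash_uri(doc_uri: str, text: str, begin: int, end: int,
                     context: int = 4) -> str:
    """Identify text[begin:end] by hashing the string with `context` characters
    on each side (hypothetical recipe). Unlike offsets, the result does not
    change when text is inserted elsewhere in the document."""
    left = text[max(0, begin - context):begin]
    right = text[end:end + context]
    payload = f"{left}({text[begin:end]}){right}".encode("utf-8")
    digest = hashlib.md5(payload).hexdigest()
    return f"{doc_uri}#hash_{context}_{end - begin}_{digest}"


text = "The Semantic Web is a web of data."
begin = text.index("Semantic Web")
end = begin + len("Semantic Web")

print(offset_uri("http://example.org/doc1", text, begin, end))
print(context_hash_uri("http://example.org/doc1", text, begin, end))
```

The trade-off in this sketch: offset-based URIs are easy to compute and to read, but break when the document changes; context-hash URIs are opaque but tolerate edits outside the hashed window.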









      • URIs are used to integrate tool output: RDF graphs merge naturally if
           the URIs are the same (or can be converted using a fixed recipe)
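
A small sketch of why shared URIs make integration cheap: when two tools use the same URI for the same string, merging their output is a set union. The URI, the vocabulary `http://example.org/vocab#`, the property names, and both tool outputs below are hypothetical; triples are modelled as plain Python tuples rather than with an RDF library.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

TOKEN = "http://example.org/doc1#offset_4_16"  # hypothetical string URI
EX = "http://example.org/vocab#"               # hypothetical vocabulary

# Output of a hypothetical part-of-speech tagger.
tagger_output: Set[Triple] = {
    (TOKEN, EX + "anchorOf", "Semantic Web"),
    (TOKEN, EX + "posTag", "NNP"),
}

# Output of a hypothetical entity linker that uses the same URI for the string.
linker_output: Set[Triple] = {
    (TOKEN, EX + "anchorOf", "Semantic Web"),
    (TOKEN, EX + "means", "http://dbpedia.org/resource/Semantic_Web"),
}

# Merging two RDF graphs that contain no blank nodes is plain set union.
merged = tagger_output | linker_output

# All annotations about the string are now reachable from one subject.
properties = sorted(p for s, p, o in merged if s == TOKEN)
```

If the two tools used different URI recipes for the same string, one side would first have to be rewritten into the other's scheme before the union yields a single subject.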





      • Second challenge: the output of each layer must be stable
             • Only then can components and layers be interchanged
             • OLiA (Ontologies of Linguistic Annotation) provides an
               ontological interface
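
One way to picture an ontological interface is a mapping from tool-specific tags to shared classes, so that applications query the shared class and never the raw tag. The sketch below is illustrative only: the two dictionaries are hand-written excerpts, the namespace and class names are assumptions that should be checked against OLiA itself, and the real OLiA linking models are OWL ontologies rather than Python dictionaries.

```python
OLIA = "http://purl.org/olia/olia.owl#"  # assumed namespace, verify against OLiA

# Hypothetical excerpts: an English and a German tag set mapped to shared classes.
PENN_TO_OLIA = {
    "NN": OLIA + "CommonNoun",
    "NNP": OLIA + "ProperNoun",
    "VB": OLIA + "Verb",
}
STTS_TO_OLIA = {
    "NN": OLIA + "CommonNoun",
    "NE": OLIA + "ProperNoun",
    "VVINF": OLIA + "Verb",
}


def is_proper_noun(tag: str, mapping: dict) -> bool:
    """Application code asks about the shared class, not the tool-specific tag."""
    return mapping.get(tag) == OLIA + "ProperNoun"


# The same application check works regardless of which tagger produced the tag.
english = is_proper_noun("NNP", PENN_TO_OLIA)
german = is_proper_noun("NE", STTS_TO_OLIA)
```

Swapping one tagger for another then only requires a new mapping; the application-side check stays unchanged.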





Workplan




      • EU deliverable almost finished
      • Integration of Snowball stemming and the Stanford Parser
      • Next step: integration of knowledge extraction tools (Zemanta,
           DBpedia Spotlight, Alchemy, OpenCalais)

      • Web services that read NIF and output NIF
      • Google Code project: http://code.google.com/p/nlp2rdf/




Future




      • NIF makes it possible to represent NLP output using knowledge
           representation formalisms (RDF/OWL)

      • The output can be mixed with other knowledge (e.g. Wikipedia/DBpedia)
      • Good foundation for optimizing machine learning:
             • Choose the best algorithms
             • Choose the best data





Reasons for Open Data

      • Horváth et al. (ILP 2009): "A Logic-Based Approach to Relation
           Extraction from Texts"
            • POS tags and dependency trees in first-order logic
            • ILP machine learning approach

      • TIDES Extraction (ACE) 2003 Multilingual Training Data
             • closed licence
             • about 3000 US $

      • Barrier to reproducing results
             • Authors could send me a (p)(r)e-print, but not a copy of the
               benchmark™



         Thank you for your attention!





