Creating Knowledge out of Interlinked Data
          JIST 2012 – Page 1                                                      http://lod2.eu




  Improving the Performance of the
 DL-Learner SPARQL Component for
     Semantic Web Applications
                  Didier Cherix, Sebastian Hellmann, Jens Lehmann

                         http://slideshare.net/kurzum


                                                          http://dl-learner.org
                                                             http://lod2.eu


                                                            AKSW, Universität Leipzig
LOD2 Presentation . 02.09.2010 . Page                                      http://lod2.eu




              Motivation: 2007 - 2012

DL-Learner has been developed in parallel to DBpedia at Universität Leipzig since 2007

DL-Learner is a tool for learning concepts in Description Logics (DLs) from
user-provided examples.

It worked very well for small to medium-sized data sets, e.g. Carcinogenesis and other
ML problems from the UCI ML repository

The limit is the capacity of current OWL-DL reasoners

The challenge was (and is) to do reasoning-based, supervised machine learning on
the DBpedia dataset (> 200 million triples) or larger datasets




 Introduction DL-Learner
                           Very large search space

                           Reasoner instance checks
DL-Learner relies heavily on instance checks for machine learning, so the OWL
reasoner is the bottleneck

Underlying idea:
Select only the data relevant to the machine learning problem, based on user-given
examples

→ Reduces the number of triples that have to be given to a reasoner
→ Reduces complexity and size of the OWL schema

Brute-force approach:
Load all data into the OWL reasoner, then do instance checks
→ infeasible for DBpedia

Iterative approach (old component):
Iterate over all instances and fetch the data recursively
→ inefficient even with caching
                           Challenge:
   What is the most efficient way to retrieve such a fragment?




           Improvements of the New Component

•   Step 1: Indexing the T-Box:
     • Download the OWL Schema and index it in memory
     • either via SPARQL or OWL file
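
A minimal sketch of what an in-memory T-Box index could look like. This is an illustration only, not DL-Learner's actual data structure: the function name `build_tbox_index`, the dictionary layout, and the `ex:` identifiers are all assumptions. The schema triples are taken as already loaded, whether from a SPARQL endpoint or from an OWL file.

```python
from collections import defaultdict

RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_EQUIVALENT = "http://www.w3.org/2002/07/owl#equivalentClass"


def build_tbox_index(schema_triples):
    """Index schema axioms by class so later lookups need no remote query."""
    index = {"superclasses": defaultdict(set), "equivalents": defaultdict(set)}
    for s, p, o in schema_triples:
        if p == RDFS_SUBCLASS:
            index["superclasses"][s].add(o)
        elif p == OWL_EQUIVALENT:
            # Equivalence is symmetric, so store both directions.
            index["equivalents"][s].add(o)
            index["equivalents"][o].add(s)
    return index


# Hypothetical toy schema; real input would be the downloaded OWL schema.
schema = [
    ("ex:City", RDFS_SUBCLASS, "ex:Settlement"),
    ("ex:Settlement", RDFS_SUBCLASS, "ex:Place"),
    ("ex:City", OWL_EQUIVALENT, "ex:Town"),
]
tbox_index = build_tbox_index(schema)
```

Keeping the index in plain dictionaries means that every later schema lookup is a local hash access rather than a round trip to the endpoint.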

 •   Step 2: A-Box Queries




Parameter recursion depth:
Retrieve newly discovered bindings to ?o until a certain depth is reached.
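
The query shown on this slide is not preserved in the transcript, so the following is only a sketch of how a recursion-depth parameter might drive the retrieval. It assumes one batched lookup per depth level; `fetch_triples` is a hypothetical callable standing in for the SPARQL request, and the toy data replaces a real endpoint.

```python
def extract_abox_fragment(seeds, fetch_triples, depth):
    """Collect triples reachable from the seed instances within `depth` hops."""
    fragment = set()
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        if not frontier:
            break
        triples = fetch_triples(frontier)  # assumed: one batched lookup per level
        fragment.update(triples)
        # Objects not seen before become the subjects of the next round.
        new_objects = {o for _, _, o in triples} - seen
        seen |= new_objects
        frontier = new_objects
    return fragment


# Toy data standing in for a SPARQL endpoint (hypothetical identifiers).
DATA = [
    ("ex:Leipzig", "ex:country", "ex:Germany"),
    ("ex:Germany", "ex:capital", "ex:Berlin"),
    ("ex:Berlin", "ex:mayor", "ex:SomePerson"),
]


def fetch_from_memory(subjects):
    """Return all toy triples whose subject is in `subjects`."""
    return {t for t in DATA if t[0] in subjects}


depth1 = extract_abox_fragment({"ex:Leipzig"}, fetch_from_memory, depth=1)
depth2 = extract_abox_fragment({"ex:Leipzig"}, fetch_from_memory, depth=2)
```

With depth 1 only the triples about the seed are returned; depth 2 additionally follows the newly bound objects once. A real implementation would also have to skip literal objects, which cannot appear in subject position.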

•   Step 3: Typing the retrieved instances
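
The slide gives no detail on how the typing is done. One plausible sketch, assuming the endpoint supports the SPARQL 1.1 `VALUES` clause, asks for the `rdf:type` of all retrieved instances in a single query. The helper name `build_type_query` is hypothetical, and the component's actual query may look different.

```python
def build_type_query(instances):
    """Build one SPARQL query asking for the rdf:type of every given IRI."""
    # Sorting only makes the generated query text deterministic.
    values = " ".join(f"<{iri}>" for iri in sorted(instances))
    return (
        "SELECT ?s ?type WHERE {\n"
        f"  VALUES ?s {{ {values} }}\n"
        "  ?s a ?type .\n"
        "}"
    )


query = build_type_query({
    "http://dbpedia.org/resource/Leipzig",
    "http://dbpedia.org/resource/Berlin",
})
print(query)
```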

•   Step 4: T-Box Index:
    All “relevant” T-Box information is added to the fragment via the index.
    For each class already in the fragment, all superclasses and their
    equivalentClass axioms are added
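
The rule above can be sketched as a traversal over an in-memory index. The two dictionaries are assumed stand-ins for that index, and `relevant_tbox_axioms` as well as the `ex:` class names are hypothetical.

```python
def relevant_tbox_axioms(fragment_classes, superclasses, equivalents):
    """Return subClassOf and equivalentClass axioms for the classes and all ancestors."""
    axioms = set()
    todo = list(fragment_classes)
    visited = set()
    while todo:
        cls = todo.pop()
        if cls in visited:
            continue  # also protects against cycles in the hierarchy
        visited.add(cls)
        for sup in superclasses.get(cls, ()):
            axioms.add((cls, "rdfs:subClassOf", sup))
            todo.append(sup)  # climb further up the hierarchy
        for eq in equivalents.get(cls, ()):
            axioms.add((cls, "owl:equivalentClass", eq))
    return axioms


# Hypothetical index content.
superclasses = {"ex:City": {"ex:Settlement"}, "ex:Settlement": {"ex:Place"}}
equivalents = {"ex:Place": {"ex:Location"}}

axioms = relevant_tbox_axioms({"ex:City"}, superclasses, equivalents)
```

Because the lookups hit only the local index, this step needs no further queries against the endpoint.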




             Benchmarking - Speed

For each class in DBpedia Ontology:
- 30 instances as positives
- 30 negatives from a sister class
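
A sketch of how such a learning problem could be assembled. The slide does not say how the 30 instances were chosen, so random sampling with a fixed seed is an assumption here, as are the function name `make_learning_problem` and the toy instance identifiers.

```python
import random


def make_learning_problem(instances_by_class, target, sister, n=30, seed=0):
    """Draw n positives from the target class and n negatives from a sister class."""
    rng = random.Random(seed)  # fixed seed keeps the benchmark repeatable
    # Sort first so the sample does not depend on set iteration order.
    positives = rng.sample(sorted(instances_by_class[target]), n)
    negatives = rng.sample(sorted(instances_by_class[sister]), n)
    return positives, negatives


# Hypothetical instance lists for two classes with a common superclass.
instances_by_class = {
    "ex:City": {f"ex:city{i}" for i in range(50)},
    "ex:Village": {f"ex:village{i}" for i in range(50)},
}
positives, negatives = make_learning_problem(instances_by_class, "ex:City", "ex:Village")
```

Note that `random.sample` raises a `ValueError` when a class has fewer than `n` instances, so small classes would need separate handling.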




 Benchmarking – F-Measure on the training data




             70% of the results for each class
     had an F-measure of 90-100% on the training data
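
For reference, a sketch of how an F-measure on the training data can be computed from the instances a learned class description covers. It assumes the balanced F1 score; the slide does not state which variant was used, and the example counts are invented.

```python
def f_measure(covered, positives, negatives):
    """Balanced F1 of a learned concept, given the set of instances it covers."""
    tp = len(covered & positives)   # positives correctly covered
    fp = len(covered & negatives)   # negatives wrongly covered
    fn = len(positives - covered)   # positives missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example: 30 positives, 30 negatives, concept covers 27 + 3 of them.
positives = {f"p{i}" for i in range(30)}
negatives = {f"n{i}" for i in range(30)}
covered = {f"p{i}" for i in range(27)} | {f"n{i}" for i in range(3)}

score = f_measure(covered, positives, negatives)
```

In this example precision and recall are both 27/30, so the score lands at the lower edge of the 90-100% band.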




              SPARQL Retrieval Component Impact

•    DL-Learner – http://dl-learner.org
•    DBpedia Navigator
•    Tiger Corpus Navigator
•    AutoSPARQL – http://autosparql.dl-learner.org/
•    HANNE – http://hanne.aksw.org
•    ORE – http://aksw.org/Projects/ORE


    Sebastian Hellmann, Jens Lehmann and Sören Auer:
    Learning of OWL Class Descriptions on Very Large Knowledge Bases
    In: International Journal on Semantic Web and Information Systems, 2009


     Web Applications
     Active Learning → User Interaction and Feedback




           Future Work

•   Research Paper in Session 4b (tomorrow at 15:10)
    Navigation-induced Knowledge Engineering by Example
•   Caching + more sophisticated options
•   Large scale learning problems


                          http://slideshare.net/kurzum


                                Homepage: http://dl-learner.org
                                Source code:
                                http://sourceforge.net/projects/dl-learner/




             Example




Sebastian Hellmann, Jens Lehmann, Jörg Unbehauen, Claus Stadler, Thanh Nghia Lam and Markus
Strohmaier: Navigation-induced Knowledge Engineering by Example
In: JIST 2012




          Example




Sebastian Hellmann, Jens Lehmann and Sören Auer:
Learning of OWL Class Descriptions on Very Large Knowledge Bases
In: International Journal on Semantic Web and Information Systems, 2009
