Institute for Web Science & Technologies – WeST

Workshop Web of Linked Entities (WoLE 2012) at ISWC 2012
Sunday, 11 November 2012

Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Christian Hachenberg and Thomas Gottron

Mapping Documents to Entities

[Figure: web document → dbpedia.org:Rob_Roy_(film)]


Finding Good URLs        Thomas Gottron             WoLE Workshop 2012 2
Mapping Entities to Documents

[Figure: dbpedia.org:Rob_Roy_(film) → web document]

Align entities in KB with public documents

  • Publish knowledge base
  • Propagate changes
  • Human readable representation


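Task Definition

[Figure: knowledge-base graph of three linked entities, each with a label and a type: dbpedia:George_Lucas ("George Lucas", type: director), dbpedia:Star_Wars_Episode_IV:_A_New_Hope ("Star Wars IV: A New Hope", type: movie) and dbpedia:Harrison_Ford ("Harrison Ford", type: actor). The web document for the movie entity is marked "???".]

3 types of information:
  • Labels
  • Link structure
  • Types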


Label Search (using Web Search Engine)

[Figure: the entities dbpedia:George_Lucas ("George Lucas", type: director), dbpedia:Star_Wars_Episode_IV:_A_New_Hope ("Star Wars IV: A New Hope", type: movie) and dbpedia:Harrison_Ford ("Harrison Ford", type: actor); candidate web documents are marked SW4.]

Implementation:
  • Bing


Exploiting Link Structure

[Figure: the entities dbpedia:George_Lucas ("George Lucas", type: director), dbpedia:Star_Wars_Episode_IV:_A_New_Hope ("Star Wars IV: A New Hope", type: movie) and dbpedia:Harrison_Ford ("Harrison Ford", type: actor); candidate web documents are marked GL, SW4 and HF.]

Implementation:
  • In-degree
  • PageRank
  • HITS
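
The slide names the link-analysis measures but not how the link graph is built. A minimal sketch of two of them (in-degree and HITS), assuming a small hyperlink graph among candidate documents given as a Python dict; the page names gl, hf, sw4 and other and the function names are hypothetical, and this is plain HITS rather than any variant used in the paper:

```python
import math


def in_degree(links):
    """Count incoming links per page. links maps source -> list of targets."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    counts = {n: 0 for n in nodes}
    for targets in links.values():
        for t in set(targets):  # count each source at most once per target
            counts[t] += 1
    return counts


def hits(links, iterations=50):
    """Plain HITS by power iteration; returns (hub, authority) score dicts."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in set(targets):
                auth[t] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {n: sum(auth[t] for t in set(links.get(n, []))) for n in nodes}
        # Normalise both vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Hypothetical toy graph: two person pages link to the movie page.
links = {"gl": ["sw4"], "hf": ["sw4"], "other": ["gl"]}
hub, auth = hits(links)
best = max(auth, key=auth.get)  # page with the highest authority score
```

In this toy graph the movie page collects links from both person pages, so it ends up with the highest in-degree and the highest authority score.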

Type Filtering

[Figure: three entities of type movie, dbpedia:Gran_Torino_(film) ("Gran Torino"), dbpedia:Star_Wars_Episode_IV:_A_New_Hope ("Star Wars IV: A New Hope") and dbpedia:Rob_Roy_(film) ("Rob Roy"); candidate web documents are marked GT, SW4 and RR.]

Implementation:
  • Borda Count for domain ranking
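
The slide does not spell out how the Borda count is applied. One plausible reading, sketched here under stated assumptions: each entity of a given type contributes a ranked list of candidate URLs, and a Borda count over the web domains of those URLs yields a type-level domain ranking that can then filter candidates. The function names and the example URLs (on the reserved .example TLD) are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def borda_domain_ranking(rankings):
    """Aggregate per-entity URL rankings (best first) into one domain ranking.

    In a list with n distinct domains, the top domain earns n-1 points,
    the next n-2, and so on down to 0.
    """
    scores = defaultdict(int)
    for urls in rankings:
        # Keep only the first (best) position of each domain in this list.
        domains = []
        for url in urls:
            host = urlparse(url).netloc
            if host not in domains:
                domains.append(host)
        n = len(domains)
        for position, host in enumerate(domains):
            scores[host] += n - 1 - position
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda h: (-scores[h], h))


def filter_by_domains(urls, allowed_domains):
    """Keep only URLs hosted on one of the allowed domains, order preserved."""
    allowed = set(allowed_domains)
    return [u for u in urls if urlparse(u).netloc in allowed]


# Hypothetical ranked candidates for three entities of type movie.
rankings = [
    ["https://films.example/gran-torino", "https://news.example/a",
     "https://shop.example/x"],
    ["https://news.example/b", "https://films.example/star-wars-iv",
     "https://fans.example/sw"],
    ["https://films.example/rob-roy", "https://fans.example/rr",
     "https://news.example/c"],
]
top_domains = borda_domain_ranking(rankings)[:2]
filtered = filter_by_domains(rankings[1], top_domains)
```

Borda counting rewards domains that rank consistently well across many entities of a type, which is why a single outlier list does not dominate the aggregated ranking.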

Experimental Setup

  • 100 Entities
      – 4 domains (cities, companies, persons, movies)
      – Stratified by small, medium and large representation on the web
      – Complete network of linked entities

  • Application of label search and link structure approaches
  • Type-filtering as post-process

  • User evaluation (Cranfield setup, pooling)
      – Graded relevance judgements
      – High juror agreement (Krippendorff's Alpha >0.67)
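
For reference, Krippendorff's alpha compares the disagreement observed among jurors with the disagreement expected by chance. In its general form (the symbols D_o and D_e are the usual textbook notation, not taken from the slides):

```latex
\alpha = 1 - \frac{D_o}{D_e}
```

Here D_o is the observed disagreement and D_e the disagreement expected by chance, so alpha = 1 means perfect agreement and alpha = 0 means agreement no better than chance.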


Evaluation Metrics




Evaluation: Results




Statistically significant, p=0.05

Evaluation: Results (Domain, Stratum)




Evaluation: Results (Filtering)




Conclusions and Next Steps

  • Novel task: Mapping entities to public web URLs
      – Evaluated 9 link analysis and web search methods (+1 post-processing using Borda counts)
      – Best methods: Label Search and Focussed HITS
          • Semantic Typing boosts all results

  • Next steps: Investigate domain-dependent performance of methods




Thank you!




Contact:
    WeST – Institute for Web Science and Technologies
    Universität Koblenz-Landau
    gottron@uni-koblenz.de



