PATHSenrich: A Web Service Prototype
for Automatic Cultural Heritage Item
Enrichment
Eneko Agirre, Ander Barrena, Kike Fernandez, Esther Miranda,
Arantxa Otegi, and Aitor Soroa
IXA NLP Group, University of the Basque Country UPV/EHU
arantza.otegi@ehu.es
Abstract. Large amounts of cultural heritage material are nowadays
available through online digital library portals. Most of these cultural
items have short descriptions and lack rich contextual information. The
PATHS project has developed experimental enrichment services. As a
proof of concept, this paper presents a web service prototype which allows
independent content providers to enrich cultural heritage items with a
subset of the full functionality: links to related items in the collection
and links to related Wikipedia articles. In the future we plan to provide
more advanced functionality, as available offline for PATHS.

1 Introduction

Large amounts of cultural heritage (CH) material are now available through
online digital library portals, such as Europeana¹. Europeana hosts millions of
books, paintings, films, museum objects and archival records that have been digitised throughout Europe. Europeana collects contextual information or metadata
about different types of content, which the users can use for their searches.
The main strength of Europeana lies in the vast number of items it contains.
Sometimes, though, this quantity comes at the cost of a restricted amount of
metadata, with many items having very short descriptions and a lack of rich
contextual information. One of the goals of the PATHS project² is precisely to
enrich CH items, using a selected subset of Europeana as a testbed [1].
Within the project, this enrichment will make it possible to create a system
that acts as an interactive personalised tour guide through Europeana collections, offering suggestions about items to look at and assisting in their interpretation by providing relevant contextual information from related items within
Europeana and items from external sources like Wikipedia. Users of such digital
libraries may require information for purposes such as learning and seeking answers to questions. This additional information supports users in fulfilling their
information need, as the evaluation of the first PATHS prototype shows [2].
In this paper we present a web service prototype which allows independent
content providers to enrich CH items. Specifically, the service enriches the items
with two types of information. On the one hand, the item will be linked to
similar items within the collection. On the other hand, the item will be linked
to Wikipedia articles which are related to it.

1 http://www.europeana.eu/portal/
2 http://www.paths-project.eu

T. Aalberg et al. (Eds.): TPDL 2013, LNCS 8092, pp. 462–465, 2013.
© Springer-Verlag Berlin Heidelberg 2013

There have been many attempts to automatically enrich cultural heritage
metadata. Some projects (for instance, MIMO-DB³ or MERLIN⁴) relate CH
objects to terms from an external authority or vocabulary. Others (like
MACE⁵ or YUMA⁶) adopt a collaborative annotation paradigm for metadata
enrichment. To our knowledge, PATHS is the first project using semantic NLP
processing to link CH items to similar items or external Wikipedia articles.
The current service has limited bandwidth, and provides a selected subset
of the enrichment functionality available internally in the PATHS project. The
quality of the links produced is also slightly lower, although we plan to improve it
in the near future. However, we think that the prototype is useful to demonstrate
the potential to construct a web service for automatically enriching CH items
with high quality information.

2 Demo Description

The web service takes as input one CH item represented following the Europeana
Data Model (EDM) in JSON format, as exported by the Europeana API v2.0⁷ (a
sample record is provided in the interface). The web service returns the following:
– A list of 10 closely related items within the collection.
– A list of Wikipedia pages which are related to the target item.
Figure 1 shows a snapshot of the web service. The service is publicly accessible
at http://ixa2.si.ehu.es/paths_wp2/paths_wp2.pl.
The enrichment is performed by analyzing the metadata associated with the
item, i.e., the title of the item, its description, etc. The next sections briefly
describe how this enrichment is performed.
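As a rough illustration of the input handling described above, the following Python sketch collects the text of the title, subject and description fields from a JSON record. The field names (`title`, `subject`, `description`), the flat record layout and the sample record are assumptions made for this example only; real EDM JSON exported by the Europeana API is more deeply nested, and the parsing code of the prototype is not described in this paper.

```python
import json

# Hypothetical field names; the real EDM JSON layout is not reproduced here.
TEXT_FIELDS = ("title", "subject", "description")


def collect_metadata_text(record_json):
    """Return the text of the selected metadata fields, keyed by field name."""
    record = json.loads(record_json)
    texts = {}
    for field in TEXT_FIELDS:
        value = record.get(field, [])
        # Accept either a single string or a list of strings per field.
        if isinstance(value, str):
            value = [value]
        texts[field] = " ".join(v.strip() for v in value if isinstance(v, str))
    return texts


# Made-up sample record, used only to exercise the function.
sample = json.dumps({
    "title": ["Roman oil lamp"],
    "subject": ["Archaeology", "Lighting"],
    "description": "Terracotta lamp found near a villa.",
})
metadata = collect_metadata_text(sample)
```

Fields that are missing from the record yield empty strings, so a sparse item still produces a usable (if short) text to analyze.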
2.1 Related Items within the Collection

The list of related items is obtained by first creating a query with the content
of the title, subject and description fields (stopwords are removed). The query
is then posted to a SOLR search engine⁸. The SOLR search engine accesses an
index created with the subset of Europeana items already enriched offline within
the PATHS project. In that way, the most related Europeana items in the subset
are obtained, and the identifiers of those related items are listed. Note that the
related items used internally in the PATHS project are produced using more
sophisticated methods. Please refer to [1] for further details.
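The query construction described in this section can be sketched as follows. This is a minimal illustration, not the code of the prototype: the stopword list, the punctuation handling, the Solr core URL and the `id` field name are all assumptions, and only the request URL is built (no request is sent). The parameters `q`, `rows`, `fl` and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Tiny illustrative stopword list; the list used by the prototype is not given.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "with"}

# Hypothetical Solr core URL, used only to show the shape of the request.
SOLR_SELECT_URL = "http://localhost:8983/solr/paths/select"


def build_query(title, subject, description):
    """Concatenate the three fields, lower-case them and drop stopwords."""
    tokens = []
    for text in (title, subject, description):
        for raw in text.lower().split():
            token = raw.strip(".,;:!?()[]\"'")
            if token and token not in STOPWORDS:
                tokens.append(token)
    return " ".join(tokens)


def build_solr_url(query, rows=10):
    """Build a request URL for Solr's select handler asking for `rows` hits."""
    params = {"q": query, "rows": rows, "fl": "id", "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


query = build_query("The Roman oil lamp", "Archaeology", "Found in a villa.")
url = build_solr_url(query)
```

Setting `rows` to 10 mirrors the list of 10 related items returned by the service. A production version would also need to escape characters that have a special meaning in the Solr query syntax, which this sketch does not do.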

3 http://www.mimo-international.com
4 http://www.ucl.ac.uk/ls/merlin
5 http://www.mace-project.eu
6 http://dme.ait.ac.at/annotation
7 http://preview.europeana.eu/portal/api-introduction.html
8 http://lucene.apache.org/solr/

Fig. 1. Web service interface. It consists of a text area in which to enter the input item
in JSON format (top). The “Get EDM JSON example” button can be used to get an
input example. Once a JSON record has been entered, click the “Process” button to get the output.
The output (bottom) consists of a list of related items and background links.

2.2 Related Wikipedia Articles

For linking the items to Wikipedia articles we follow an implementation similar
to the method described in [3]. This method creates a dictionary, that is, an association
between string mentions and all the articles each mention can refer to. Our
dictionary is constructed using the title of the Wikipedia article, the redirect
pages, the disambiguation pages and the anchor texts from Wikipedia links.
Mentions are lower-cased and all text between parentheses is removed. If the
mention links to a disambiguation page, it is associated with all possible articles
the disambiguation page points to. In addition, each association between a mention
and article is scored with the prior probability, estimated as the number of
times that the mention occurs in the anchor text of an article. Note that such
dictionaries can disambiguate any mention, just returning the highest-scoring
article for this particular mention.
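To make the dictionary construction more concrete, here is a small Python sketch under stated assumptions. It builds the dictionary from (anchor text, target article) pairs only; titles, redirects and disambiguation pages are left out for brevity. Dividing each count by the total for the mention, so that the scores of a mention sum to one, is our assumption about how the counts become a prior; the function names and the toy pairs are hypothetical.

```python
import re
from collections import defaultdict


def normalize(mention):
    """Lower-case a mention and remove any text between parentheses."""
    without_parens = re.sub(r"\([^)]*\)", " ", mention)
    return " ".join(without_parens.lower().split())


def build_dictionary(anchor_pairs):
    """Map each normalized mention to {article: anchor count}."""
    dictionary = defaultdict(lambda: defaultdict(int))
    for mention, article in anchor_pairs:
        dictionary[normalize(mention)][article] += 1
    return dictionary


def best_article(dictionary, mention):
    """Return (article, prior) for the highest-scoring article, or None."""
    candidates = dictionary.get(normalize(mention))
    if not candidates:
        return None
    total = sum(candidates.values())
    article, count = max(candidates.items(), key=lambda item: item[1])
    return article, count / total


# Made-up anchor pairs, for illustration only.
pairs = [
    ("Mercury", "Mercury (planet)"),
    ("Mercury", "Mercury (planet)"),
    ("mercury", "Mercury (element)"),
    ("Mercury (mythology)", "Mercury (mythology)"),
]
dictionary = build_dictionary(pairs)
result = best_article(dictionary, "Mercury")
```

Because the lookup ignores the surrounding text, every occurrence of a mention receives the same article, which is the limitation that context-aware linking is meant to address.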
Once the dictionary is built, the web service analyzes the title, subject and
description fields of the CH item and matches the longest substring within those
fields with entries in the dictionary. When a match is found, the Wikipedia article
with highest score for this entry is returned. Note that the links to Wikipedia
in the PATHS project are produced using more sophisticated methods. Please
refer to [1] for further details.
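The matching step can be sketched as a greedy left-to-right scan that prefers the longest dictionary entry at each position. This is one plausible reading of "longest substring" matching rather than the exact procedure of the prototype: whitespace tokenization, the `max_len` cap on the number of tokens per mention, and the toy dictionary with its counts are all assumptions.

```python
def link_longest_matches(text, dictionary, max_len=5):
    """Greedy left-to-right scan preferring the longest dictionary entry."""
    tokens = text.lower().split()
    links = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                scores = dictionary[candidate]
                # Keep the article with the highest score for this entry.
                match = (candidate, max(scores, key=scores.get), n)
                break
        if match:
            links.append(match[:2])
            i += match[2]  # Skip past the matched span.
        else:
            i += 1
    return links


# Made-up dictionary entries and counts, for illustration only.
toy_dictionary = {
    "bilbao": {"Bilbao": 40},
    "guggenheim museum": {"Solomon R. Guggenheim Museum": 12},
    "guggenheim museum bilbao": {"Guggenheim Museum Bilbao": 30},
}
links = link_longest_matches("View of the Guggenheim Museum Bilbao", toy_dictionary)
```

Preferring the longest span means that "guggenheim museum bilbao" is linked as a single entity instead of being split into a shorter museum mention and a separate city mention.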

3 Conclusions and Future Work

This paper presents a web service prototype which automatically enriches CH
items with metadata. The web service is inspired by the enrichment work carried
out in the PATHS project but, unlike the batch methodology used in the
project, this enrichment is performed online. The prototype has been designed
for demonstration purposes, to showcase the feasibility of providing full-fledged
automatic enrichment.
Our plans for the future include moving the offline enrichment services which
are currently being evaluated in the PATHS project to the web service. In the
case of related Wikipedia articles, we will take into account the context of the
matched entities, which improves the quality of the links [4], and we will include
a filtering algorithm to discard entities that are not relevant. Regarding related
items, we will classify them according to the type of relation [5]. In addition we
plan to automatically organize the items hierarchically, according to a Wikipedia-based vocabulary [6].
Acknowledgements. The research leading to these results was carried out as
part of the PATHS project (http://www.paths-project.eu) funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under
grant agreement no. 270082. The work has also been funded by the Basque
Government (project IBILBIDE, SAIOTEK S-PE12UN089).

References
1. Otegi, A., Agirre, E., Soroa, A., Aletras, N., Chandrinos, C., Fernando, S., Gonzalez-Agirre, A.: Report accompanying D2.2: Processing and Representation of Content
for Second Prototype. PATHS Project Deliverable (2012),
http://www.paths-project.eu/eng/content/download/2489/18113/version/2/file/D2.2.Content+Processing-2nd+Prototype-revised.v2.pdf
2. Griffiths, J., Goodale, P., Minelli, S., de Polo, A., Agerri, R., Soroa, A., Hall, M.,
Bergheim, S.R., Chandrinos, K., Chryssochoidis, G., Fernie, K., Usher, T.: D5.1:
Evaluation of the first PATHS prototype. PATHS Project Deliverable (2012),
http://www.paths-project.eu/eng/Resources/D5.1-Evaluation-of-the-1st-PATHS-Prototype
3. Chang, A.X., Spitkovsky, V.I., Yeh, E., Agirre, E., Manning, C.D.: Stanford-UBC
entity linking at TAC-KBP. In: Proceedings of TAC 2010, Gaithersburg, Maryland,
USA (2010)
4. Han, X., Sun, L.: A Generative Entity-Mention Model for Linking Entities with
Knowledge Base. In: Proceedings of the ACL, Portland, Oregon, USA (2011)
5. Agirre, E., Aletras, N., Gonzalez-Agirre, A., Rigau, G., Stevenson, M.: UBC UOS-TYPED: Regression for typed-similarity. In: Second Joint Conference on Lexical
and Computational Semantics (*SEM), Atlanta, Georgia, USA (2013)
6. Fernando, S., Hall, M., Agirre, E., Soroa, A., Clough, P., Stevenson, M.: Comparing Taxonomies for Organising Collections of Documents. In: Proceedings of
COLING 2012, Mumbai, India (2013)
