Enabling Networked Knowledge
Digital Enterprise Research Institute
Entity Detection and Consolidation:
How to Make Your Content Smarter?
Bianca Pereira, Paul Buitelaar
Unit for Natural Language Processing
Digital Enterprise Research Institute, National University of Ireland, Galway
Acknowledgements: This work has been funded by the Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).
Motivation:
Information available online can be consumed both by human reading and by computer processing. Despite this, the
majority of data on the Web supports only one of these two types of reading.
Research Questions:
•  Entity Detection: How to identify entity mentions in text?
•  Entity Consolidation: How to identify which real-world entity
is mentioned in the text, and how to find the same entity
across diverse texts?
Research Contribution:
•  Quality assessment of some linked data datasets currently available on the Web.
•  Identification of common classes and properties used for Named Entities (entities identified by Proper Names) in Linked
Data datasets.
•  Development of a framework adaptable to different linked data datasets.
Aim:
Link human-readable and machine-processable content in order to enable machine understanding of the content of a
given text and to enable humans to track entities across texts.
Proposed Solution:
•  Identification of different mentions of real-world
entities in natural language text and their unified,
unambiguous linking to an external database.
•  Use of the available, and growing, linked data cloud
as the background database.
•  Development of AELA, a framework for entity
detection and consolidation.
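
The two steps named above can be illustrated with a minimal sketch. It is not AELA's method, which the poster does not describe in detail. It assumes a dictionary-based detector (greedy longest match against known labels) and a consolidation step that picks the candidate whose context words overlap the text most. The knowledge base `KB`, its `ex:` URIs, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical background knowledge base: entity URI -> (label, context words).
KB = {
    "ex:film/alien": ("Alien", {"film", "scott", "1979"}),
    "ex:band/alien": ("Alien", {"band", "rock", "album"}),
    "ex:person/ridley_scott": ("Ridley Scott", {"director", "film"}),
}

def build_label_index(kb):
    """Map each lowercased label to the URIs of all entities carrying it."""
    index = defaultdict(list)
    for uri, (label, _context) in kb.items():
        index[label.lower()].append(uri)
    return index

def detect_mentions(tokens, index, max_len=3):
    """Entity detection: greedy longest match of token n-grams against labels."""
    mentions, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n])
            if surface.lower() in index:
                mentions.append((i, i + n, surface))
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return mentions

def consolidate(mentions, tokens, index, kb):
    """Entity consolidation: per mention, keep the candidate URI whose
    context words overlap the words of the text the most."""
    text_words = {t.lower() for t in tokens}
    links = []
    for _start, _end, surface in mentions:
        candidates = index[surface.lower()]
        best = max(candidates, key=lambda uri: len(kb[uri][1] & text_words))
        links.append((surface, best))
    return links

tokens = "Ridley Scott directed the film Alien in 1979".split()
index = build_label_index(KB)
mentions = detect_mentions(tokens, index)
links = consolidate(mentions, tokens, index, KB)
print(links)
```

In this toy example the ambiguous label "Alien" has two candidates, and the overlap with the words "film", "Scott" and "1979" selects the film rather than the band. A real system would replace the hand-written `KB` with labels and context drawn from a linked data dataset.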
Future Research:
•  Detection of entities mentioned by generalized names (genes, diseases or words such as ambulance, coffee machine,
airplane, etc.).
•  Application of AELA to texts from different domains.
•  Evaluation of other current methods when applied within AELA.
AELA:
AELA Framework
•  Experiments on films and music domains.
•  Adaptive to the semantic structure of the Linked
Data (LD) dataset.
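
One way to picture adaptation to a dataset's semantic structure is a per-dataset profile that states which classes denote named entities and which properties carry their names. The sketch below is an assumption for illustration only, not the AELA design: `DatasetProfile`, `extract_labels`, the toy triples, and the class and property names in the profiles are hypothetical and have not been checked against the actual Jamendo or Linked Movie Database schemas.

```python
from dataclasses import dataclass

RDF_TYPE = "rdf:type"

@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical description of how one LD dataset names its entities."""
    name: str
    entity_classes: frozenset    # classes treated as named entities
    label_properties: frozenset  # properties whose values are entity names

def extract_labels(triples, profile):
    """Return {entity URI: set of labels} for entities of the profiled classes."""
    typed = {s for s, p, o in triples
             if p == RDF_TYPE and o in profile.entity_classes}
    labels = {}
    for s, p, o in triples:
        if s in typed and p in profile.label_properties:
            labels.setdefault(s, set()).add(o)
    return labels

# Illustrative profiles; the vocabulary terms are assumptions.
music_profile = DatasetProfile(
    name="music",
    entity_classes=frozenset({"mo:MusicArtist"}),
    label_properties=frozenset({"foaf:name"}),
)
film_profile = DatasetProfile(
    name="films",
    entity_classes=frozenset({"movie:film"}),
    label_properties=frozenset({"rdfs:label", "dc:title"}),
)

# Toy triples standing in for a real dataset.
triples = [
    ("ex:a1", "rdf:type", "mo:MusicArtist"),
    ("ex:a1", "foaf:name", "Some Artist"),
    ("ex:f1", "rdf:type", "movie:film"),
    ("ex:f1", "dc:title", "Some Film"),
]

music_labels = extract_labels(triples, music_profile)
film_labels = extract_labels(triples, film_profile)
print(music_labels, film_labels)
```

With this shape, the same extraction code serves both domains and only the profile changes per dataset.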
Preliminary Results:
•  Music Domain (Jamendo dataset): F-Score 0.54
•  Films Domain (Linked Movie Database dataset): F-Score 0.87
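
For readers unfamiliar with the metric, the sketch below shows how an F-Score is commonly computed for entity linking. The poster does not state which variant or matching criterion was used, so this assumes the balanced F1 (harmonic mean of precision and recall) over exact matches of (mention, URI) pairs. The function name and the example data are hypothetical and unrelated to the reported results.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and balanced F1 over sets of (mention, URI) pairs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up annotations: 4 gold links, 4 predicted links, 3 of them correct.
gold = [("m1", "ex:e1"), ("m2", "ex:e2"), ("m3", "ex:e3"), ("m4", "ex:e4")]
predicted = [("m1", "ex:e1"), ("m2", "ex:e2"), ("m3", "ex:e3"), ("m4", "ex:e9")]

p, r, f1 = precision_recall_f1(gold, predicted)
print(p, r, f1)
```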
