On the Semantic Representation and
Extraction of Complex Category
Descriptors
André Freitas, Rafael Vieira, Edward Curry, Danilo
Carvalho, João C. Pereira da Silva
Insight Centre for Data Analytics
NLDB 2014
Montpellier, France
Outline
 Motivation
 Extracting Natural Language Category Descriptors (NLCDs)
 Evaluation
 Summary
2
Motivation
3
Big Data
 Vision: More complete data-based picture of the world for
systems and users.
4
“Schema” Growth & Complexity
 Fundamental shift in the database landscape
 How to build large ‘schemas’?
10s-100s attributes
1,000s-1,000,000s attributes
5
Target Motivational Scenario: Wikipedia
 Decentralized content generation
 300,000 editors have edited Wikipedia more than 10 times
 > 280,000 distinct Natural Language Category Descriptors
(NLCDs)
6
Natural Language Category Descriptors
(NLCDs)
7
NLCDs
 Natural Language Category Descriptors (NLCDs) are
natural language descriptors for sets
 Simple NLCDs:
- ‘People’
- ‘Countries’
- ‘Films’
 Complex NLCDs:
- ‘French Senators Of The Second Empire’
- ‘United Kingdom Parliamentary Constituencies Represented
By A Sitting Prime Minister’
 Goal:
- Parse NLCDs into an integrated structured graph
8
Assumptions
 NLCDs as a more syntactically tractable subset of natural
language
 NLCDs as a low effort interface for structuring a domain of
discourse
9
Formality vs. Usability Spectrum
[Figure labels: NLCDs, Information Extraction, NLCD graphs]
10
NLCD graphs
Applications
 Database Creation
 Semantic Annotation
 Entity/Semantic Search
11
Other Examples
 IFRS and US GAAP
- ‘Partially owned properties’
- ‘Residential portfolio segment’
- ‘Assets arising from exploration for and evaluation of mineral
resources’
- ‘Key management personnel compensation’
- ‘Other long-term employee benefits’
12
Extracting Natural Language
Category Descriptors (NLCDs)
13
Natural Language Category
Descriptors
What is Big Data?
14
Core Features
 Manual analysis of 10,000 NLCDs.
15
Features/Core Lexical Categories
Distribution
16
Number of distinct POS Tag patterns
17
Graph Representation Model
18
Focus of the Representation
 Taxonomic Structure
 Context Representation (Open Relation Extraction)
- Reification-based
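The two aspects above can be sketched as plain subject-predicate-object triples. This is a minimal illustration only: the predicate names (':subClassOf', ':subject', ':predicate', ':object'), the blank-node naming and the function build_nlcd_graph are assumptions made for this sketch, not the vocabulary or code used by the authors.

```python
from itertools import count


def build_nlcd_graph(nlcd, core, specialized, relations):
    """Return (subject, predicate, object) triples for one NLCD.

    All predicate names are illustrative assumptions, not the
    authors' actual vocabulary.
    """
    ids = count(1)
    triples = [
        # Taxonomic structure: the specialised class sits below the core,
        # and the full descriptor sits below the specialised class.
        (specialized, ":subClassOf", core),
        (nlcd, ":subClassOf", specialized),
    ]
    for predicate, obj in relations:
        # Reification: the relation becomes a node of its own, so further
        # context could later be attached to that node.
        node = f"_:rel{next(ids)}"
        triples += [
            (node, ":subject", nlcd),
            (node, ":predicate", predicate),
            (node, ":object", obj),
        ]
    return triples


graph = build_nlcd_graph(
    nlcd="French Senators Of The Second Empire",
    core="Senators",
    specialized="French Senators",
    relations=[("of", "Second Empire")],
)
for triple in graph:
    print(triple)
```

Keeping the relation as its own node is what lets extra context attach to "of the Second Empire" without changing the taxonomic triples.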
Examples
20
Examples
21
Examples
22
Examples
23
NLCD Extractor
24
NLCD Extractor: POS Tagging
25
NLCD Extractor: Segmentation
26
NLCD Extractor: Named Entity
Recognition
27
NLCD Extractor: Core Detection
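A toy sketch of what segmentation followed by core detection might look like. The slides do not state the actual rules, so the preposition list, the split-on-preposition strategy and the "last word of the first segment" heuristic are all assumptions for illustration, not the authors' algorithm.

```python
# Hypothetical preposition list; the real extractor works on POS tags.
PREPOSITIONS = {"of", "in", "by", "from", "for", "at", "on", "with"}


def segment(nlcd):
    """Split a descriptor into segments and the prepositions linking them."""
    segments, links, current = [], [], []
    for token in nlcd.split():
        if token.lower() in PREPOSITIONS and current:
            # A preposition closes the current segment.
            segments.append(" ".join(current))
            links.append(token.lower())
            current = []
        else:
            current.append(token)
    if current:
        segments.append(" ".join(current))
    return segments, links


def detect_core(segments):
    """Toy heuristic: take the last word of the first segment as the core."""
    return segments[0].split()[-1]


segments, links = segment("French Senators Of The Second Empire")
core = detect_core(segments)
print(segments, links, core)
```

This heuristic picks 'Senators' for the example, but it would wrongly pick 'Represented' for 'United Kingdom Parliamentary Constituencies Represented By A Sitting Prime Minister', which illustrates why surface splitting alone is not enough.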
28
NLCD Extractor: WSD
29
NLCD Extractor: Entity Linking
30
DBpedia
NLCD Extractor: RDF Representation
31
DBpedia
RDF Representation
32
Evaluation
33
Evaluation Setup
 Total of 287,957 English Wikipedia categories (Open Domain
scenario)
 Selected random sample of 2,696 categories
 Manual evaluation of the core extraction features
- Entity segmentation
- Relation identification
- Unary operators
- Specialization relations
- Category core identification
- Entity core identification
- Word Sense Disambiguation (WordNet)
- Entity linking (DBpedia)
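The setup above can be sketched as a random draw followed by per-feature accuracy over manual judgments. Only the two counts come from the slide. The seed, the function names and the four toy judgments are placeholders for illustration and are not evaluation data.

```python
import random

TOTAL_CATEGORIES = 287_957  # English Wikipedia categories (from the slide)
SAMPLE_SIZE = 2_696         # randomly selected categories (from the slide)


def draw_sample(population, k, seed=0):
    """Draw k distinct items; the seed is an assumption for repeatability."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def feature_accuracy(judgments):
    """Map each feature to the share of items judged correct.

    judgments: {feature name: [True/False per evaluated category]}
    """
    return {name: sum(marks) / len(marks)
            for name, marks in judgments.items() if marks}


sample = draw_sample(range(TOTAL_CATEGORIES), SAMPLE_SIZE)
fraction = SAMPLE_SIZE / TOTAL_CATEGORIES

# Placeholder judgments, NOT results from the evaluation.
toy = {"entity segmentation": [True, True, False, True]}
scores = feature_accuracy(toy)
print(f"sampled {len(sample)} categories ({fraction:.2%} of the total)")
```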
34
Results
 Performance:
- (i) graph extraction time: 9.8 ms per graph
- (ii) word sense disambiguation: 121.0 ms per word
- (iii) entity linking: 530.0 ms per link
* Measured on an Intel i5-3317U (1.70 GHz) CPU with 4 GB RAM (2 cores, 2 threads per core).
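A rough back-of-the-envelope reading of these timings. The three per-unit costs and the category count come from the slides. The assumption that steps run sequentially, and the word and link counts in the example, are hypothetical.

```python
GRAPH_MS = 9.8     # graph extraction, per graph (from the slide)
WSD_MS = 121.0     # word sense disambiguation, per word (from the slide)
LINK_MS = 530.0    # entity linking, per link (from the slide)
TOTAL_CATEGORIES = 287_957


def descriptor_cost_ms(words, links):
    """Estimated time for one NLCD, assuming the steps run sequentially."""
    return GRAPH_MS + words * WSD_MS + links * LINK_MS


# Graph extraction alone over every English Wikipedia category.
graph_only_minutes = TOTAL_CATEGORIES * GRAPH_MS / 1000 / 60

# Hypothetical descriptor with 3 disambiguated words and 1 linked entity.
example_ms = descriptor_cost_ms(words=3, links=1)

print(f"graph extraction for all categories: {graph_only_minutes:.1f} min")
print(f"example descriptor: {example_ms:.1f} ms")
```

Under these assumptions, disambiguation and linking dominate the cost of a single descriptor, while graph extraction alone stays under an hour for the full category set.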
35
Summary
 NLCDs can provide a more tractable (from the IE perspective)
natural language interface for structuring large KBs
 We developed an approach for the representation, extraction
and integration of NLCDs
- ~75% extraction accuracy
 Limitations:
- Need for a more principled and formal definition of an NLCD
- Need for a better entity recognition and linking approach
 Future Work: evaluation under a domain-specific scenario
36
