An Ensemble Approach for Entity Type
Prediction over Linked Data
Guangyuan Piao, John G. Breslin
Insight Centre for Data Analytics @NUI Galway, Ireland
Unit for Social Software
The Data Challenge at 5th Joint International Semantic Technology Conference
Yichang, China, 11/11/2015
Contents
• Introduction of the Data Challenge
• Overall Approach
• Results
Introduction of the Data Challenge
• The main task of the challenge is to predict the labels of entities/resources in Zhishi.me¹.
• 1,897 entity URLs are provided, 1,397 of them with label information. Information related to the entities:
  • abstracts of entities
  • infobox properties
  • external links
  • related pages
• 13 teams participated.
¹ http://zhishi.apexlab.org/
Overall Approach
• Features for predicting entity types
  1. all distinct properties of entities in the dataset
  2. semantic similarities between the entity and all labels (e.g., insect, novel)
  3. a bag of Named Entities (NEs) created from all abstracts of entities in the dataset
• Feature selection
  • in total, there were 1,888 features
  • irrelevant features were filtered out using the GainRatioAttributeEval method in Weka¹ (1,888 → 458 features)
• Prediction strategy
  • Random Forest as the classification method (using 100 trees)
¹ http://www.cs.waikato.ac.nz/ml/weka/
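The feature selection step above ranks features by gain ratio. The following is a minimal pure-Python sketch of that criterion for discrete features, not Weka's implementation. The function names, the toy feature columns, and the `threshold` parameter are assumptions made for illustration; the slides do not state which cut-off reduced 1,888 features to 458.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Gain ratio of one discrete feature column with respect to the class labels."""
    total = len(labels)
    # Conditional entropy H(labels | feature), weighted by the size of each split.
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information; treat it as uninformative.
    return info_gain / split_info if split_info > 0 else 0.0


def select_features(columns, labels, threshold=0.0):
    """Return names of columns whose gain ratio exceeds `threshold` (hypothetical cut-off)."""
    return [name for name, col in columns.items() if gain_ratio(col, labels) > threshold]


# Toy data: four entities, two labels, two binary property features.
labels = ["insect", "insect", "novel", "novel"]
columns = {"hasWingspan": [1, 1, 0, 0], "hasImage": [1, 0, 1, 0]}
selected = select_features(columns, labels)
```

In this toy example `hasWingspan` separates the two labels perfectly and is kept, while `hasImage` carries no information about the label and is dropped.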
Overall Approach - Features
1. All distinct properties of entities in the dataset
  • the value of a property feature is 1 if the entity has the property, 0 if not
2. Semantic similarities between the entity and all labels (e.g., insect, novel)
  • RESIM(ei, ej)¹: a measure for calculating the semantic similarity between two entities in the context of a Linked Data graph
  • |lu|: the total number of entities with label lu
¹ Computing the Semantic Similarity of Resources in DBpedia for Recommendation Purposes, Piao et al., JIST2015
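The two feature types on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the slide's formula for the entity-to-label similarity is not present in the text, so the sketch assumes it is the mean pairwise similarity over the |lu| entities of a label. `toy_sim` is a hypothetical stand-in (Jaccard overlap of property sets) and is not RESIM.

```python
def property_features(entity_props, all_props):
    """Binary vector over all distinct properties: 1 if the entity has the property, 0 if not."""
    return [1 if prop in entity_props else 0 for prop in all_props]


def label_similarity(entity, label_entities, sim):
    """Similarity between one entity and one label.

    ASSUMPTION: the mean of sim(entity, e) over the entities e carrying the label,
    i.e. the sum divided by |l_u|. The original formula is not in the slide text.
    """
    if not label_entities:
        return 0.0
    return sum(sim(entity, other) for other in label_entities) / len(label_entities)


def toy_sim(a, b):
    """Hypothetical stand-in for RESIM: Jaccard overlap of two property sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy usage: entities are represented by their property sets.
all_props = ["author", "diet", "habitat"]
vector = property_features({"habitat", "diet"}, all_props)
insect_entities = [{"habitat", "diet"}, {"habitat"}]
score = label_similarity({"habitat"}, insect_entities, toy_sim)
```

Any pairwise measure with the same two-argument signature can be passed as `sim`, so a real RESIM implementation could replace `toy_sim` without changing the rest of the sketch.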
Overall Approach - Features
3. A bag of Named Entities (NEs) created from all abstracts of entities in the dataset
  • NEs that appear at the beginning of an abstract and appear frequently in the abstract can receive higher weights.
  • Pipeline: abstract → Stanford NER¹ → segmented NEs
  • nei: an NE that appears more than 10 times
  • pos(nei, a): the position of nei in abstract a
  • n: the number of NEs in a
¹ http://nlp.stanford.edu/software/CRF-NER.shtml
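The weighting formula on this slide did not survive in the text, so the sketch below is only one possible scheme that matches the stated intent (earlier and more frequent NEs weigh more). It assumes each mention at 0-based position pos contributes 1 - pos/n, that the "more than 10 times" count is taken over the whole dataset, and that NE extraction has already been done. All function names and the `min_count` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def frequent_nes(abstract_nes, min_count=10):
    """NEs mentioned more than `min_count` times across all abstracts (assumed scope)."""
    counts = Counter(ne for nes in abstract_nes for ne in nes)
    return {ne for ne, count in counts.items() if count > min_count}


def ne_weights(nes, vocabulary):
    """Weight the NEs of one abstract, given its mentions in order of appearance.

    ASSUMPTION: each mention at 0-based position `pos` adds 1 - pos / n, where n is
    the number of NE mentions in the abstract. This is not the slide's formula.
    """
    n = len(nes)
    weights = defaultdict(float)
    for pos, ne in enumerate(nes):
        if ne in vocabulary:
            weights[ne] += 1.0 - pos / n
    return dict(weights)


# Toy usage with a lowered frequency cut-off so the small example has a vocabulary.
abstracts = [["Galway", "Ireland", "Galway", "Dublin"], ["Ireland", "Galway"]]
vocabulary = frequent_nes(abstracts, min_count=1)
weights = ne_weights(abstracts[0], vocabulary)
```

Here "Galway" appears first and twice in the first abstract, so it receives the largest weight, while "Dublin" falls below the frequency cut-off and is left out of the bag.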
Results
• Performance on the provided training set (10-fold cross-validation)
Classifier      Precision  Recall  F-score
Decision Tree   0.942      0.942   0.942
SVM             0.920      0.910   0.912
Random Forest   0.970      0.969   0.969
Stacking        0.949      0.948   0.948
• Performance on the provided test set (4th among 13 teams)
Team      Precision  Recall    F-score
4PKUICL   0.985439   0.987461  0.986449
1CBrain   0.983633   0.985069  0.98435
3FRDC_ML  0.978096   0.977348  0.977722
6pgy      0.977442   0.977247  0.977344
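The precision, recall and F-score columns above summarise a multi-class prediction task. A minimal sketch of how such summary figures can be computed is given below. It assumes class-weighted averaging of per-class scores (each class weighted by its share of the true labels); the slides do not say which averaging the challenge or the cross-validation used, and the labels in the example are toy data.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Class-weighted precision, recall and F-score for multi-class predictions.

    ASSUMPTION: per-class scores are averaged with weights proportional to the
    number of true instances of each class.
    """
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f_score = 0.0
    for label, count in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = true_pos / predicted if predicted else 0.0
        r_c = true_pos / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = count / total
        precision += weight * p_c
        recall += weight * r_c
        f_score += weight * f_c
    return precision, recall, f_score


# Toy usage: one "insect" entity is misclassified as "novel".
y_true = ["insect", "insect", "novel", "novel"]
y_pred = ["insect", "novel", "novel", "novel"]
precision, recall, f_score = weighted_prf(y_true, y_pred)
```

Under this averaging the F-score is a weighted mean of per-class F-scores, so it need not equal the harmonic mean of the reported precision and recall.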