EVALUATING NAMED ENTITY
RECOGNITION AND DISAMBIGUATION
IN NEWS AND TWEETS
Giuseppe Rizzo, Università degli studi di Torino
Marieke van Erp, VU University Amsterdam
Raphaël Troncy, EURECOM
EVALUATING NER & NED
• NER is typically an NLP task (MUC, CoNLL, ACE)
• NED took flight with the availability of large structured resources (Wikipedia, DBpedia, Freebase)
• Tools for NER and NED have started popping up outside regular research outlets (TextRazor, DBpedia Spotlight, AlchemyAPI)
• Unclear how well these tools perform
THIS WORK
• Evaluation & comparison of 10 out-of-the-box NER and NED tools through the NERD API, as well as a combination of the tools in NERD-ML
• Two types of data: Newswire & Tweets
• http://nerd.eurecom.fr
• Ontology, REST API & UI
• Uniform access to 12 different extractors/linkers: AlchemyAPI, DBpedia Spotlight, Extractiv, Lupedia, OpenCalais, Saplo, SemiTags, TextRazor, THD, Wikimeta, Yahoo! Content Analysis, Zemanta
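Uniform access implies that the type labels of different extractors end up in one shared label set. A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the extractor names, the native label strings, the `TYPE_ALIGNMENT` table and the `normalise` helpers are hypothetical and are not taken from the NERD ontology or its REST API.

```python
# Hypothetical alignment table: (extractor, native type) -> shared class.
# The entries are invented for illustration only.
TYPE_ALIGNMENT = {
    ("extractor_a", "Person"): "PER",
    ("extractor_a", "City"): "LOC",
    ("extractor_a", "Company"): "ORG",
    ("extractor_b", "people"): "PER",
    ("extractor_b", "place"): "LOC",
    ("extractor_b", "organisation"): "ORG",
}


def normalise(extractor, native_type):
    """Map an extractor-specific type to a shared class, defaulting to MISC."""
    return TYPE_ALIGNMENT.get((extractor, native_type), "MISC")


def normalise_annotations(extractor, annotations):
    """Rewrite (surface form, native type) pairs into the shared label set."""
    return [(surface, normalise(extractor, t)) for surface, t in annotations]
```

With such a table in place, output from any extractor can be compared against the same gold-standard classes (PER, LOC, ORG, MISC).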
NERD-ML
• The aim of NERD-ML is to combine the knowledge of the different extractors into a better named entity recogniser
• Uses NERD predictions, Stanford NER & extra features
• Naive Bayes, k-NN, SMO
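To make the combination idea concrete, here is a small sketch in which the labels proposed by several extractors for a token serve as features for a k-NN classifier, one of the three learners listed. It is not the NERD-ML implementation: the three extractors, the toy training data, the Hamming distance and the function names are all assumptions, and the extra features mentioned on the slide are left out.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two feature tuples disagree."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, features, k=3):
    """Label a token by majority vote among its k nearest training tokens.

    train    -- list of (feature_tuple, gold_label) pairs
    features -- tuple of per-extractor labels for the token to classify
    """
    nearest = sorted(train, key=lambda ex: hamming(ex[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training data (invented): each tuple holds the labels proposed by
# three hypothetical extractors for one token; "O" means "not an entity".
TRAIN = [
    (("PER", "PER", "O"), "PER"),
    (("PER", "O", "PER"), "PER"),
    (("LOC", "LOC", "ORG"), "LOC"),
    (("LOC", "ORG", "LOC"), "LOC"),
    (("ORG", "ORG", "ORG"), "ORG"),
    (("O", "O", "O"), "O"),
]
```

Calling `knn_predict(TRAIN, ("PER", "PER", "PER"))` picks the label most common among the three closest training tokens. The learner can correct an extractor that is systematically wrong on a class, which a plain majority vote over the extractors cannot do.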
DATA
• CoNLL 2003 English NER with AIDA CoNLL-YAGO links to Wikipedia (5,648 NEs/4,485 links in test set)
• Making Sense of Microposts 2013 (MSM’13) for NER on Twitter domain + 62 randomly selected tweets from Ritter et al.’s corpus with links to DBpedia resources (MSM: 1,538 NEs/Ritter: 177 links in test set)
RESULTS NER NEWSWIRE
[Figure: three panels (Precision, Recall, F1; scale 0-100) per class PER, LOC, ORG, MISC and OVERALL. Systems compared: AlchemyAPI, DBpedia Spotlight, Extractiv, Lupedia, OpenCalais, Saplo, TextRazor, Yahoo, Wikimeta, Zemanta, Stanford NER, NERD-ML Run01, NERD-ML Run02, NERD-ML Run03, Upper Limit.]
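The charts report precision, recall and F1 on a 0-100 scale. As a reminder of how these three scores relate, the sketch below computes them from a set of gold entities and a set of predicted entities. Counting an entity as correct only when its span and type both match exactly is an assumption here; the slides do not state the matching criterion. The function name and the example spans are illustrative.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) as fractions between 0 and 1.

    gold, predicted -- iterables of hashable entity descriptions,
                       here (start, end, type) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: four gold entities, three predictions, two exact matches.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 10, "ORG"), (12, 13, "MISC")}
predicted = {(0, 2, "PER"), (5, 6, "ORG"), (9, 10, "ORG")}
p, r, f = precision_recall_f1(gold, predicted)
```

In this example precision is 2/3, recall is 2/4 and F1 is 4/7. Multiplying by 100 gives the scale used in the charts.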
RESULTS NER MSM
[Figure: three panels (Precision, Recall, F1; scale 0-100) per class PER, LOC, ORG, MISC and OVERALL. Systems compared: AlchemyAPI, DBpedia Spotlight, Extractiv, Lupedia, OpenCalais, Saplo, TextRazor, Wikimeta, Zemanta, Ritter et al., Stanford NER, NERD-ML Run01, NERD-ML Run02, NERD-ML Run03, Upper Limit.]
RESULTS NED
            AlchemyAPI  DBpedia Spotlight  Extractiv  Lupedia  TextRazor  Yahoo  Zemanta
AIDA-YAGO   70.63       26.93              51.31      57.98    49.21      0.0    35.58
TWEETS      53.85       25.13              74.07      65.38    58.14      76.00  48.57
DISCUSSION
• Still a ways to go, but for certain classes NER is getting close to really good results
• MISC class is (and probably always will be?) hard
• Bigger datasets needed (for tweets and NED)
• NED task can use standardisation
THANK YOU FOR LISTENING

• Try out our code at: https://github.com/giusepperizzo/nerdml
ACKNOWLEDGEMENTS
This research is funded through the LinkedTV and NewsReader projects, both funded by the European Union’s 7th Framework Programme (grants GA 287911 and ICT-316404).
