Using Embeddings for Both Entity Recognition and Linking in Tweets
Giuseppe Attardi, Daniele Sartiano, Maria Simi, Irene Sucameli
Dipartimento di Informatica
Università di Pisa
Task Description
 Annotate named entity mentions in tweets and link them to the corresponding DBpedia entry
 The training set provided by the organizers consists of just 1,629 tweets
Approach
 Two stages:
1. NER
2. Entity linker
 NER requires more training data
 Added 6,439 tweets from the PoSTWITA task to the training set
 Applied the trained NER to a further 7,100 tweets, then manually corrected its output
 Final NER training set: 13,945 tweets
Approach
1. Train word embeddings on a large corpus of Italian tweets
2. Train a bidirectional LSTM character-level Named Entity tagger, using the pre-trained word embeddings
3. Build a dictionary mapping Italian titles to English DBpedia titles, e.g.
Milano → (http://dbpedia.org/resource/Milan, Location)
4. Map anchor texts from Wikipedia to the above titles, e.g.
Colombo → Cristoforo_Colombo (Person)
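Steps 3 and 4 amount to a pair of lookup tables. The sketch below is hypothetical: the entries and the helper `link_surface_form` are illustrative stand-ins for the resource actually extracted from Wikipedia, not the system's code.

```python
# Hypothetical sketch of the title dictionary (step 3) and anchor-text map
# (step 4). Entries are illustrative examples, not the extracted resource.

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

# Italian title -> (English DBpedia title, NE category)   -- step 3
title_map = {
    "Milano": ("Milan", "Location"),
    "Cristoforo_Colombo": ("Christopher_Columbus", "Person"),
}

# Wikipedia anchor text -> Italian title                  -- step 4
anchor_map = {
    "Milano": "Milano",
    "Colombo": "Cristoforo_Colombo",
}

def link_surface_form(surface):
    """Resolve a surface form to a (DBpedia URL, category) pair, if known."""
    title = anchor_map.get(surface, surface)
    if title not in title_map:
        return None
    english_title, category = title_map[title]
    return DBPEDIA_PREFIX + english_title, category
```

Looking up "Colombo" then follows anchor text → Italian title → English DBpedia URL, mirroring the Cristoforo_Colombo example on the slide.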
Approach 2
5. Create word embeddings from the Italian Wikipedia
6. For each page abstract, compute the average of the word embeddings of its tokens and map it to its URL
7. Perform Named Entity tagging on the test set
8. For each extracted entity, compute the average of the word embeddings for a context of words around the entity
9. Annotate the mention with the DBpedia entity whose abstract vector is closest to this context vector
10. For Twitter mentions, use the Twitter API to obtain the real name and set the category to Person if that name is present in a gazetteer of names
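The linking steps (6, 8 and 9) can be sketched with averaged vectors and cosine similarity. The vocabulary, abstracts and random embeddings below are toy assumptions standing in for the vectors trained on the Italian Wikipedia.

```python
# Toy sketch of steps 6, 8 and 9: average token embeddings for each abstract
# and for the mention context, then pick the entity with the closest abstract.
import numpy as np

rng = np.random.default_rng(0)
DIM = 50
# stand-in word embeddings (the real ones are trained on the Italian Wikipedia)
emb = {w: rng.normal(size=DIM)
       for w in ["squadra", "calcio", "inghilterra", "città", "porto", "fiume"]}

def avg_embedding(tokens):
    """Average of the known token vectors; zero vector if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

# step 6: one averaged vector per entity abstract, keyed by DBpedia URL
abstract_vecs = {
    "http://dbpedia.org/resource/Liverpool_F.C.":
        avg_embedding(["squadra", "calcio", "inghilterra"]),
    "http://dbpedia.org/resource/Liverpool":
        avg_embedding(["città", "porto", "fiume"]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def link(context_tokens):
    """Steps 8-9: nearest abstract vector for the mention's context window."""
    c = avg_embedding(context_tokens)
    return max(abstract_vecs, key=lambda url: cosine(c, abstract_vecs[url]))
```

A mention of "Liverpool" in a football context ("squadra", "calcio") is then linked to Liverpool_F.C. rather than to the city, the kind of distinction discussed later.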
Note
 The last step is somewhat at odds with the task guidelines
Bi-LSTM Character-level NER
 Character-level features are learned
 No hand-engineered features such as prefixes and suffixes
[Figure: forward (r) and backward (l) character LSTM states over the characters "M a r i o", combined into a representation of the word "Mario"]
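The figure's composition of forward (r) and backward (l) character states can be illustrated with a minimal numpy LSTM. The random weights and the 10/8 dimensions are arbitrary assumptions for the sketch, not the tagger's actual parameters.

```python
# Illustrative numpy sketch (not the actual tagger) of building a word
# representation from character LSTM states: run an LSTM over the characters
# left-to-right and right-to-left, then concatenate the two final states.
import numpy as np

rng = np.random.default_rng(1)
CHAR_DIM, H = 10, 8  # arbitrary toy sizes
char_emb = {c: rng.normal(size=CHAR_DIM)
            for c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"}
# one random weight matrix per gate (input, forget, output, cell), on [h; x]
W = {g: rng.normal(scale=0.1, size=(H, H + CHAR_DIM)) for g in "ifoc"}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_final_state(chars):
    """Final hidden state of an LSTM run over a character sequence."""
    h, c = np.zeros(H), np.zeros(H)
    for ch in chars:
        z = np.concatenate([h, char_emb[ch]])
        i, f, o = (sigmoid(W[g] @ z) for g in "ifo")
        c = f * c + i * np.tanh(W["c"] @ z)
        h = o * np.tanh(c)
    return h

def char_word_repr(word):
    """Concatenate the final forward (r) and backward (l) states."""
    return np.concatenate([lstm_final_state(word),
                           lstm_final_state(reversed(word))])
```

Because the representation is built from characters, out-of-vocabulary and noisily spelled tweet tokens still get usable vectors.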
NER Tagger
[Figure: tagger architecture for "Mario Monti a Roma": word embeddings → Bi-LSTM encoder → CRF layer, producing the tags B-PER I-PER O B-LOC]
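At decoding time the CRF layer selects the best tag sequence by Viterbi search over the Bi-LSTM's per-token scores plus tag-transition scores. The scores below are hand-made toys, not trained parameters; they simply penalize the ill-formed transition O → I-PER.

```python
# Toy Viterbi decoder illustrating the CRF layer's role: transition scores
# let the model prefer well-formed sequences such as B-PER I-PER.
import numpy as np

TAGS = ["O", "B-PER", "I-PER", "B-LOC"]

def viterbi(emissions, transitions):
    """Best-scoring tag sequence; emissions has shape (n_tokens, n_tags)."""
    n_tokens, n_tags = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n_tokens, n_tags), dtype=int)
    for k in range(1, n_tokens):
        total = score[:, None] + transitions + emissions[k][None, :]
        back[k] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for k in range(n_tokens - 1, 0, -1):
        path.append(int(back[k][path[-1]]))
    return [TAGS[i] for i in reversed(path)]

transitions = np.zeros((4, 4))
transitions[TAGS.index("O"), TAGS.index("I-PER")] = -10.0  # forbid O -> I-PER
emissions = np.array([[0.1, 2.0, 0.0, 0.0],   # "Mario"
                      [0.1, 0.0, 2.0, 0.0],   # "Monti"
                      [2.0, 0.0, 0.0, 0.1],   # "a"
                      [0.1, 0.0, 0.0, 2.0]])  # "Roma"
# viterbi(emissions, transitions) -> ["B-PER", "I-PER", "O", "B-LOC"]
```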
NER Accuracy (dev set)

Category      Precision  Recall     F1
Character         50.00   16.67  25.00
Event             92.48   87.45  89.89
Location          77.51   75.00  76.24
Organization      88.30   78.13  82.91
Person            73.71   88.26  88.33
Product           65.48   60.77  63.04
Thing             50.00   36.84  42.42
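For reference, each F1 in the table is the harmonic mean of the row's precision and recall; a quick sanity check against the Product row:

```python
# F1 as the harmonic mean of precision and recall,
# checked on the Product row (P = 65.48, R = 60.77).
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

f1(65.48, 60.77)  # about 63.04, matching the Product row
```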
Official results

Run                            Mention ceaf  Strong typed mention match  Strong link match  Final score
UniPI.3                               0.561                       0.474              0.456       0.5034
UniPI.1                               0.561                       0.466              0.443       0.4971
Team2.base                            0.530                       0.472              0.477       0.4967
UniPI.2                               0.561                       0.463              0.443       0.4962
UniPI.3 without mention check         0.616                       0.531              0.451       0.541
Discussion
 Embeddings proved effective for disambiguation: see the improvement in the strong link match score from run UniPI.2 to UniPI.3
 Good cases:
Liverpool_F.C. vs Liverpool
Italy_national_football_team vs Italy
S.S._Lazio vs Lazio
Diego_Della_Valle vs Pietro_Della_Valle
Nobel_Prize vs Alfred_Nobel
 Bad cases:
Maria_II_of_Portugal for Maria
Luke_the_Evangelist for Luca
 Acknowledgment: Tesla GPU granted by NVIDIA
Conclusions
 Deep Learning approach
 Character embeddings help the NER cope with the noise in tweets
 Word embeddings used for semantic relatedness in entity linking
 Side product: a new gold resource of 13,609 tweets (242,453 tokens) annotated with NE categories, leveraging the resource from the Evalita PoSTWITA task