Using Embeddings for Both Entity Recognition and Linking in Tweets
Giuseppe Attardi, Daniele Sartiano, Maria Simi, Irene Sucameli
Dipartimento di Informatica
Università di Pisa
Task Description
 Annotate named entity mentions in tweets and link them to the corresponding DBpedia entry
 The training set provided by the organizers consists of just 1,629 tweets
Approach
 Two stages:
1. NER
2. Entity linker
 NER requires more training data
 Added 6,439 tweets from the PoSTWITA task to the training set
 Applied the trained NER to a further 7,100 tweets, then manually corrected its output
 Final NER training set: 13,945 tweets
Approach
1. Train word embeddings on a large corpus of Italian tweets
2. Train a bidirectional LSTM character-level Named Entity tagger, using the pre-trained word embeddings
3. Build a dictionary mapping Italian titles to English DBpedia titles, e.g.
Milano → (http://dbpedia.org/resource/Milan, Location)
4. Map anchor texts from Wikipedia to the above titles, e.g.
Colombo → Cristoforo_Colombo (Person)
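Steps 3 and 4 amount to a pair of lookup tables. The sketch below is hypothetical: the entries and the helper `link_surface_form` are illustrative stand-ins for the resource actually extracted from Wikipedia, not the system's code.

```python
# Hypothetical sketch of the title dictionary (step 3) and anchor-text map
# (step 4). Entries are illustrative examples, not the extracted resource.

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

# Italian title -> (English DBpedia title, NE category)   -- step 3
title_map = {
    "Milano": ("Milan", "Location"),
    "Cristoforo_Colombo": ("Christopher_Columbus", "Person"),
}

# Wikipedia anchor text -> Italian title                  -- step 4
anchor_map = {
    "Milano": "Milano",
    "Colombo": "Cristoforo_Colombo",
}

def link_surface_form(surface):
    """Resolve a surface form to a (DBpedia URL, category) pair, if known."""
    title = anchor_map.get(surface, surface)
    if title not in title_map:
        return None
    english_title, category = title_map[title]
    return DBPEDIA_PREFIX + english_title, category
```

Looking up "Colombo" then follows anchor text → Italian title → English DBpedia URL, mirroring the Cristoforo_Colombo example on the slide.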
Approach 2
5. Create word embeddings from the Italian Wikipedia
6. For each page abstract, compute the average of the word embeddings of its tokens and map it to its URL
7. Perform Named Entity tagging on the test set
8. For each extracted entity, compute the average of the word embeddings for a context of words around the entity
9. Annotate the mention with the DBpedia entity whose abstract vector is closest to this context vector
10. For Twitter mentions, use the Twitter API to obtain the real name and set the category to Person if that name is present in a gazetteer of names
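The linking steps (6, 8 and 9) can be sketched with averaged vectors and cosine similarity. The vocabulary, abstracts and random embeddings below are toy assumptions standing in for the vectors trained on the Italian Wikipedia.

```python
# Toy sketch of steps 6, 8 and 9: average token embeddings for each abstract
# and for the mention context, then pick the entity with the closest abstract.
import numpy as np

rng = np.random.default_rng(0)
DIM = 50
# stand-in word embeddings (the real ones are trained on the Italian Wikipedia)
emb = {w: rng.normal(size=DIM)
       for w in ["squadra", "calcio", "inghilterra", "città", "porto", "fiume"]}

def avg_embedding(tokens):
    """Average of the known token vectors; zero vector if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

# step 6: one averaged vector per entity abstract, keyed by DBpedia URL
abstract_vecs = {
    "http://dbpedia.org/resource/Liverpool_F.C.":
        avg_embedding(["squadra", "calcio", "inghilterra"]),
    "http://dbpedia.org/resource/Liverpool":
        avg_embedding(["città", "porto", "fiume"]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def link(context_tokens):
    """Steps 8-9: nearest abstract vector for the mention's context window."""
    c = avg_embedding(context_tokens)
    return max(abstract_vecs, key=lambda url: cosine(c, abstract_vecs[url]))
```

A mention of "Liverpool" in a football context ("squadra", "calcio") is then linked to Liverpool_F.C. rather than to the city, the kind of distinction discussed later.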
Note
 The last step is somewhat at odds with the task guidelines
Bi-LSTM Character-level NER
 Character-level features are learned
 No hand-engineered features such as prefixes and suffixes
[Figure: forward (r) and backward (l) character LSTM states over the characters "M a r i o", combined into a representation of the word "Mario"]
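The figure's composition of forward (r) and backward (l) character states can be illustrated with a minimal numpy LSTM. The random weights and the 10/8 dimensions are arbitrary assumptions for the sketch, not the tagger's actual parameters.

```python
# Illustrative numpy sketch (not the actual tagger) of building a word
# representation from character LSTM states: run an LSTM over the characters
# left-to-right and right-to-left, then concatenate the two final states.
import numpy as np

rng = np.random.default_rng(1)
CHAR_DIM, H = 10, 8  # arbitrary toy sizes
char_emb = {c: rng.normal(size=CHAR_DIM)
            for c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"}
# one random weight matrix per gate (input, forget, output, cell), on [h; x]
W = {g: rng.normal(scale=0.1, size=(H, H + CHAR_DIM)) for g in "ifoc"}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_final_state(chars):
    """Final hidden state of an LSTM run over a character sequence."""
    h, c = np.zeros(H), np.zeros(H)
    for ch in chars:
        z = np.concatenate([h, char_emb[ch]])
        i, f, o = (sigmoid(W[g] @ z) for g in "ifo")
        c = f * c + i * np.tanh(W["c"] @ z)
        h = o * np.tanh(c)
    return h

def char_word_repr(word):
    """Concatenate the final forward (r) and backward (l) states."""
    return np.concatenate([lstm_final_state(word),
                           lstm_final_state(reversed(word))])
```

Because the representation is built from characters, out-of-vocabulary and noisily spelled tweet tokens still get usable vectors.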
NER Tagger
[Figure: tagger architecture for "Mario Monti a Roma": word embeddings → Bi-LSTM encoder → CRF layer, producing the tags B-PER I-PER O B-LOC]
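At decoding time the CRF layer selects the best tag sequence by Viterbi search over the Bi-LSTM's per-token scores plus tag-transition scores. The scores below are hand-made toys, not trained parameters; they simply penalize the ill-formed transition O → I-PER.

```python
# Toy Viterbi decoder illustrating the CRF layer's role: transition scores
# let the model prefer well-formed sequences such as B-PER I-PER.
import numpy as np

TAGS = ["O", "B-PER", "I-PER", "B-LOC"]

def viterbi(emissions, transitions):
    """Best-scoring tag sequence; emissions has shape (n_tokens, n_tags)."""
    n_tokens, n_tags = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n_tokens, n_tags), dtype=int)
    for k in range(1, n_tokens):
        total = score[:, None] + transitions + emissions[k][None, :]
        back[k] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for k in range(n_tokens - 1, 0, -1):
        path.append(int(back[k][path[-1]]))
    return [TAGS[i] for i in reversed(path)]

transitions = np.zeros((4, 4))
transitions[TAGS.index("O"), TAGS.index("I-PER")] = -10.0  # forbid O -> I-PER
emissions = np.array([[0.1, 2.0, 0.0, 0.0],   # "Mario"
                      [0.1, 0.0, 2.0, 0.0],   # "Monti"
                      [2.0, 0.0, 0.0, 0.1],   # "a"
                      [0.1, 0.0, 0.0, 2.0]])  # "Roma"
# viterbi(emissions, transitions) -> ["B-PER", "I-PER", "O", "B-LOC"]
```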
NER Accuracy (dev set)

Category      Precision  Recall     F1
Character         50.00   16.67  25.00
Event             92.48   87.45  89.89
Location          77.51   75.00  76.24
Organization      88.30   78.13  82.91
Person            73.71   88.26  88.33
Product           65.48   60.77  63.04
Thing             50.00   36.84  42.42
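For reference, each F1 in the table is the harmonic mean of the row's precision and recall; a quick sanity check against the Product row:

```python
# F1 as the harmonic mean of precision and recall,
# checked on the Product row (P = 65.48, R = 60.77).
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

f1(65.48, 60.77)  # about 63.04, matching the Product row
```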
Official results

Run                            Mention ceaf  Strong typed mention match  Strong link match  Final score
UniPI.3                               0.561                       0.474              0.456       0.5034
UniPI.1                               0.561                       0.466              0.443       0.4971
Team2.base                            0.530                       0.472              0.477       0.4967
UniPI.2                               0.561                       0.463              0.443       0.4962
UniPI.3 without mention check         0.616                       0.531              0.451       0.541
Discussion
 Embeddings proved effective for disambiguation: see the improvement in the strong link match score from run UniPI.2 to UniPI.3
 Good cases:
Liverpool_F.C. vs Liverpool
Italy_national_football_team vs Italy
S.S._Lazio vs Lazio
Diego_Della_Valle vs Pietro_Della_Valle
Nobel_Prize vs Alfred_Nobel
 Bad cases:
Maria_II_of_Portugal for Maria
Luke_the_Evangelist for Luca
 Acknowledgment: Tesla GPU granted by NVIDIA
Conclusions
 Deep Learning approach
 Character embeddings help the NER cope with the noise in tweets
 Word embeddings used for semantic relatedness in entity linking
 Side product: a new gold resource of 13,609 tweets (242,453 tokens) annotated with NE categories, leveraging the resource from the Evalita PoSTWITA task