Alberto Parravicini
Rhicheek Patra
Davide Bartolini
Marco Santambrogio
2019-05-{17-31}, NGCX
Fast Entity Linking via Graph Embeddings
Alberto Parravicini 23/05/2019
Entity Linking (EL): connecting words of
interest to unique identities (e.g. a Wikipedia
page)
Entity Linking
Component of applications that require
high-level representations of text:
1. Search Engines, for semantic search
2. Recommender Systems, to retrieve documents
similar to each other
3. Chat bots, to understand intents and entities
Use Cases
An EL system requires 2 steps:
1. Named Entity Recognition (NER): spot
mentions (a.k.a. Named Entities)
● High-accuracy in the state-of-the-art[1]
The EL Pipeline (1/2)
[1] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging."
An EL system requires 2 steps:
2. Entity Linking: connect mentions to entities
The EL Pipeline (2/2)
● Novel unsupervised framework for EL
● No dependency on NLP
● First EL algorithm to use graph embeddings
● Accuracy similar to supervised SoA techniques
● Highly scalable and real-time execution time
● < 1 sec to process text with 30+ mentions
Our contributions
Our EL Pipeline
Input: text with NER
1. Graph Building
2. Embeddings
3. Candidate Finder
4. Disambiguation
Output
Diagram phase labels: Graph Learning, Execution
We obtain a large graph from DBpedia
● All the information of Wikipedia, stored as triples
● 12M entities, 170M links
Step 1/4
Graph Creation
From DBpedia, we build two graphs:
Redirects Graph, Property Graph
Step 1/4
Graph Creation
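As a rough illustration of this step, the sketch below splits a handful of triples into a redirects map and a property graph. The toy triples, the `build_graphs` helper and the choice to store property links as undirected are assumptions made for the example; the predicate name is a shortened form of DBpedia's `wikiPageRedirects`, and the slides do not say how the authors store either graph.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples. Real DBpedia dumps use
# full URIs; names are shortened here for readability.
TRIPLES = [
    ("Rome", "country", "Italy"),
    ("Milan", "country", "Italy"),
    ("Roma", "wikiPageRedirects", "Rome"),
]

def build_graphs(triples, redirect_predicate="wikiPageRedirects"):
    """Split triples into a redirects map and a property graph."""
    redirects = {}                 # alias vertex -> canonical vertex
    properties = defaultdict(set)  # vertex -> set of neighbouring vertices
    for subj, pred, obj in triples:
        if pred == redirect_predicate:
            redirects[subj] = obj
        else:
            # Assumption: property links are stored as undirected edges.
            properties[subj].add(obj)
            properties[obj].add(subj)
    return redirects, dict(properties)

redirects, properties = build_graphs(TRIPLES)
```

Keeping redirects out of the property graph means aliases such as "Roma" do not become separate vertices there.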
● Graph embeddings encode vertices as vectors
● “Similar” vertices have “similar” embeddings
● Idea: entities with the same context should have
low embedding distance
Embeddings Creation (1/2)
Step 2/4
● In our work, we use DeepWalk[1]
● Like word2vec[2], it learns embeddings from sequences:
random walks (i.e. vertex sequences) play the role of sentences
● Embedding size 170, walk length 8
● DeepWalk uses only the graph topology
● Simple baseline, we can use better algorithms
and leverage graph features
Embeddings Creation (2/2)
[1] Bryan Perozzi et al. 2014. DeepWalk
[2] Tomas Mikolov et al. 2013. Distributed representations of words and phrases and their compositionality
Step 2/4
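A minimal sketch of the random-walk stage that DeepWalk relies on, using the walk length of 8 from the slide. The toy graph, the `random_walks` helper, the `walks_per_vertex` parameter and the reading of "walk length" as a number of vertices are assumptions for illustration. The skip-gram training that turns walks into 170-dimensional vectors is left out.

```python
import random

def random_walks(adjacency, walk_length=8, walks_per_vertex=2, seed=0):
    """Generate uniform random walks starting from every vertex."""
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_vertex):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency.get(walk[-1])
                if not neighbours:
                    break  # dead end: keep the shorter walk
                # Sort first so the walk is reproducible for a given seed.
                walk.append(rng.choice(sorted(neighbours)))
            walks.append(walk)
    return walks

# Hypothetical toy graph; "d" is an isolated vertex.
TOY_GRAPH = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
walks = random_walks(TOY_GRAPH)
```

Each walk is then treated like a sentence whose "words" are vertices, which is how only the graph topology ends up in the embeddings.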
Candidates Finder
Idea: for each mention, select a few candidate
vertices with index-based string similarity
● Resolve ambiguity by following redirect and
disambiguation edges
Step 3/4
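One way to picture how redirect and disambiguation edges are followed is sketched below. The `resolve_candidates` helper, the `max_hops` guard and the toy vertex names are hypothetical; the slides do not describe the actual traversal.

```python
def resolve_candidates(matches, redirects, disambiguations, max_hops=5):
    """Replace string matches with the entities they ultimately point to."""
    resolved = []
    for vertex in matches:
        hops = 0
        # Follow redirect edges to the canonical vertex (bounded, in case of cycles).
        while vertex in redirects and hops < max_hops:
            vertex = redirects[vertex]
            hops += 1
        # A disambiguation vertex expands into all the entities it lists.
        for target in disambiguations.get(vertex, [vertex]):
            if target not in resolved:
                resolved.append(target)
    return resolved

# Hypothetical toy data.
REDIRECTS = {"NYC": "New_York_City"}
DISAMBIGUATIONS = {"Paris_(disambiguation)": ["Paris", "Paris,_Texas"]}

candidates = resolve_candidates(
    ["NYC", "Paris_(disambiguation)", "Paris"], REDIRECTS, DISAMBIGUATIONS
)
```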
Disambiguation (1/3)
● We want to pick the “best” candidate for
each mention
● In a “good” solution, candidates are related to
each other (e.g. Donald Trump, Hillary Clinton)
● Observation: a good tuple of candidates has
embeddings close to each other
● Evaluating all combinations is infeasible
● 10 mentions with 100 candidates each: 100^10 combinations
Step 4/4
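A quick check of the search-space size: with 100 candidates for each of 10 mentions there are 100^10 = 10^20 candidate tuples. The evaluation rate below is a made-up figure, used only to give a sense of scale.

```python
mentions = 10
candidates_per_mention = 100

# Every mention independently picks one of its candidates.
combinations = candidates_per_mention ** mentions

# Hypothetical throughput: one billion tuple evaluations per second.
evaluations_per_second = 10 ** 9
seconds_per_year = 3600 * 24 * 365
years = combinations / evaluations_per_second / seconds_per_year
```

Even at that optimistic rate, exhaustive search would take thousands of years, which is why a heuristic search is needed.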
● We use a heuristic state-space search
algorithm to maximize an objective with two terms:
○ Sum of string similarities
○ Sum of embedding cosine similarities
w.r.t. the tuple mean
Step 4/4
Disambiguation (2/3)
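The two quantities named on this slide can be sketched as a single score for one tuple of candidates. Adding the two terms with equal weight is an assumption, as are the `tuple_score` and `cosine` helper names and the toy 2-dimensional embeddings; the slide only names the terms.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def tuple_score(string_sims, embeddings):
    """Score one candidate per mention: string term plus coherence term."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    # Assumption: the two terms are summed without weights.
    return sum(string_sims) + sum(cosine(e, mean) for e in embeddings)

# Same string similarities, different embedding coherence.
coherent = tuple_score([0.9, 0.8], [[1.0, 0.0], [0.9, 0.1]])
scattered = tuple_score([0.9, 0.8], [[1.0, 0.0], [0.0, 1.0]])
```

With identical string similarities, the tuple whose embeddings point in similar directions scores higher, matching the observation that good candidates lie close to each other.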
● Iterative state-space heuristic
Greedy iterative procedure
Step 4/4
Disambiguation (3/3)
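The slide gives no detail beyond "greedy iterative procedure", so the following is only one plausible reading: start from the best string match for each mention, then repeatedly swap a single mention's candidate whenever that raises the tuple score, and stop early when an iteration brings no improvement. All names, the unweighted score and the toy data are assumptions, not the authors' algorithm.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score(choice, candidates):
    """String-similarity sum plus cosine similarity of each pick to the mean."""
    picked = [candidates[m][c] for m, c in enumerate(choice)]
    embeddings = [emb for _, emb in picked]
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return sum(sim for sim, _ in picked) + sum(cosine(e, mean) for e in embeddings)

def greedy_disambiguate(candidates, max_iterations=10):
    """candidates[m] is a list of (string_similarity, embedding) for mention m."""
    # Initial state: the best string match for every mention.
    choice = [max(range(len(c)), key=lambda i: c[i][0]) for c in candidates]
    best = score(choice, candidates)
    for _ in range(max_iterations):
        improved = False
        for m in range(len(candidates)):
            for c in range(len(candidates[m])):
                trial = choice[:m] + [c] + choice[m + 1:]
                trial_score = score(trial, candidates)
                if trial_score > best:
                    choice, best, improved = trial, trial_score, True
        if not improved:
            break  # early stop: a full pass changed nothing
    return choice, best

# Hypothetical toy input: the third mention's best string match is off-topic.
CANDIDATES = [
    [(1.0, [1.0, 0.0])],
    [(1.0, [0.9, 0.1])],
    [(0.8, [0.0, 1.0]), (0.7, [1.0, 0.1])],
]
choice, best = greedy_disambiguate(CANDIDATES)
```

Each pass costs one score evaluation per candidate instead of one per tuple, so the work grows with the number of candidates rather than with their product.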
● We compared against 6 SoA EL algorithms, on 5
datasets
● Our micro-averaged F1 score is comparable with
SoA supervised algorithms
Results: accuracy
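For reference, micro-averaged F1 pools true positives, false positives and false negatives over all documents before computing precision and recall. The sketch below uses made-up counts and a hypothetical `micro_f1` helper; it does not reproduce any number from the evaluation.

```python
def micro_f1(per_document_counts):
    """Micro-averaged F1 from per-document (tp, fp, fn) counts."""
    tp = sum(c["tp"] for c in per_document_counts)
    fp = sum(c["fp"] for c in per_document_counts)
    fn = sum(c["fn"] for c in per_document_counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up counts for two documents.
COUNTS = [
    {"tp": 8, "fp": 2, "fn": 2},
    {"tp": 1, "fp": 1, "fn": 3},
]
f1 = micro_f1(COUNTS)
```

Because counts are pooled, documents with many mentions weigh more than short ones, unlike a macro average over per-document scores.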
● Different settings enable real-time EL, with
minimal loss in accuracy
● E.g. number of iterations, early stop
Results: exec. time
● Execution time is well balanced between
Candidate Finder and Disambiguation
Results: exec. time of single steps
Thank you!
Fast Entity Linking via Graph Embeddings
● Novel unsupervised framework for EL
● First EL algorithm to use graph embeddings
○ Accuracy similar to supervised SoA techniques
● Real-time execution time
Alberto Parravicini, alberto.parravicini@polimi.it
Rhicheek Patra
Davide Bartolini
Marco Santambrogio
2019-05-{17-31}, NGCX
● Turn the topology and properties of each vertex
into a vector
● “Similar” vertices have “similar” embeddings
Embeddings
Step 2/4
[Figure labels: topology, properties, embedding]
Hamilton, William L., Rex Ying, and Jure Leskovec.
"Representation learning on graphs: Methods and
applications." arXiv preprint arXiv:1709.05584 (2017).
… and join them together
Step 1/4
Graph Creation
Candidates Finder
Idea: for each mention, select a small number of
candidate vertices with string similarity
● We use a simple index-based string search
● Fuzzy matching with 2-grams and 3-grams
● This provides a simple baseline (60-70%
accuracy)
Step 3/4
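A small sketch of index-based fuzzy matching with 2-grams and 3-grams. Ranking by Jaccard overlap of the n-gram sets, the helper names and the toy labels are assumptions; the slides do not say which similarity measure or index structure is used.

```python
from collections import defaultdict

def ngrams(text, sizes=(2, 3)):
    """Set of character 2-grams and 3-grams of a lower-cased string."""
    text = text.lower()
    return {text[i:i + n] for n in sizes for i in range(len(text) - n + 1)}

def build_index(labels):
    """Inverted index: n-gram -> labels that contain it."""
    index = defaultdict(set)
    for label in labels:
        for gram in ngrams(label):
            index[gram].add(label)
    return index

def find_candidates(mention, index, top_k=3):
    """Rank labels sharing n-grams with the mention by Jaccard overlap."""
    grams = ngrams(mention)
    shared = defaultdict(int)  # label -> number of n-grams shared with the mention
    for gram in grams:
        for label in index.get(gram, ()):
            shared[label] += 1
    scored = [
        (count / len(grams | ngrams(label)), label)
        for label, count in shared.items()
    ]
    # Highest overlap first; ties broken alphabetically.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in ranked[:top_k]]

# Hypothetical vertex labels and a misspelled mention.
INDEX = build_index(["Donald Trump", "Donald Duck", "Hillary Clinton"])
top = find_candidates("Donald Trmp", INDEX, top_k=2)
```

Only labels that share at least one n-gram with the mention are ever scored, which keeps the lookup cheap on a large vertex set.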
