Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs
Federico Bianchi, Matteo Palmonari, Marco Cremaschi and Elisabetta Fersini
federico.bianchi@disco.unimib.it
ITIS Lab – Innovative Technologies for Interaction and Services
Dipartimento di Informatica, Sistemistica e Comunicazione
Università degli Studi di Milano-Bicocca
1-6-2017, Portorož, Slovenia
Outline
2
• Contextual Exploration of Knowledge Graphs
• Actively Learning to Rank Semantic Associations
• Experiments
• Conclusions and Future Work
Knowledge Graphs (KGs)
4
• Models used for knowledge representation using graphs
• DBpedia, YAGO, Google KG, …
• Nodes represent real-world entities
• Labelled edges represent relations between them
[Example graph: Bernie Sanders, Hillary Clinton, Democratic Party]
KGs may contain interesting relations for users
Relational Knowledge in KGs and Semantic Associations
5
• KGs provide a vast amount of relational knowledge
• Semantic Associations (SAs):
  • chains of relations between entities
  • arbitrary length
  • inverse properties included
[Graph: Bernie Sanders —party→ Democratic Party ←party— Hillary Clinton]
Bernie_Sanders party > Democratic_Party < party Hillary_Clinton
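The `>` / `<` serialization above (forward edges vs. inversely traversed properties) can be reproduced with a short sketch. The tuple-based representation below is a hypothetical illustration, not the data model actually used by DaCENA:

```python
# Hypothetical representation of a Semantic Association (SA): a chain of
# (source, predicate, target, forward) steps, where forward=False marks a
# property traversed in the inverse direction.
def render_sa(steps):
    out = steps[0][0]
    for _src, pred, dst, forward in steps:
        out += f" {pred} > {dst}" if forward else f" < {pred} {dst}"
    return out

sa = [("Bernie_Sanders", "party", "Democratic_Party", True),
      ("Democratic_Party", "party", "Hillary_Clinton", False)]
print(render_sa(sa))  # Bernie_Sanders party > Democratic_Party < party Hillary_Clinton
```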
Empowering Comprehension of Web Content
6
Who is Bernie Sanders? What is his relation with Hillary Clinton?
7
Contextual Exploration of KGs
Support a user who is doing a familiar task, e.g., reading a news article, to access content extracted from a KG, selected and pushed to them in a proactive fashion.
Who is Bernie Sanders? What is his relation with Hillary Clinton?
[Graph: Bernie Sanders —party→ Democratic Party ←party— Hillary Clinton]
8
Contextual Exploration of DBpedia with DaCENA
www.dacena.org (Palmonari&al, 2015)
Entity Extraction
9
Bernie Sanders has urged his supporters to look beyond the Democratic presidential nomination in a speech that stopped short of fully endorsing Hillary Clinton but made clear he was no longer actively challenging her candidacy. In an anticlimactic speech that signalled the effective end of a 14-month campaign odyssey, the Vermont senator insisted his “political revolution continues” despite Clinton’s effective victory in the delegate race.
Entities are extracted from the text: Bernie Sanders, Hillary Clinton, Democratic Party, Vermont…
Pipeline: Entity Extraction → SAs Retrieval
Retrieval of Semantic Associations
10
SPARQL query to the DBpedia endpoint.
SAs between all the entities
• maximum number of hops = 2
Example, between Bernie Sanders and Hillary Clinton:
Bernie Sanders —party→ Democratic Party ←party— Hillary Clinton
Pipeline: Entity Extraction → SAs Retrieval (SPARQL)
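The retrieval step can be sketched as a query builder. The deck does not show the actual SPARQL used by DaCENA, so the two-hop, bidirectional pattern below (including the variable names) is an illustrative assumption:

```python
def build_sa_query(e1: str, e2: str) -> str:
    """Sketch: retrieve 2-hop semantic associations between two entities,
    traversing properties in either direction (inverse properties included).
    Not the actual DaCENA query, which the deck does not show."""
    return f"""SELECT DISTINCT ?p1 ?mid ?p2 WHERE {{
  {{ <{e1}> ?p1 ?mid . }} UNION {{ ?mid ?p1 <{e1}> . }}
  {{ ?mid ?p2 <{e2}> . }} UNION {{ <{e2}> ?p2 ?mid . }}
}}"""

query = build_sa_query("http://dbpedia.org/resource/Bernie_Sanders",
                       "http://dbpedia.org/resource/Hillary_Clinton")
```

A direct 1-hop association (such as a shared `party` edge) would be retrieved by an analogous single-triple pattern.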
Information Overload in Contextual KG Exploration
11
Too many associations even from small pieces of text:
• e.g., 40107 associations from an article with 942 words
• they do not fit in a single screen
• users can explore only a limited number of associations (≤ 100)
Crucial issue for KG exploration: which are the most interesting to show to users?
Ranking SAs by Estimated Interest: Serendipity
12
Heuristic measure: find the associations that are relevant to the text and may be unexpected to users
Serendipity = relevance + unexpectedness
Serendipity(SA, Text) = α·relevance(SA, Text) + (1 − α)·rarity(SA)
• relevance(SA, Text) = cos(abstracts(SA), Text), with TF-IDF weighting
• rarity(SA) measures unexpectedness (Aleman-Meza&al, 2005)
• the user can tune the weight α assigned to relevance vs unexpectedness
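A minimal sketch of the serendipity score. Plain term-frequency cosine is used here instead of the TF-IDF-weighted cosine of the deck to keep the code dependency-free, and the function names are illustrative assumptions:

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    # Bag-of-words cosine similarity (the deck uses TF-IDF weighting;
    # raw term frequencies keep this sketch dependency-free).
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def serendipity(sa_abstracts: str, text: str, rarity: float, alpha: float = 0.5) -> float:
    # Serendipity(SA, Text) = alpha * relevance(SA, Text) + (1 - alpha) * rarity(SA)
    return alpha * cosine(sa_abstracts, text) + (1 - alpha) * rarity
```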
Example of SAs ranked by Serendipity
14
Outline
16
• Contextual Exploration of Knowledge Graphs
• Actively Learning to Rank Semantic Associations
• Experiments
• Conclusions and Future Work
Personalized Exploration of KGs
17
What if different users are interested in different SAs?
1. Learn a ranking function starting from explicit ratings given by the users
2. Ask users for as few ratings as possible (rating too many SAs can become a tedious task)
3. Speed up learning by sampling the SAs that are estimated to be more informative for training the model
Definition of an active learning to rank model for personalized
contextual exploration of KGs
Active Learning to Rank for SAs
18
Loop: Ranking → Refine ranking? → if no: Final Ranking; if yes: Active Sampling of SAs → User rates SAs → Ranking
Ranking with the RankSVM algorithm:
• derived from SVM (Support Vector Machine)
• well known and widely used in the literature
Two algorithms are used to find meaningful SAs during active sampling:
• Pairwise Sampling (PS) (Qian&al, 2013)
• AUC-Based Sampling (AS) (Donmez&al, 2009)
But ranking models need to be initialized with ranked SAs (cold-start problem) → Bootstrapping
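RankSVM learns from pairwise preferences: for every pair of rated SAs, the feature-vector difference becomes a training example whose sign says which one should rank higher. The toy learner below replaces the SVM's max-margin optimization with a pairwise perceptron to stay dependency-free; it is a sketch of the idea, not the RankSVM implementation used in the paper:

```python
from itertools import combinations

def rank_learn(X, y, epochs=100, lr=0.1):
    """Pairwise perceptron standing in for RankSVM: learn a weight vector w
    such that w.x_i > w.x_j whenever item i is rated above item j."""
    w = [0.0] * len(X[0])
    pairs = [(i, j) for i, j in combinations(range(len(y)), 2) if y[i] != y[j]]
    for _ in range(epochs):
        for i, j in pairs:
            hi, lo = (i, j) if y[i] > y[j] else (j, i)
            diff = [a - b for a, b in zip(X[hi], X[lo])]
            if sum(wk * dk for wk, dk in zip(w, diff)) <= 0:  # pair misordered
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w

# Toy SA features where the first feature drives the user's rating.
X = [[3.0, 0.1], [2.0, 0.2], [1.0, 0.3]]
w = rank_learn(X, [3, 2, 1])
scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]  # higher = ranked higher
```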
Clustering as Bootstrapping
19
• Use clustering algorithms on the set of SAs
• For each cluster, select the SA that is closest to the cluster average
• The user rates all the SAs that represent the clusters
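The cluster-representative selection can be sketched as follows, assuming SA feature vectors and precomputed cluster labels (the deck does not specify the clustering algorithm here beyond the Dirichlet and Gaussian variants named in the experiments):

```python
import math

def representatives(vectors, labels):
    """For each cluster, return the index of the vector closest to the
    cluster mean; these SAs are shown to the user for the bootstrap ratings."""
    reps = {}
    for c in set(labels):
        members = [i for i, l in enumerate(labels) if l == c]
        dim = len(vectors[0])
        mean = [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
        reps[c] = min(members, key=lambda i: math.dist(vectors[i], mean))
    return reps

vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [10.0, 0.0], [12.0, 0.0], [20.0, 0.0]]
reps = representatives(vectors, [0, 0, 0, 1, 1, 1])
```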
Serendipity as Bootstrapping
20
• The user rates the top-k SAs ranked by Serendipity
• Users see an ordered set of SAs from the beginning
• Users rate SAs that are estimated to be interesting for them
Serendipity vs Clustering
21
Clustering:
• PROS: selected SAs are representative of the vector space
• CONS: rated SAs might not be interesting for the user
Serendipity:
• PROS: rated SAs are estimated to be interesting for a generic user
• CONS: heuristic function, no representativeness
Example: Rating of Most Serendipitous SAs (#0)
22
[Figure: three SAs shown to the user, built from Hillary Clinton, New York, Donald Trump, Bill Clinton, the Democ. Party, Indepen. Politic. and the United States Senate (predicates: region, birthPlace, spouse, party, political party), together with the ratings given by the user (3, 5, 1) and the ideal ratings for the user]
Example: Ranking Learned with RankSVM (#0)
23
[Figure: the ranking learned from the bootstrap ratings, over SAs built from Bernie Sanders, New York, Donald Trump, Hillary Clinton, the Repub. Party and the United States Senate (predicates: birthPlace, other party, party, political party), together with the ratings given by the user (3, 6, 5) and the ideal ratings for the user]
Example: Rating on Sampled SAs (#1)
24
[Figure: two actively sampled SAs built from Hillary Clinton, the Democ. Party, Joe Biden and the United States Senate (predicates: party, political party, leader), rated 5 and 1 by the user, with the ideal ratings for the user]
Example: Ranking Learned with RankSVM (#1)
25
[Figure: the refined ranking over SAs built from Hillary Clinton, Donald Trump, the Repub. Party, the Democ. Party and the United States Senate (predicates: other party, party, political party), with ratings 6, 6, 5 given by the user and the ideal ratings for the user; the top SA was second in the previous ranking]
Features for RankSVM
26
SAs are represented in the feature space using features divided into three main categories:
• Topological Features: PageRank on SAs (Page&al, 1999), DBpedia PageRank (Thalhammer&al, 2016), HITS (Kleinberg&al, 1999)
• Relevance Features: Relevance (Palmonari&al, 2015), Temporal Relevance (Bianchi&al, 2017)
• Predicate-Based Features: Path Informativeness (Pirrò, 2015), Path Pattern Informativeness (Pirrò, 2015), Rarity (Aleman-Meza&al, 2005)
Outline
27
• Contextual Exploration of Knowledge Graphs
• Actively Learning to Rank Semantic Associations
• Experiments
• Conclusions and Future Work
Experiments: Objectives
28
Validate personalization hypothesis
• Are different users interested in different SAs?
Evaluate the performance
• How quickly the ranking quality improves as user ratings accumulate over feedback iterations
• Comparison of different configurations and baseline algorithms
Experiments: Settings
29
Gold standards: Ideal rankings collected by asking users to evaluate all
SAs extracted from articles or pieces of articles
Evaluation Settings:
Contextual Exploration: ratings on the whole dataset
Cross Validation: ratings on training data, ranking on test data
Measure: quality of generated rankings vs ideal rankings (nDCG)
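Ranking quality against the ideal ranking can be computed with nDCG@k. The sketch below uses one common DCG formulation (linear gain, log2 position discount), which may differ in detail from the variant used in the paper:

```python
import math

def dcg_at_k(ratings, k):
    # Linear-gain DCG with a log2 position discount.
    return sum(r / math.log2(i + 2) for i, r in enumerate(ratings[:k]))

def ndcg_at_k(ratings_in_system_order, k=10):
    """ratings_in_system_order: the user's (ideal) ratings of the SAs, listed
    in the order the system ranked them; the ideal ranking sorts descending."""
    idcg = dcg_at_k(sorted(ratings_in_system_order, reverse=True), k)
    return dcg_at_k(ratings_in_system_order, k) / idcg if idcg else 0.0
```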
Experiments: Data
30
Two different datasets:
LAFU (Large Articles, Few Users)
• complete articles (New York Times)
• 3 articles, 2 users => 3 ideal rankings
• average number of SAs per article: 2600
• ratings from 1 to 3 (1 low interest, 3 high interest)
SAMU (Small Articles, Many Users)
• small pieces of text extracted from articles (New York Times, The Guardian)
• 5 articles, 14 users => 25 ideal rankings
• average number of SAs per article: 74
• ratings from 1 to 6 (1 low interest, 6 high interest; the scale is symmetric)
31
Experiments: Alternative Configurations and Baselines

Algorithm      | Bootstrapping        | Active Sampling    | Learning
Serendipity AS | Serendipity          | AUC-Based Sampling | RankSVM
Serendipity PS | Serendipity          | Pairwise Sampling  | RankSVM
Dirichlet AS   | Dirichlet Clustering | AUC-Based Sampling | RankSVM
Dirichlet PS   | Dirichlet Clustering | Pairwise Sampling  | RankSVM
Gaussian AS    | Gaussian Clustering  | AUC-Based Sampling | RankSVM
Gaussian PS    | Gaussian Clustering  | Pairwise Sampling  | RankSVM
Random         | Random               | Random             | RankSVM
Random         | No Bootstrapping     | No Active Learning | No Learning to Rank
Serendipity    | No Bootstrapping     | No Active Learning | No Learning to Rank
Results: Personalization Hypothesis
32
Inter-Rater Reliability measures assess the level of agreement between users with respect to the same items (SAs). These measures are usually defined in the range [0, 1]:
• 0 => complete disagreement between users
• 1 => users give unanimous ratings
Krippendorff's alpha: 0.061
Kendall's W: 0.26
Values are far from 1 => hypothesis validated
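Kendall's W (the coefficient of concordance) can be computed directly from the users' rank vectors. The sketch below uses the standard tie-free formula, so it is an approximation when users assign tied ratings:

```python
def kendalls_w(rankings):
    """rankings: one list of ranks (1..n) per rater over the same n items.
    W = 12*S / (m^2 * (n^3 - n)), where S is the sum of squared deviations
    of the per-item rank totals from their mean (tie-free formula)."""
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean = sum(totals) / n
    s = sum((t - mean) ** 2 for t in totals)
    return 12 * s / (m * m * (n ** 3 - n))
```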
Results: Performance (nDCG@10)
33
[Charts of nDCG@10 across feedback iterations (slides 33–36), with annotations:]
• no PS active sampling: no real-time usage possible
• Serendipity AS (AUC-Based Sampling)
• Random baseline
• Serendipity baseline
Outline
37
• Contextual Exploration of Knowledge Graphs
• Actively Learning to Rank Semantic Associations
• Experiments
• Conclusions and Future Work
Conclusions and Future Work
38
Conclusions:
1. Quick optimization of personalized ranking function with Active Learning to Rank
2. Active Learning to Rank can be initialized with Serendipity (+performance, +interaction flow)
Future Work:
1. Exploring new algorithms for active learning to rank
2. Better understanding how to design the user interaction for the ALR model
Thank You
39
Questions?
Contacts: federico.bianchi@disco.unimib.it
www.dacena.org
ITIS Lab – Innovative Technologies for Interaction and Services
Dipartimento di Informatica, Sistemistica e Comunicazione
Università degli Studi di Milano-Bicocca
References
40
Joachims, T. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142. ACM, 2002.
Pirrò, G. Explaining and suggesting relatedness in knowledge graphs. In ISWC, pages 622–639. Springer, 2015.
Qian, B., Li, H., Wang, J., Wang, X., and Davidson, I. Active learning to rank using pairwise supervision. In SIAM Int. Conf. Data Mining, pages 297–305. SIAM, 2013.
Donmez, P. and Carbonell, J. G. Active sampling for rank learning via optimizing the area under the ROC curve. In ECIR, pages 78–89. Springer, 2009.
Bianchi, F., Palmonari, M., Cremaschi, M., and Fersini, E. Actively learning to rank semantic associations for personalized contextual exploration of knowledge graphs. In ESWC, 2017.
Palmonari, M., Uboldi, G., Cremaschi, M., Ciminieri, D., and Bianchi, F. DaCENA: Serendipitous news reading with data contexts. In ESWC, pages 133–137. Springer, 2015.
Page, L., et al. The PageRank citation ranking: Bringing order to the web. Stanford InfoLab, 1999.
Thalhammer, A. and Rettinger, A. PageRank on Wikipedia: Towards general importance scores for entities. In International Semantic Web Conference, pages 227–240. Springer, 2016.
Kleinberg, J. M., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. S. The web as a graph: Measurements, models, and methods. In International Computing and Combinatorics Conference, pages 1–17. Springer, 1999.
Aleman-Meza, B., Halaschek-Wiener, C., Arpinar, I. B., Ramakrishnan, C., and Sheth, A. P. Ranking complex relationships on the semantic web. IEEE Internet Computing, 9(3):37–44, 2005.
More Related Content

PDF
2013 07 05 (uc3m) lasi emadrid grobles jgbarahona urjc lecciones aprendidas a...
PPT
The Maths behind Web search engines
PPT
Search engine page rank demystification
PDF
Pagerank and hits
PPT
Page Rank
PPT
Text Analytics: Yesterday, Today and Tomorrow
PPTX
Better Search Through Query Understanding
PPTX
Page rank and hyperlink
2013 07 05 (uc3m) lasi emadrid grobles jgbarahona urjc lecciones aprendidas a...
The Maths behind Web search engines
Search engine page rank demystification
Pagerank and hits
Page Rank
Text Analytics: Yesterday, Today and Tomorrow
Better Search Through Query Understanding
Page rank and hyperlink

What's hot (12)

PDF
Science of the Interwebs
PPTX
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
PPT
SWMRA EF- 2011
PPSX
Boolean Searching
PPTX
Boolean Searching
PPTX
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
PDF
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...
PPT
Cs583 link-analysis
PPTX
PageRank Algorithm In data mining
PPT
Text Analytics Market Insights: What's Working and What's Next
PDF
Better Information with Curation Markets
Science of the Interwebs
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
SWMRA EF- 2011
Boolean Searching
Boolean Searching
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...
Cs583 link-analysis
PageRank Algorithm In data mining
Text Analytics Market Insights: What's Working and What's Next
Better Information with Curation Markets
Ad

Similar to Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs (20)

PPTX
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
PDF
Master Minds on Data Science - Maarten de Rijke
PPTX
Semantic Data Retrieval: Search, Ranking, and Summarization
PDF
Optimizing Search User Interfaces and Interactions within Professional Social...
PDF
Bootstrapping Recommendations with Neo4j
PPTX
Mining and analyzing social media part 1 - hicss47 tutorial - dave king
PDF
Bootstrapping Recommendations OSCON 2015
PDF
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
PDF
SEOktoberfest 2022 - Blending SEO, Discover, & Entity Extraction to Analyze D...
PPTX
Linked Data Entity Summarization (PhD defense)
PPTX
Talking to your Data: Natural Language Interfaces for a schema-less world (Ke...
PDF
Natural Language Search with Knowledge Graphs (Chicago Meetup)
PPT
Introduction to question answering for linked data & big data
PPTX
Knoesis-Semantic filtering-Tutorials
PPT
Path 101 Opportunity
PDF
Information Retrieval Models for Recommender Systems - PhD slides
PDF
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
PDF
Recommendation systems
PDF
Everything You Always Wanted to Know About Synthetic Data
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
Master Minds on Data Science - Maarten de Rijke
Semantic Data Retrieval: Search, Ranking, and Summarization
Optimizing Search User Interfaces and Interactions within Professional Social...
Bootstrapping Recommendations with Neo4j
Mining and analyzing social media part 1 - hicss47 tutorial - dave king
Bootstrapping Recommendations OSCON 2015
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
SEOktoberfest 2022 - Blending SEO, Discover, & Entity Extraction to Analyze D...
Linked Data Entity Summarization (PhD defense)
Talking to your Data: Natural Language Interfaces for a schema-less world (Ke...
Natural Language Search with Knowledge Graphs (Chicago Meetup)
Introduction to question answering for linked data & big data
Knoesis-Semantic filtering-Tutorials
Path 101 Opportunity
Information Retrieval Models for Recommender Systems - PhD slides
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
Recommendation systems
Everything You Always Wanted to Know About Synthetic Data
Ad

Recently uploaded (20)

PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PDF
CHAPTER 3 Cell Structures and Their Functions Lecture Outline.pdf
PPTX
2. Earth - The Living Planet Module 2ELS
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PPTX
2Systematics of Living Organisms t-.pptx
PPTX
Science Quipper for lesson in grade 8 Matatag Curriculum
PPTX
famous lake in india and its disturibution and importance
PPTX
Application of enzymes in medicine (2).pptx
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PDF
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
PPTX
neck nodes and dissection types and lymph nodes levels
PDF
The scientific heritage No 166 (166) (2025)
PDF
An interstellar mission to test astrophysical black holes
PPTX
C1 cut-Methane and it's Derivatives.pptx
PDF
. Radiology Case Scenariosssssssssssssss
PPTX
2. Earth - The Living Planet earth and life
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PDF
Phytochemical Investigation of Miliusa longipes.pdf
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
CHAPTER 3 Cell Structures and Their Functions Lecture Outline.pdf
2. Earth - The Living Planet Module 2ELS
ECG_Course_Presentation د.محمد صقران ppt
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
2Systematics of Living Organisms t-.pptx
Science Quipper for lesson in grade 8 Matatag Curriculum
famous lake in india and its disturibution and importance
Application of enzymes in medicine (2).pptx
Classification Systems_TAXONOMY_SCIENCE8.pptx
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
neck nodes and dissection types and lymph nodes levels
The scientific heritage No 166 (166) (2025)
An interstellar mission to test astrophysical black holes
C1 cut-Methane and it's Derivatives.pptx
. Radiology Case Scenariosssssssssssssss
2. Earth - The Living Planet earth and life
TOTAL hIP ARTHROPLASTY Presentation.pptx
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
Phytochemical Investigation of Miliusa longipes.pdf

Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs

  • 1. Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs Federico Bianchi, Matteo Palmonari, Marco Cremaschi and Elisabetta Fersini federico.bianchi@disco.unimib.it ITIS Lab – Innovative Technologies for Interaction and Services Dipartimento di Informatica, Sistemistica e Comunicazione Università degli Studi di Milano-Bicocca 1-6-2017, Portorož, Slovenia
  • 2. Outline 2 • Contextual Exploration of Knowledge Graphs • Actively Learning to Rank Semantic Associations • Experiments • Conclusions and Future Work
  • 3. Outline 3 • Contextual Exploration of Knowledge Graphs • Actively Learning to Rank Semantic Associations • Experiments • Conclusions and Future Work
  • 4. Knowledge Graphs (KGs) • Models used for knowledge representation using graphs • DBpedia, YAGO, Google KG, … • Nodes represent real-world entities • Labelled edges represent relations between them. 4 Bernie Sanders Hillary Clinton Democratic Party KGs may contain interesting relations for users
  • 5. Relational Knowledge in KGs and Semantic Associations • KGs provide vast amount of relational knowledge • Semantic Associations (SAs) • chains of relations between entities • arbitrary length • inverse properties included 5 Bernie Sanders Democratic Party Hillary Clinton party party Bernie_Sanders party > Democratic_Party < party Hilary_Clinton
  • 6. Empowering Comprehension of Web Content 6 Who is Bernie Sanders? What is his relation with Hillary Clinton?
  • 7. Empowering Comprehension of Web Content 6 Who is Bernie Sanders? What is his relation with Hillary Clinton?
  • 8. 7 Contextual Exploration of KGs Support a user who is doing a familiar task, e.g., reading a news article, to access content extracted from a KG, selected and pushed to him in a proactive fashion. Who is Bernie Sanders? What is his relation with Hillary Clinton?
  • 9. 7 Contextual Exploration of KGs Support a user who is doing a familiar task, e.g., reading a news article, to access content extracted from a KG, selected and pushed to him in a proactive fashion. Bernie Sanders Democratic Party Hillary Clinton party party
  • 10. 8 Contextual Exploration of DBpedia with DaCENA www.dacena.org (Palmonari&al, 2015)
  • 11. Entity Extraction 9 Bernie Sanders has urged his supporters to look beyond the Democratic presidential nomination in a speech that stopped short of fully endorsing Hillary Clinton but made clear he was no longer actively challenging her candidacy. In an anticlimatic speech that signalled the effective end of a 14-month campaign odyssey, the Vermont senator insisted his “political revolution continues” despite Clinton’s effective victory in the delegate race. Entities are extracted from text Entity Extraction SAs Retrieval
  • 12. Entity Extraction 9 Bernie Sanders has urged his supporters to look beyond the Democratic presidential nomination in a speech that stopped short of fully endorsing Hillary Clinton but made clear he was no longer actively challenging her candidacy. In an anticlimatic speech that signalled the effective end of a 14-month campaign odyssey, the Vermont senator insisted his “political revolution continues” despite Clinton’s effective victory in the delegate race. Bernie Sanders, Hillary Clinton, Democratic Party, Vermont… Bernie Sanders has urged his supporters to look beyond the Democratic presidential nomination in a speech that stopped short of fully endorsing Hillary Clinton but made clear he was no longer actively challenging her candidacy. In an anticlimatic speech that signalled the effective end of a 14-month campaign odyssey, the Vermont senator insisted his “political revolution continues” despite Clinton’s effective victory in the delegate race. Entities are extracted from text Entity Extraction SAs Retrieval
  • 13. Retrieval of Semantic Associations 10 SPARQL query to the DBpedia endpoint. SAs between all the entities • maximum number of hops = 2 Entity Extraction SAs Retrieval
  • 14. Retrieval of Semantic Associations 10 SPARQL query to the DBpedia endpoint. SAs between all the entities • maximum number of hops = 2 Between ( Bernie Sanders and Hillary Clinton ) Entity Extraction SAs Retrieval
  • 15. Retrieval of Semantic Associations 10 SPARQL query to the DBpedia endpoint. SAs between all the entities • maximum number of hops = 2 Between ( Bernie Sanders and Hillary Clinton ) Entity Extraction SAs Retrieval
  • 16. Retrieval of Semantic Associations 10 SPARQL query to the DBpedia endpoint. SAs between all the entities • maximum number of hops = 2 Between ( Bernie Sanders and Hillary Clinton ) Entity Extraction SAs Retrieval SPARQL
  • 17. Retrieval of Semantic Associations 10 SPARQL query to the DBpedia endpoint. SAs between all the entities • maximum number of hops = 2 Between ( Bernie Sanders and Hillary Clinton ) Entity Extraction SAs Retrieval SPARQL
  • 18. Retrieval of Semantic Associations 10 SPARQL query to the DBpedia endpoint. SAs between all the entities • maximum number of hops = 2 Between ( Bernie Sanders and Hillary Clinton ) Entity Extraction SAs Retrieval Bernie Sanders Democratic Party party SPARQL
  • 19. Retrieval of Semantic Associations 10 SPARQL query to the DBpedia endpoint. SAs between all the entities • maximum number of hops = 2 Between ( Bernie Sanders and Hillary Clinton ) Entity Extraction SAs Retrieval Bernie Sanders Democratic Party party Hillary Clinton party SPARQL
  • 20. Information Overload in Contextual KG Exploration 11 Too many associations from even small pieces of text • E.g., 40107 associations from an article with 942 words • Not fit in a single screen • Users can explore only a limited number of associations (≤ 100) Crucial issue for KG exploration: Which are the most interesting to show to users?
  • 21. Ranking SAs by Estimated Interest: Serendipity 12 Heuristic measure: try to find those associations that are relevant and may be unexpected to users Serendipity = relevance + unexpectedness Serendipity(SA,Text) = α*relevance(SA,Text) + (1- α)*rarity(SA)
  • 22. Ranking SAs by Estimated Interest: Serendipity 12 Heuristic measure: try to find those associations that are relevant and may be unexpected to users Serendipity = relevance + unexpectedness Serendipity(SA,Text) = α*relevance(SA,Text) + (1- α)*rarity(SA) relevance (SA, Text) = cos(abstracts(SAs), Text) - with TF-IDF weighting
  • 23. Ranking SAs by Estimated Interest: Serendipity 12 Heuristic measure: try to find those associations that are relevant and may be unexpected to users Serendipity = relevance + unexpectedness Serendipity(SA,Text) = α*relevance(SA,Text) + (1- α)*rarity(SA) (Aleman-Meza&al, 2005)relevance (SA, Text) = cos(abstracts(SAs), Text) - with TF-IDF weighting
  • 24. Ranking SAs by Estimated Interest: Serendipity 13 Heuristic measure: try to find those associations that are relevant and may be unexpected to users Serendipity = relevance + unexpectedness Serendipity(SA,Text) = α*relevance(SA,Text) + (1- α)*rarity(SA)
  • 25. Ranking SAs by Estimated Interest: Serendipity 13 In Serendipity(SA,Text) = α*relevance(SA,Text) + (1-α)*rarity(SA), the user can tune α, i.e., the weight assigned to relevance vs. unexpectedness
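The serendipity formula can be sketched as follows; a plain term-frequency cosine stands in for the TF-IDF weighting over entity abstracts used in the paper, and the `rarity` score is assumed to be precomputed elsewhere:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def serendipity(sa_abstracts: str, text: str, rarity: float, alpha: float = 0.5) -> float:
    """Serendipity(SA, Text) = alpha * relevance(SA, Text) + (1 - alpha) * rarity(SA).
    Relevance is approximated here by a plain term-frequency cosine between
    the SA's entity abstracts and the text (the paper uses TF-IDF weighting)."""
    relevance = cosine(Counter(sa_abstracts.lower().split()),
                       Counter(text.lower().split()))
    return alpha * relevance + (1 - alpha) * rarity
```

Setting `alpha` close to 1 favors relevance to the text, while values close to 0 favor rare, potentially unexpected associations, mirroring the user-tunable weight on the slide.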
  • 26. Example of SAs ranked by Serendipity 14
  • 27. Outline 16 • Contextual Exploration of Knowledge Graphs • Actively Learning to Rank Semantic Associations • Experiments • Conclusions and Future Work
  • 28. Personalized Exploration of KGs 17 What if different users are interested in different SAs? 1. Learn a ranking function starting from explicit ratings given by the users 2. Ask users for as few ratings as possible (rating too many SAs can become a tedious task) 3. Speed up learning by sampling the SAs that are estimated to be more informative for training the model -> Definition of an active learning to rank model for personalized contextual exploration of KGs
  • 30. Active Learning to Rank for SAs 18 Ranking step — RankSVM algorithm: • Derived from SVM (Support Vector Machine) • Well known and widely used in the literature
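A hedged sketch of the pairwise reduction that RankSVM is based on: rated SA pairs are turned into feature differences with binary preference labels, on which a linear SVM can be trained (feature and rating values below are illustrative):

```python
import numpy as np

def pairwise_transform(X, ratings):
    """RankSVM-style pairwise transform: for every pair of SAs with different
    ratings, emit the feature-vector difference with label +1 or -1.
    A linear SVM trained on (diffs, labels) yields a weight vector w,
    and SAs are then ranked by the score w . x."""
    diffs, labels = [], []
    for i in range(len(ratings)):
        for j in range(i + 1, len(ratings)):
            if ratings[i] == ratings[j]:
                continue  # ties carry no preference information
            diffs.append(X[i] - X[j])
            labels.append(1 if ratings[i] > ratings[j] else -1)
    return np.array(diffs), np.array(labels)
```

This reduction is what makes an off-the-shelf SVM solver usable for learning to rank from ordinal user ratings.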
  • 33. Active Learning to Rank for SAs 18 Ranking -> Refine Ranking? -> Active Sampling of SAs. Two algorithms used to find meaningful SAs: • Pairwise Sampling (PS) (Qian & al., 2013) • AUC-Based Sampling (AS) (Donmez & al., 2009)
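The idea behind active sampling can be illustrated with a simplified least-margin criterion: ask the user about the unrated pair the current model is least sure how to order. This is only a sketch of the general principle, not the actual PS or AS algorithms from the cited papers:

```python
def most_ambiguous_pair(scores, unrated):
    """Simplified active-sampling sketch: among unrated SAs, return the pair
    whose current model scores are closest, i.e. the pair whose relative
    order the model is least certain about. (The actual PS and AS criteria
    are those of Qian & al. 2013 and Donmez & Carbonell 2009.)"""
    best, best_gap = None, float("inf")
    for i in range(len(unrated)):
        for j in range(i + 1, len(unrated)):
            a, b = unrated[i], unrated[j]
            gap = abs(scores[a] - scores[b])
            if gap < best_gap:
                best, best_gap = (a, b), gap
    return best
```

Ratings on such ambiguous pairs are the most informative for refining the ranking model, which is why active sampling can speed up learning compared to random selection.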
  • 34. Active Learning to Rank for SAs 18 [Loop: Ranking -> Refine Ranking? — no -> Final Ranking; yes -> Active Sampling of SAs -> User Rates SAs -> Ranking]
  • 35. Active Learning to Rank for SAs 18 But ranking models need to be initialized with rated SAs (cold-start problem) -> addressed by bootstrapping
  • 37. Clustering as Bootstrapping 19 • Use clustering algorithms on the set of SAs • For each cluster, select the SA that is closest to the cluster average • The user rates all the SAs that represent the clusters
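The clustering-based bootstrapping above can be sketched as follows; the cluster labels are assumed to come from any clustering algorithm (the experiments use Gaussian and Dirichlet-process Gaussian mixtures):

```python
import numpy as np

def cluster_representatives(X, labels):
    """Bootstrapping sketch: given SA feature vectors X and a cluster label
    per SA, return for each cluster the index of the SA closest to the
    cluster mean. These representatives are the SAs the user is asked to
    rate first, so the initial training set covers the feature space."""
    reps = {}
    for c in set(labels):
        idx = np.where(np.asarray(labels) == c)[0]
        centroid = X[idx].mean(axis=0)
        reps[c] = idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))]
    return reps
```

Because each representative sits near the center of its cluster, rating a handful of them gives the ranker labeled examples spread across the whole vector space, which is exactly the PRO listed for this strategy on a later slide.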
  • 41. Serendipity as Bootstrapping 20 • The user rates the top-k SAs ranked by Serendipity • Users see an ordered set of SAs from the beginning • Users rate SAs that are estimated to be interesting for them
  • 42. Serendipity vs Clustering Clustering: • PROS: selected SAs are representative of the vector space • CONS: rated SAs might not be interesting for the user Serendipity: • PROS: rated SAs are estimated to be interesting for a generic user • CONS: heuristic function, no representativeness 21
  • 45. Example: Rating of Most Serendipitous SAs (#0) 22 [Ratings given by the user vs. the ideal rating for the user: Hillary Clinton -region- New York -birthPlace- Donald Trump (rating 3); Hillary Clinton -spouse- Bill Clinton -party- Democratic Party (rating 5); Donald Trump -party- Independent Politician -political party- United States Senate (rating 1)]
  • 47. Example: Ranking Learned with RankSVM (#0) 23 [Learned ranking with the user's ratings: Bernie Sanders -birthPlace- New York -birthPlace- Donald Trump (rating 3); Hillary Clinton -other party- Republican Party -party- Donald Trump (rating 6); Donald Trump -party- Republican Party -political party- United States Senate (rating 5)]
  • 50. Example: Rating on Sampled SAs (#1) 24 [Ratings given by the user: Hillary Clinton -party- Democratic Party -political party- United States Senate (rating 5); Democratic Party -leader- Joe Biden -party- United States Senate (rating 1)]
  • 53. Example: Ranking Learned with RankSVM (#1) 25 [Learned ranking with the user's ratings: Hillary Clinton -other party- Republican Party -party- Donald Trump (rating 6; this SA was second in the previous ranking); Donald Trump -party- Democratic Party -political party- United States Senate (rating 6); Donald Trump -party- Republican Party -political party- United States Senate (rating 5)]
  • 54. Features for RankSVM 26 SAs are represented in the feature space using features divided in three main categories: • Topological Features: PageRank on SAs (Page & al., 1999), DBpedia PageRank (Thalhammer & al., 2016), HITS (Kleinberg & al., 1999) • Relevance Features: Relevance (Palmonari & al., 2015), Temporal Relevance (Bianchi & al., 2017) • Predicate-Based Features: Path Informativeness (Pirrò, 2015), Path Pattern Informativeness (Pirrò, 2015), Rarity (Aleman-Meza & al., 2005)
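As an illustration of one predicate-based feature, a simplified rarity score in the spirit of Aleman-Meza & al. 2005 might look as follows; the exact formulation used in the paper differs, so treat this as a sketch of the intuition that infrequent predicates make a path rarer:

```python
from collections import Counter

def path_rarity(sa_predicates, all_sa_paths):
    """Sketch of a predicate-rarity feature: predicates that appear in few of
    the retrieved SAs make a path rarer. Rarity of an SA is the mean, over
    its predicates p, of 1 - freq(p) / #SAs."""
    freq = Counter(p for path in all_sa_paths for p in set(path))
    n = len(all_sa_paths)
    return sum(1 - freq[p] / n for p in sa_predicates) / len(sa_predicates)
```

A path through `spouse`, which occurs in few SAs of a political article, would thus score higher than one through the ubiquitous `party`.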
  • 55. Outline 27 • Contextual Exploration of Knowledge Graphs • Actively Learning to Rank Semantic Associations • Experiments • Conclusions and Future Work
  • 56. Experiments: Objectives 28 Validate the personalization hypothesis • Are different users interested in different SAs? Evaluate the performance • (Quick) improvement of ranking quality as more iterations of user feedback are collected • Comparison of different configurations and baseline algorithms
  • 57. Experiments: Settings 29 Gold standards: Ideal rankings collected by asking users to evaluate all SAs extracted from articles or pieces of articles Evaluation Settings: Contextual Exploration: ratings on the whole dataset Cross Validation: ratings on training data, ranking on test data Measure: quality of generated rankings vs ideal rankings (nDCG)
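The nDCG measure used to compare generated rankings against the ideal ones can be sketched as:

```python
import math

def ndcg_at_k(ranked_ratings, ideal_ratings, k=10):
    """nDCG@k: discounted cumulative gain of the generated ranking, divided
    by the DCG of the ideal (descending) ordering of the same ratings.
    `ranked_ratings` lists the user ratings in the order the system ranked
    the SAs; `ideal_ratings` are the ratings from the gold standard."""
    def dcg(rs):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rs[:k]))
    ideal = dcg(sorted(ideal_ratings, reverse=True))
    return dcg(ranked_ratings) / ideal if ideal else 0.0
```

A value of 1.0 means the system put the highest-rated SAs first, exactly as in the gold standard; the experiments below report nDCG@10 averaged over users and iterations.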
  • 58. Experiments: Data 30 Two different datasets: LAFU (Large Articles, Few Users) Complete articles (New York Times) 3 articles, 2 users => 3 ideal rankings Average number of SAs per article => 2600 Ratings from 1 to 3 (1 low interest, 3 high interest) SAMU (Small Articles, Many Users) Small pieces of text extracted from articles (New York Times, The Guardian) 5 articles, 14 users => 25 ideal rankings Average number of SAs per article => 74 Ratings from 1 to 6 (1 low interest, 6 high interest; the scale is symmetric)
  • 59. Experiments: Alternative Configurations and Baselines 31
    Algorithm | Bootstrapping | Active Sampling | Learning
    Serendipity AS | Serendipity | AUC-Based Sampling | RankSVM
    Serendipity PS | Serendipity | Pairwise Sampling | RankSVM
    Dirichlet AS | Dirichlet Clustering | AUC-Based Sampling | RankSVM
    Dirichlet PS | Dirichlet Clustering | Pairwise Sampling | RankSVM
    Gaussian AS | Gaussian Clustering | AUC-Based Sampling | RankSVM
    Gaussian PS | Gaussian Clustering | Pairwise Sampling | RankSVM
    Random Random | Random | Random | RankSVM
    Random | No Bootstrapping | No Active Learning | No Learning to Rank
    Serendipity | No Bootstrapping | No Active Learning | No Learning to Rank
  • 60. Results: Personalization Hypothesis 32 Inter-rater reliability measures assess the level of agreement among users with respect to the same items (SAs). These measures are usually defined in the range [0, 1]: • 0 => complete disagreement between users • 1 => users give unanimous ratings Krippendorff's alpha 0.061 Kendall's W 0.26 Values are far from 1 => hypothesis validated
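Kendall's W can be computed as in the following minimal sketch, which converts ratings to ranks and ignores tie corrections (real inter-rater analyses, including the one reported here, would handle ties):

```python
def kendalls_w(ratings_by_rater):
    """Kendall's coefficient of concordance W for m raters over n items:
    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared deviations
    of the per-item rank totals from their mean. Ratings are converted to
    ranks per rater; ties are not corrected for in this minimal version."""
    m, n = len(ratings_by_rater), len(ratings_by_rater[0])
    def ranks(row):
        order = sorted(range(n), key=lambda i: row[i])
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    totals = [sum(col) for col in zip(*(ranks(row) for row in ratings_by_rater))]
    mean_r = sum(totals) / n
    s = sum((t - mean_r) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

W = 1 would mean all users order the SAs identically; the observed 0.26 indicates weak concordance, supporting the personalization hypothesis.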
  • 62. Results: Performance (nDCG@10) 33 (No PS active sampling shown: its computation time makes real-time usage impossible)
  • 66. Results: Performance (nDCG@10) 36 Random Baseline Serendipity Baseline
  • 67. Outline 37 • Contextual Exploration of Knowledge Graphs • Actively Learning to Rank Semantic Associations • Experiments • Conclusions and Future Work
  • 68. Conclusions and Future Work 38 Conclusions: 1. Quick optimization of a personalized ranking function with Active Learning to Rank 2. Active Learning to Rank can be initialized with Serendipity (+performance, +interaction flow) Future Work: 1. Exploring new algorithms for active learning to rank 2. Better understand how to design the user interaction for the ALR model
  • 69. Thank You 39 Questions? Contacts: federico.bianchi@disco.unimib.it www.dacena.org ITIS Lab – Innovative Technologies for Interaction and Services Dipartimento di Informatica, Sistemistica e Comunicazione Università degli Studi di Milano-Bicocca
  • 70. References 40
    Joachims, T. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133-142. ACM, 2002.
    Pirrò, G. Explaining and suggesting relatedness in knowledge graphs. In ISWC, pages 622-639. Springer, 2015.
    Qian, B., Li, H., Wang, J., Wang, X., and Davidson, I. Active learning to rank using pairwise supervision. In SIAM International Conference on Data Mining, pages 297-305. SIAM, 2013.
    Donmez, P., and Carbonell, J. G. Active sampling for rank learning via optimizing the area under the ROC curve. In ECIR, pages 78-89. Springer, 2009.
    Bianchi, F., Palmonari, M., Cremaschi, M., and Fersini, E. Actively learning to rank semantic associations for personalized contextual exploration of knowledge graphs. In ESWC, 2017.
    Palmonari, M., Uboldi, G., Cremaschi, M., Ciminieri, D., and Bianchi, F. Dacena: Serendipitous news reading with data contexts. In ESWC, pages 133-137. Springer, 2015.
    Page, L., et al. The PageRank citation ranking: Bringing order to the web. Stanford InfoLab, 1999.
    Thalhammer, A., and Rettinger, A. PageRank on Wikipedia: Towards general importance scores for entities. In International Semantic Web Conference, pages 227-240. Springer International Publishing, 2016.
    Kleinberg, J. M., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. S. The web as a graph: Measurements, models, and methods. In International Computing and Combinatorics Conference, pages 1-17. Springer Berlin Heidelberg, 1999.
    Aleman-Meza, B., Halaschek-Weiner, C., Arpinar, I. B., Ramakrishnan, C., and Sheth, A. P. Ranking complex relationships on the semantic web. IEEE Internet Computing, 9(3):37-44, 2005.

Editor's Notes

  • #9: To generalize
  • #10: To generalize
  • #28: We defined a heuristic measure, but we don’t know if this is the best measure for everybody and if a better measure exists
  • #29: Motivated by the fact that different users might be interested in different kinds of SAs. We thus want to learn a ranking function from explicit user ratings, asking them for as little feedback as possible, since rating an entire set of SAs might be tedious and boring. Moreover, we want a model that is able to quickly improve and to speed up the learning task
  • #30: Active sampling method proposed in different contexts. More loops of ranking refinement!
  • #44: The example is built over the complete set of ratings given by the user for an article
  • #58: Contextual: rating on part of the dataset and testing on the remaining part -> replicates user behaviour in the application. Cross Validation: evaluates the robustness of our model
  • #60: These are the configurations we tested; they can be divided in two major categories. The first consists of the combinations of the different bootstrapping methods with active sampling; bootstrapping has been tested with two different clustering algorithms, the Gaussian Mixture Model and the Dirichlet Gaussian Mixture Model. The second category contains our baselines: the first one, namely Random Random, selects random associations for both initialization and active sampling, and is thus used to see if active learning is really useful in this context. The two remaining configurations do not use a learning to rank algorithm, so there is no learning behind them; they are used to see if ranking models can perform better than a simple heuristic, namely serendipity, and a random approach.
  • #61: The first result we tested was the validity of the personalization hypothesis within this context. We used inter-rater reliability measures, which are commonly used to assess the level of agreement between users. A value of 0 means there is complete disagreement between users, while 1 means users give unanimous ratings. Since the values of both Krippendorff's alpha and Kendall's W are far from 1, we came to the conclusion that our hypothesis is valid.
  • #62: These are the results obtained in five iterations of the loop shown before. Results are computed using nDCG on the top ten results of the ranking list and are aggregated: we took the mean value for each iteration with respect to each user. The algorithm that performs best in both experiments is the one that uses the serendipity heuristic and AUC-based sampling. This is interesting for two reasons: in our application we are able to provide an ordered set of probably interesting semantic associations from the first iteration, and it is this heuristic-based configuration that obtains the best results. This is also important because the graphs on the right lack the second sampling algorithm: it was not able to compute its results in a time acceptable for user interaction, making it unusable in a live setting. Finally, it is really interesting to notice that active learning performs better than the baselines we defined: the learning to rank approach fed with random observations is always the worst algorithm in the learning to rank group, and the same goes for the last two baselines, those that did not use active learning.