INSIGHT Centre for Data Analytics

www.insight-centre.org

Characterising concepts of interest
leveraging Linked Data
and the Social Web
Fabrizio Orlandi, Pavan Kapanipathi,
Amit Sheth, Alexandre Passant
IEEE/WIC/ACM Web Intelligence
Atlanta, GA, USA

20th November 2013

Copyright 2013 INSIGHT Centre for Data Analytics. All rights reserved.

Semantic Web & Linked Data
Research Programme
Scenario:
Personalisation and User Profiling on the Social Web

http://www.flickr.com/photos/giladlotan/

Solution
• Interlink social websites
• Integration & User Modelling
  - Merge and model user data
• Personalise users' experience using their profile
  - User Profile
  - Recommendations
  - Adaptive Systems
  - Search Personalisation

[Orlandi et al., I-Semantics 2012]

Problem
• Entity-based user profiles of interests:
  - Sport
  - CEV Volleyball Cup
  - Music
  - Heavy Metal
  - Mastodon
  - Atlanta
  - …

Problem
• Entity-based user profiles of interests:
  - Sport
  - CEV Volleyball Cup
  - Music
  - Heavy Metal
  - Mastodon
  - Atlanta
  - …

Semantics? Pragmatics? Relevance?

Linking Open Data
• The Semantics of the Web of Data

(Figure: LOD Cloud by R. Cyganiak and A. Jentzsch)

Example
"Mastodon is the best heavy metal band from Atlanta… Can't wait to see them live again!"

"Trentino vs Lugano about to start - Diatec youngster to impress again in CEV Champions League #volleyball"

"W3C Invites Implementations of five Candidate Recommendations for RDF 1.1 #SemanticWeb"

• Named entity recognition and disambiguation
• Frequency + time-decay weighting scheme

Extracted entities: Music, Heavy Metal, Mastodon, Atlanta, CEV Champions League, Volleyball, Semantic Web, RDF

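A frequency + time-decay weighting of this kind can be sketched as follows. The slides do not state the decay function, so the exponential decay, the 30-day half-life and the name `decayed_weight` are all assumptions made for illustration only.

```python
import math


def decayed_weight(mention_ages_days, half_life_days=30.0):
    """Weight an entity by its mentions, discounting older mentions.

    Assumption: exponential decay with a configurable half-life. A mention
    made today counts 1.0, one made `half_life_days` ago counts 0.5.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * age) for age in mention_ages_days)


# Hypothetical entity mentioned today, 30 days ago and 60 days ago.
weight = decayed_weight([0, 30, 60])
print(round(weight, 2))  # 1.75
```

With this form, an entity mentioned often but long ago can rank below one mentioned a few times recently, which is the effect a time-decay scheme is meant to produce.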
Example
• Are all the extracted entities useful for personalisation?
  - How are concepts/entities being used on the Social Web? (Pragmatics)

Entity                  Characterisation
Music                   Very abstract, very popular
Heavy Metal             Very popular
Mastodon (band)         Specific and time-dependent on events, etc.
Atlanta (GA.)           Specific, very popular and time-dependent
CEV Champions League    Specific and time-dependent on events, etc.
Volleyball              Abstract and popular
Semantic Web            Abstract and not popular
RDF                     Specific and not popular

The Dimensions of our
Characterisation
• Specificity
  - The level of abstraction that an entity has in a common conceptual schema shared by humans
• Popularity
  - How popular an entity is on the Social Web
    - How frequently is it mentioned/used at that point in time?
• Temporal Dynamics
  - The trend and evolution of the frequency of mentions of an entity on the Social Web
    - i.e. popularity over time

Requirements
• Our use case: real-time personalisation of Social Web streams
  1. (Quasi-)real-time computation of the dimensions
  2. Results constantly up to date with the real world
  3. Knowledge-base- and domain-independent approach

Popularity
• We chose the Twitter Search API
  - We search for an entity on the Twitter stream in a short, recent time frame.
  - We run entity disambiguation on the resulting tweets to filter out noisy tweets.
  - We count the remaining tweets in the given time frame.
  - The popularity measure is the resulting value in tweets/second.
  - This is fast, simple and up to date, but it only covers a short, recent time frame.

e.g. "Music" ~ 16.6 tw/s
     "Heavy Metal" ~ 0.09 tw/s
     "Semantic Web" ~ 0.0008 tw/s

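The last two steps of the popularity computation (counting tweets in a time frame and dividing by its length) can be sketched as below. This is only an illustration under stated assumptions: the tweets are taken to be already retrieved and disambiguated, and the function name `tweets_per_second` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def tweets_per_second(timestamps, window_start, window_end):
    """Return the tweet rate (tweets/second) inside [window_start, window_end).

    `timestamps` are the creation times of the tweets that survived
    entity disambiguation.
    """
    window_seconds = (window_end - window_start).total_seconds()
    if window_seconds <= 0:
        raise ValueError("window_end must be later than window_start")
    in_window = [t for t in timestamps if window_start <= t < window_end]
    return len(in_window) / window_seconds


# Hypothetical data: 9 tweets inside a 100-second window.
start = datetime(2013, 11, 20, 12, 0, 0)
end = start + timedelta(seconds=100)
sample = [start + timedelta(seconds=10 * i) for i in range(1, 10)]
rate = tweets_per_second(sample, start, end)
print(rate)  # 0.09
```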
Temporal Dynamics
• We use Wikipedia page views
  - Entities are already mapped to DBpedia
  - The MediaWiki API provides a long history of daily page views of Wikipedia articles
  - We use the mean and standard deviation of the last 30 days of page views to identify whether the popularity of an entity is:
    - Stable/Unstable
    - Trendy/Non-Trendy

(Figures: daily page-view charts for CEV_Champions_League and Typhoon_Haiyan (2013); diagrams from stats.grok.se)

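One possible way to turn the 30-day mean and standard deviation into the two labels is sketched below. The slides do not give the decision rules, so both thresholds (`cv_threshold`, `trend_sigmas`) and the rules themselves are assumptions, and `characterise_dynamics` is a hypothetical name.

```python
from statistics import mean, pstdev


def characterise_dynamics(daily_views, cv_threshold=0.5, trend_sigmas=2.0):
    """Label a daily page-view series from the mean/std. dev. of its last 30 days.

    Both thresholds are illustrative assumptions, not values from the slides.
    """
    window = daily_views[-30:]
    if not window:
        raise ValueError("daily_views must not be empty")
    mu = mean(window)
    sigma = pstdev(window)
    # Assumed rule: a high coefficient of variation (std/mean) means unstable.
    unstable = mu > 0 and (sigma / mu) > cv_threshold
    # Assumed rule: the most recent day far above the mean means trendy.
    trendy = sigma > 0 and window[-1] > mu + trend_sigmas * sigma
    return {
        "mean": mu,
        "std": sigma,
        "stability": "Unstable" if unstable else "Stable",
        "trend": "Trendy" if trendy else "Non-Trendy",
    }


flat = [100] * 30            # hypothetical entity with constant interest
spike = [100] * 29 + [5000]  # hypothetical burst on the most recent day
flat_result = characterise_dynamics(flat)
spike_result = characterise_dynamics(spike)
print(flat_result["stability"], flat_result["trend"])
print(spike_result["stability"], spike_result["trend"])
```

Because only two summary statistics over a fixed window are needed, the labels can be recomputed cheaply whenever new daily counts arrive.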
Specificity
• We use the Linking Open Data (LOD) cloud
  - Most of the available knowledge bases (e.g. DMOZ, WordNet, OpenCyc) are not up to date.
  - Wikipedia would be large, domain-independent and continuously updated, but:
    - Entities are not organised hierarchically in a taxonomy
    - We cannot use taxonomy-based methods (i.e. super-/sub-type relations)
    - PLUS: expensive algorithms would not be good for real-time computation

LOD Links Structure!

Graph based measures
• State-of-the-art (SOA) graph-based method:
  - Indegree and outdegree (here called Incoming/Outgoing Predicates: IP and OP)
  - We can use these methods with RDF triples
  - We introduce "distinct in/out-degree" (IDP and ODP)

(Diagram: node "m" with three incoming edges from s1, s2 and s3 and two outgoing edges to o1 and o2; the edges are labelled with the predicates p1 (twice), p2, p3 and p4.)

Values for "m":
IP (indegree) = 3
OP (outdegree) = 2
IDP (distinct indegree) = 2
ODP (distinct outdegree) = 2

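The four measures can be computed directly from a list of RDF-style triples, as in this minimal sketch. The function name `degree_measures` is hypothetical, and the edge assignment (which subject carries which predicate) is an assumption chosen to reproduce the values given for "m".

```python
def degree_measures(triples, node):
    """Compute IP, OP, IDP and ODP for `node` from (subject, predicate, object) triples."""
    incoming = [p for s, p, o in triples if o == node]
    outgoing = [p for s, p, o in triples if s == node]
    return {
        "IP": len(incoming),        # indegree: all incoming predicates
        "OP": len(outgoing),        # outdegree: all outgoing predicates
        "IDP": len(set(incoming)),  # distinct incoming predicates
        "ODP": len(set(outgoing)),  # distinct outgoing predicates
    }


# Assumed edge assignment, consistent with the values listed for "m".
triples = [
    ("s1", "p1", "m"),
    ("s2", "p1", "m"),
    ("s3", "p2", "m"),
    ("m", "p3", "o1"),
    ("m", "p4", "o2"),
]
measures = degree_measures(triples, "m")
print(measures)  # {'IP': 3, 'OP': 2, 'IDP': 2, 'ODP': 2}
```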
Our Specificity Measure
• DRR (Distinct Relations Ratio):

  $$\mathrm{DRR} = \frac{\text{Incoming Distinct Predicates (IDP)}}{\text{Outgoing Distinct Predicates (ODP)}}$$

• Compared with: IP/OP, IP+OP, IP, IDP
• Computed on the Sindice SPARQL endpoint in less than 1 s.

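As a rough illustration, DRR can be computed over an in-memory list of triples as below. This is a sketch, not the authors' SPARQL-based implementation: the function name, the sample triples and the convention of returning `None` when a node has no outgoing predicates are all assumptions.

```python
def distinct_relations_ratio(triples, node):
    """Return DRR = IDP / ODP for `node`, or None when ODP is zero.

    `triples` is an iterable of (subject, predicate, object) tuples.
    Returning None for ODP == 0 is an assumed convention.
    """
    incoming = {p for s, p, o in triples if o == node}
    outgoing = {p for s, p, o in triples if s == node}
    if not outgoing:
        return None
    return len(incoming) / len(outgoing)


# Hypothetical triples: "X" has 3 distinct incoming and 1 distinct outgoing predicate.
triples = [
    ("a", "type", "X"),
    ("b", "genre", "X"),
    ("c", "subject", "X"),
    ("X", "label", "some literal"),
]
drr_x = distinct_relations_ratio(triples, "X")
drr_leaf = distinct_relations_ratio(triples, "some literal")
print(drr_x)     # 3.0
print(drr_leaf)  # None
```

Only two distinct-predicate counts per entity are required, which fits the requirement of near real-time computation.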
Alternative SOA Method
• DMOZ (Open Directory Project) taxonomy
  - We use the hierarchical structure of DMOZ as an alternative method to measure specificity.
  - We manually map entities to DMOZ entities and compute their distance from the root of the DMOZ tree.

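The distance from the root can be read off a category path, as in this small sketch. It assumes that a mapped entity is represented by a slash-separated path starting at the root category and that the root counts as level 0; the function name and the sample paths are hypothetical.

```python
def dmoz_level(category_path):
    """Return the distance from the root for a slash-separated category path.

    Assumption: the first segment is the root category (level 0).
    """
    parts = [part for part in category_path.strip("/").split("/") if part]
    if not parts:
        raise ValueError("empty category path")
    return len(parts) - 1


# Hypothetical manual mappings of entities to category paths.
print(dmoz_level("Top"))             # 0
print(dmoz_level("Top/Arts/Music"))  # 2
```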
Generation of a Gold Standard
• Binary classification of entities
  - 5 humans classified 160 entities as:
    - Generic (38%)
    - Specific (62%)
  - Substantial agreement (k = 0.61)
• Ranking of entities
  - 5 humans rated the specificity of 160 entities on a:
    - 1 to 10 scale (1 = very generic, 10 = very specific)

Average Rate                 7.03
Average Std. Dev.            1.45
AVG Top 30 High Std. Dev.    5.66
AVG Top 30 Low Std. Dev.     7.51

Abstract entities are harder for humans to rate

Evaluation: Classification
• We compared the different methods against the gold standard created manually by the users
  - Agreement with the gold standard in the binary classification task:

    Method    Agreement
    DMOZ      83.9%
    DRR       84.1%
    IP/OP     70.0%
    IP+OP     70.0%
    IP        72.5%
    random    61.9%

  - The performance of the DRR measure for this classification task is comparable to a manual classification done using the DMOZ taxonomy and to human judgement.

Evaluation: Ranking
• We rank the specificity of 50 randomly chosen entities using:
  - Gold standard (average of the 5 users' ratings for each entity)
  - DMOZ levels (integers, 0 to 9)
    - We compute "DMOZ-" and "DMOZ+" as the worst and best possible rankings compared to the gold standard ranking.
  - DRR, IP/OP, IP+OP and random values (real numbers)
• We compute NDCG (Normalized Discounted Cumulative Gain) at different ranking positions "p".

(The ideal DCG is the DCG of the gold standard ranking.)

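NDCG at position p can be sketched as follows. The slide's own DCG formula is not preserved in this transcript, so the common discount rel_i / log2(i + 1) is assumed; the function names and the sample scores are hypothetical, and the sketch assumes that a higher method score means "more specific" (scores with the opposite orientation would need to be inverted first).

```python
import math


def dcg_at(relevances, p):
    """Discounted cumulative gain over the first p relevance values."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:p], start=1))


def ndcg_at(gold_scores, method_scores, p):
    """NDCG@p of the ranking induced by method_scores.

    The gold-standard ratings serve as gains, and the ideal DCG comes
    from ranking the entities by the gold standard itself.
    """
    entities = list(gold_scores)
    by_method = sorted(entities, key=lambda e: method_scores[e], reverse=True)
    ideal = sorted(entities, key=lambda e: gold_scores[e], reverse=True)
    ideal_dcg = dcg_at([gold_scores[e] for e in ideal], p)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at([gold_scores[e] for e in by_method], p) / ideal_dcg


# Hypothetical gold-standard ratings (1 to 10) and two hypothetical methods.
gold = {"RDF": 9.0, "Mastodon": 8.0, "Music": 1.5}
agreeing = {"RDF": 0.9, "Mastodon": 0.8, "Music": 0.1}
reversed_order = {"RDF": 0.1, "Mastodon": 0.8, "Music": 0.9}
perfect = ndcg_at(gold, agreeing, 3)
worse = ndcg_at(gold, reversed_order, 3)
print(perfect)  # 1.0
print(worse < perfect)  # True
```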
Evaluation: Ranking
DRR: +5% for NDCG at 10 and 20

Evaluation on User Profiles
• We evaluate the impact of the proposed measures on user profiles of interests, a real use case
  - 27 volunteers
  - Interests extracted from users' posts on Facebook and Twitter with NLP tools (as described in our previous work [1])
  - Frequency-based + time-decay weighting strategy
  - Each user rated his/her generated Top 30 list of interests (a total of 794 user ratings)
  - Ratings on a "1 to 5" scale according to how relevant/interesting each entity of interest is to the user (5 is highly relevant)

[1] Orlandi et al., I-Semantics 2012

Evaluation on User Profiles
• Average score (1 to 5 scale) is computed according to groups of types of entities

(Chart: average score per entity group; annotations on the chart: +8%, 17%, +12%)

• Not-popular and generic entities better represent users' perception of their interests (but we have only 17% of them)
• This behaviour might be different in other applications and use cases! (e.g. news recommendations, etc.)

Conclusions
• Introduced dimensions for characterisation of concepts of interest: specificity, popularity and temporal dynamics.
• Proposed methods for their computation satisfying requirements for real-time personalisation of Social Web streams:
  - Real-time, domain independent, up to date.
• Introduced a novel measure (DRR) for specificity of concepts based on the LOD cloud
  - Evaluated for two different tasks (classification and ranking) against SOA methods (humans, DMOZ, graph measures)
• Evaluated the impact of the measures on user profiles of interests (27 users and ~800 ratings)
  - Abstract and non-popular interests are preferred by users

Future work
• Experiment with the measures on user profiles used for different personalisation tasks.
  - E.g. a tweet recommender system should instead give priority to trendy, popular and specific entities.
• Improve the simple popularity and trend detection methods.
• Improve the DRR measure by adding more "semantics", i.e. considering the different types of edges.

Thanks!
@badmotorf
fabrizio.orlandi@deri.org
@pavankaps
pavan@knoesis.org
@amit_p
amit@knoesis.org
@terraces
alex@seevl.net
