Harnessing Linked Knowledge Sources for
Topic Classification in Social Media
A. Elizabeth Cano (1), Andrea Varga (2), Matthew Rowe (3), Fabio Ciravegna (2), and Yulan He (4)
(1) Knowledge Media Institute, The Open University, Milton Keynes, UK
(2) University of Sheffield, Sheffield, UK
(3) Lancaster University, Lancaster, UK
(4) Aston University, Birmingham, UK
2013
INTRODUCTION
Social Media Streams - Risk in violent and criminal activities
INTRODUCTION
Research Questions:
o  Can semantic features help in topic classification (TC)?
o  Which knowledge source (KS) data and KS taxonomies
provide useful information for improving the TC of tweets?
OUTLINE
• Introduction
- Topic Classification (TC) of Microposts
- Related Work
- State of the art limitations
• Proposed Approach
• Experiments
• Findings
• Conclusions
INTRODUCTION
Difficulties of Topic Classification of microposts
o  Restricted number of characters
o  Irregular and ill-formed words
•  Mixed upper- and lowercase letters
§  Make it difficult to detect proper nouns and other part-of-speech tags.
•  Wide variety of language
§  E.g., “see u soon”
o  Event-dependent emerging jargon
•  Volatile jargon relevant to particular events
§  E.g., “Jan.25” (used during the Egyptian revolution)
o  High topical diversity
o  Sparse data
INTRODUCTION
Social Knowledge Sources (KS)
             DBpedia*        Yago2          Freebase
Resources    2.35 million    447 million    3.6 million
Classes      359             562,312        1,450
Properties   1,820           253,213,842    7,000
*Using the DBpedia ontology
o  Structured Semantic Web representation of data
•  Maintained by thousands of editors
§  E.g., DBpedia, derived from Wikipedia
§  Freebase
•  Evolve and adapt as knowledge changes [Syed et al., 2008]
o  Cover a broad range of topics
o  Characterise topics with a large number of resources
INTRODUCTION
Local and External Metadata of a Tweet
[Figure: example tweet annotated with NER types and the DBpedia resources they link to]
NER:Person   <http://dbpedia.org/resource/Barack_Obama>
NER:Country  <http://dbpedia.org/resource/Egypt>
NER:Person   <http://dbpedia.org/resource/Hosni_Mubarak>
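A minimal sketch of how a tweet's local metadata (NER types) and external metadata (knowledge-source URIs) might be kept together. The gazetteer, the `Annotation` type and the `annotate` function are hypothetical stand-ins for a real NER and entity-linking service; only the three DBpedia resources come from the slide.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical gazetteer standing in for a real NER + entity-linking service.
GAZETTEER = {
    "obama": ("Person", "http://dbpedia.org/resource/Barack_Obama"),
    "mubarak": ("Person", "http://dbpedia.org/resource/Hosni_Mubarak"),
    "egypt": ("Country", "http://dbpedia.org/resource/Egypt"),
}


@dataclass(frozen=True)
class Annotation:
    surface: str   # normalised token found in the tweet
    ner_type: str  # local metadata: the NER type
    uri: str       # external metadata: the linked KS resource


def annotate(tweet: str) -> List[Annotation]:
    """Look up each token of the tweet in the toy gazetteer."""
    found = []
    for token in tweet.lower().split():
        key = token.strip(".,!?:;\"'()")
        if key in GAZETTEER:
            ner_type, uri = GAZETTEER[key]
            found.append(Annotation(key, ner_type, uri))
    return found


annotations = annotate(
    "President Obama: televised statement on Mubarak resignation, CNN says. Egypt"
)
```

Each annotation then carries both the NER type and a URI from which the surrounding graph of a knowledge source can be explored.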
PROPOSED APPROACH
o  State of the art limitations
§  Use of single knowledge sources
§  Entities’ metadata is constrained by the NER service used
(e.g., OpenCalais, Alchemy).
o  Our approach
§  Exploits multiple knowledge sources.
§  Enhances the entity metadata by deriving semantic graphs.
§  Leverages the graph structures surrounding entities present
in a KS for the TC task.
Exploiting Knowledge Sources for the Topic Classification of
Microposts
OUTLINE
• Introduction
• Proposed Approach
• Semantic Meta-graphs
• Weighting Schemas
• Enhancing TC with Semantic Features
• Experiments
• Findings
• Conclusions
PROPOSED APPROACH
Rationale…
[Figure: two example cases, labelled 1 and 2]
1  Could be more indicative of War and Conflict
2  Not necessarily a good indicator of War and Conflict
Can the graph structure of existing knowledge sources provide
an abstraction of the use of these entity types for representing a
topic?
PROPOSED APPROACH
Framework for Topic Classification of Tweets
[Figure: pipeline diagram with the stages Retrieve Articles (DB, FB, DB-FB), Retrieve Tweets (TW), Annotate Tweets, Concept Enrichment, Derive Semantic Features, Build Cross-Source Topic Classifier]
1  Datasets Collection
SPARQL query for all resources from a
given topic (e.g., War)
2  Datasets Enrichment
From tweets and articles’ abstracts, extract
entities and link them to resources in
DBpedia and Freebase.
3  Semantic Features Derivation
4  Build a Topic Classifier based on Features
Derived from Crossed-Sources
PROPOSED APPROACH
Deriving Semantic Meta-Graphs
<dbpedia:Barack_Obama, rdf:type, yago:PresidentOfTheUnitedStates>
<dbpedia:Barack_Obama, dbo:birthPlace, dbpedia:Hawaii>
PROPOSED APPROACH
Definition 1 - Resource Meta-graph
The meta-graph of a resource is a sequence of tuples $G := (R, P, C, Y)$ where
•  $R$, $P$, $C$ are finite sets whose elements are resources,
properties and classes;
•  $Y \subseteq R \times P \times C$ is a ternary relation representing a
hypergraph with ternary edges.
•  The hypergraph of $Y$ is a tripartite graph $H(Y) = \langle V, D \rangle$ where the vertices
are $V = R \cup P \cup C$ and the edges are
$D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$.
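The definition above can be sketched as a small data structure: a set of (resource, property, class) triples from which the vertex set, the ternary hyperedges and the two projections are derived. The function names and the toy triples are hypothetical, and the pairing of each property with a class is an assumption made only for illustration.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]  # (resource, property, class)


def hypergraph(Y: Set[Triple]) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Return (V, D): the vertices R | P | C and one hyperedge {r, p, c} per triple."""
    R = {r for r, _, _ in Y}
    P = {p for _, p, _ in Y}
    C = {c for _, _, c in Y}
    V = R | P | C
    D = {frozenset(triple) for triple in Y}
    return V, D


def project_on_properties(Y: Set[Triple], resource: str) -> Set[str]:
    """Properties attached to a resource in the meta-graph."""
    return {p for r, p, _ in Y if r == resource}


def project_on_classes(Y: Set[Triple], resource: str) -> Set[str]:
    """Classes attached to a resource in the meta-graph."""
    return {c for r, _, c in Y if r == resource}


# Toy relation (hypothetical pairing of properties and classes).
Y = {
    ("Obama", "birthPlace", "Person"),
    ("Obama", "spouse", "Person"),
    ("Obama", "author", "Author"),
}
V, D = hypergraph(Y)
```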
PROPOSED APPROACH
Resource Meta-graph
The meta-graph of entity e is the aggregation of all resources,
properties and classes related to this entity.
[Figure: meta-graph of Obama]
Projecting on Properties: birthPlace, author, spouse
Projecting on Classes: LivingPeople, PresidentOfTheUnitedStates, Person, Author
How can we weight these graphs to reveal the semantic
features that characterise Obama in the context of
Violence?
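PROPOSED APPROACH
Weighting Semantic Features
Specificity
Measures the relative importance of a property $p \in G(e)$ to
a given class $c \in G(e)$ in a KS graph $G_{KS}$:
$$\mathrm{specificity}_{KS}(p, c) = \frac{N_p(R(c))}{N(R(c))}$$
where $N(R(c))$ is the number of resources of type $c$ and $N_p(R(c))$ counts the occurrences of $p$ among them.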
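A minimal sketch of the specificity weight over a toy knowledge-source graph. The two dictionaries (`types`, `props`), the resource names and the choice to count each resource at most once for a property are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, Set


def specificity(p: str, c: str,
                types: Dict[str, Set[str]],
                props: Dict[str, Set[str]]) -> float:
    """N_p(R(c)) / N(R(c)): share of resources of class c that carry property p."""
    resources_of_c = [r for r, classes in types.items() if c in classes]
    if not resources_of_c:
        return 0.0  # convention chosen here for an empty class
    with_p = sum(1 for r in resources_of_c if p in props.get(r, set()))
    return with_p / len(resources_of_c)


# Toy graph with placeholder resources r1..r5 (hypothetical data).
types = {
    "r1": {"Politician"}, "r2": {"Politician"},
    "r3": {"Politician"}, "r4": {"Politician"},
    "r5": {"Place"},
}
props = {
    "r1": {"spouse", "office"}, "r2": {"office"},
    "r3": {"office"}, "r4": {"birthPlace"},
}
office_weight = specificity("office", "Politician", types, props)
spouse_weight = specificity("spouse", "Politician", types, props)
```

Under these toy counts `office` appears in three of the four `Politician` resources, so it is weighted as more specific to that class than `spouse`.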
PROPOSED APPROACH
Weighting Semantic Features
Generality
Captures the specialisation of a property p to a given class c,
by computing the property’s frequency among other
semantically related classes R’(c):
$$\mathrm{generality}_{KS}(p, c) = \frac{N(R'(c))}{N_p(R'(c))}$$
where $N(R'(c))$ is the number of resources whose type is
either $c$ or a specialisation of $c$’s parent classes.
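A sketch of the generality weight under stated assumptions: a single-parent class hierarchy held in a hypothetical `parent` dictionary, R'(c) read as the resources typed with c or with any class specialising c's direct parent, and a value of 0.0 returned when the property never occurs (a convention picked here to avoid division by zero).

```python
from typing import Dict, List, Set


def ancestors(cls: str, parent: Dict[str, str]) -> List[str]:
    """Chain of parent classes above cls (assumes an acyclic hierarchy)."""
    chain = []
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain


def related_classes(c: str, parent: Dict[str, str]) -> Set[str]:
    """c itself plus every class that specialises c's direct parent."""
    related = {c}
    direct = parent.get(c)
    if direct is not None:
        related |= {k for k in parent if direct in ancestors(k, parent)}
    return related


def generality(p: str, c: str,
               types: Dict[str, Set[str]],
               props: Dict[str, Set[str]],
               parent: Dict[str, str]) -> float:
    """N(R'(c)) / N_p(R'(c)) over the resources of the related classes."""
    classes = related_classes(c, parent)
    pool = [r for r, cs in types.items() if cs & classes]
    with_p = sum(1 for r in pool if p in props.get(r, set()))
    return len(pool) / with_p if with_p else 0.0


# Toy hierarchy and graph (hypothetical data).
parent = {"President": "Politician", "Senator": "Politician", "Politician": "Person"}
types = {"r1": {"President"}, "r2": {"President"},
         "r3": {"Senator"}, "r4": {"Senator"}, "r5": {"Person"}}
props = {"r1": {"commander"}, "r2": {"commander", "spouse"},
         "r3": {"spouse"}, "r4": {"spouse"}}
commander_generality = generality("commander", "President", types, props, parent)
spouse_generality = generality("spouse", "President", types, props, parent)
```

With this reading, a property that is rare among the related classes receives a higher value than one shared by most of them.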
PROPOSED APPROACH
Weighting Semantic Features
$$SG(p, c) = \mathrm{specificity}_{KS}(p, c) \times \mathrm{generality}_{KS}(p, c)$$
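A short worked example may help to see how the two weights combine. The four counts below are hypothetical and chosen only to make the arithmetic easy to follow; the subscript notation $N_p$ is a reading of the slide's formulas.

```latex
% Hypothetical counts, chosen only to illustrate the arithmetic:
% N_p(R(c)) = 30, N(R(c)) = 40, N(R'(c)) = 200, N_p(R'(c)) = 50
\begin{align*}
\mathrm{specificity}_{KS}(p,c) &= \frac{N_p(R(c))}{N(R(c))} = \frac{30}{40} = 0.75 \\
\mathrm{generality}_{KS}(p,c)  &= \frac{N(R'(c))}{N_p(R'(c))} = \frac{200}{50} = 4 \\
SG(p,c) &= 0.75 \times 4 = 3
\end{align*}
```

A property that is frequent within class $c$ but comparatively rare across the related classes therefore ends up with a high combined weight.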
PROPOSED APPROACH
Enhancing Feature Space with Semantic Features
Semantic Augmentation (A1)
Class Features (A1+C): $F' = F + F_C$
Property Features (A1+P): $F' = F + F_P$
Class + Property Features (A1+C+P): $F' = F + F_C + F_P$
Example:
$F$: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt
$F_{A1+P}$: dbpedia:birth, dbpedia:state, …, dbpedia-owl:PopulatedPlace/populationDensity, …
$F_{A1+C}$: PopulatedPlace, Office_holder, PresidentOfTheUnitedStates, Politician, …
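The A1 augmentation can be sketched as appending class and property features to a tweet's lexical features. The function name and the `C:` / `P:` prefixes (used so that semantic features cannot collide with ordinary tokens) are assumptions for illustration, as are the shortened example lists.

```python
from typing import Iterable, List


def augment_a1(tokens: Iterable[str],
               classes: Iterable[str],
               properties: Iterable[str],
               use_classes: bool = True,
               use_properties: bool = True) -> List[str]:
    """F' = F (+ class features) (+ property features)."""
    features = list(tokens)
    if use_classes:
        features += [f"C:{c}" for c in classes]
    if use_properties:
        features += [f"P:{p}" for p in properties]
    return features


tokens = ["president", "obama", "egypt"]
classes = ["PresidentOfTheUnitedStates", "PopulatedPlace"]
properties = ["birthPlace", "populationDensity"]

a1_c = augment_a1(tokens, classes, properties, use_properties=False)   # A1+C
a1_p = augment_a1(tokens, classes, properties, use_classes=False)      # A1+P
a1_c_p = augment_a1(tokens, classes, properties)                       # A1+C+P
```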
PROPOSED APPROACH
Enhancing Feature Space with Semantic Features
Semantic Augmentation with Generalisation (A2)
This augmentation exploits the subsumption relation among
classes within the DBpedia or Freebase ontologies. In this
case we consider the set of parent classes of c.
Parent(c) Features (A2+C): $F' = F + F_{parent(c)}$
Parent(c) + Property Features (A2+C+P): $F' = F + F_P + F_{parent(c)}$
Example:
$F$: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt
$F_{A2+parent(c)}$: Place, Office_holder, President, Politician, …
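A sketch of the A2 variant, in which each class is replaced by its parent before being added to the feature set. The `PARENT` mapping is hypothetical (chosen to mirror the slide's example), and keeping a class unchanged when no parent is known is an assumption of this sketch.

```python
from typing import Dict, Iterable, List


def augment_a2(tokens: Iterable[str],
               classes: Iterable[str],
               parent: Dict[str, str],
               properties: Iterable[str] = ()) -> List[str]:
    """F' = F (+ property features) + parent-class features."""
    # dict.fromkeys removes duplicate parents while keeping first-seen order.
    parents = list(dict.fromkeys(parent.get(c, c) for c in classes))
    return (list(tokens)
            + [f"P:{p}" for p in properties]
            + [f"C:{c}" for c in parents])


# Hypothetical subsumption mapping for illustration only.
PARENT = {"PopulatedPlace": "Place", "PresidentOfTheUnitedStates": "President"}

a2_c = augment_a2(
    ["obama", "egypt"],
    ["PopulatedPlace", "Office_holder", "PresidentOfTheUnitedStates", "Politician"],
    PARENT,
)
```

Generalising to parent classes trades some precision of the class features for fewer, more widely shared features.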
OUTLINE
• Introduction
• Proposed Approach
• Experiments
• Dataset
• Baseline Features
• Results
• Findings
• Conclusions
PROPOSED APPROACH
Datasets
o  Twitter Dataset [Abel et al., 2011] (TW)
§  Collected over two months starting in November 2010.
§  Topically annotated.
§  Uses tweets labelled as “War & Conflict” (War),
“Law & Crime” (Cri), and “Disaster &
Accident” (DisAcc).
§  Multi-labelled dataset comprising 10,189 tweets.
o  DBpedia (DB) and Freebase (FB) Datasets
§  SPARQL endpoints were queried for all resources from the
categories and subcategories of the skos:concept of War,
Cri, and DisAcc.
•  DBpedia – 9,465 articles
•  Freebase – 16,915 articles
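The collection step could look roughly like the query built below, which asks a DBpedia-style endpoint for resources filed under a category or any of its subcategories. This is a sketch rather than the query used by the authors: the function name, the example category URI and the choice of `dct:subject`, `skos:broader` and `dbo:abstract` are assumptions, and the string is only constructed here, not sent to an endpoint.

```python
def build_topic_query(category_uri: str, limit: int = 100) -> str:
    """Build a SPARQL 1.1 query for resources under a category and its subcategories."""
    return f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?resource ?abstract WHERE {{
  ?category skos:broader* <{category_uri}> .
  ?resource dct:subject ?category ;
            dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}}
LIMIT {limit}
""".strip()


# Illustrative category URI (assumption); one query per topic would be issued.
war_query = build_topic_query("http://dbpedia.org/resource/Category:War", limit=10)
```

An unbounded `skos:broader*` path can be expensive on a public endpoint, so in practice the depth or the result size would probably need to be limited.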
PROPOSED APPROACH
Experimental Setup A
1.  Use annotated Tweets for training (TW)
-  Baseline: Bag of Words (BoW), Bag of Entities (BoE),
and Part of Speech tags (PoS).
-  Enhance features using the DBpedia and Freebase
graphs.
2.  Train an SVM classifier on the TW corpus, with an
80%-20% training/test split over five independent runs.
3.  Compute Precision, Recall, and F-measure.
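The evaluation protocol above can be sketched without committing to a particular SVM library: a seeded 80%-20% split repeated for five runs, and precision, recall and F-measure computed per topic on binary labels. The function names are hypothetical, and reading "F-measure" as the balanced F1 score is an assumption.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(items: Sequence, seed: int) -> Tuple[List, List]:
    """Shuffle with a fixed seed and cut into 80% training / 20% test."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(gold: Sequence[bool],
                        predicted: Sequence[bool]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 for one topic label."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Five independent runs, each with its own seed; a classifier would be
# trained on `train` and scored on `test` inside this loop.
runs = [split_80_20(range(10), seed) for seed in range(5)]

scores = precision_recall_f1(
    gold=[True, True, False, False, True],
    predicted=[True, False, True, False, True],
)
```

Averaging the three scores over the five runs would give one figure per feature set and topic.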
PROPOSED APPROACH
Results for TW dataset
PROPOSED APPROACH
Experimental Setup B
1.  Use labelled articles from DBpedia (DB) and Freebase
(FB) for training
-  Baseline: Bag of Words (BoW), Bag of Entities (BoE),
and Part of Speech tags (PoS).
-  Enhance features using the DBpedia and Freebase
graphs.
2.  Train an SVM classifier on the DB, FB, DB+FB, and
DB+FB+TW training corpora and test on TW, with an
80%-20% training/test split over five independent runs.
3.  Compute Precision, Recall, and F-measure.
PROPOSED APPROACH
Results for Training on KS articles, and Testing on TW
PROPOSED APPROACH
Factors contributing to the performance of a KS graph for TC
1.  Topic-Class Entropy
2.  Entity-Class Entropy
3.  Topic-Class-Property Entropy
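The slide names the entropy metrics without defining them. As a sketch of the first one, the code below assumes topic-class entropy is the Shannon entropy of the distribution of entity classes observed within a topic; the function name and this reading are assumptions.

```python
import math
from collections import Counter
from typing import Iterable


def class_entropy(class_labels: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the class distribution within one topic."""
    counts = Counter(class_labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# One class label per entity mention found in a topic's documents (toy data).
focused_topic = class_entropy(["Person", "Person", "Person", "Person"])
mixed_topic = class_entropy(["Person", "Person", "Place", "Organisation"])
```

Under this reading, a topic whose entities fall into a single class has zero entropy, while a topic spread over many classes scores higher.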
PROPOSED APPROACH
Correlating Entropy metrics with the performance of the
cross-source TC classifiers.
Indicates that the higher the number of ambiguous
entities in a topic within a KS graph, the lower the
performance of the TC.
FINDINGS
1.  KSs combined with Twitter data provide complementary
information for TC of Tweets, outperforming the KS
approaches and the approach using Tweets only.
2.  A KS’s performance on TC depends on the coverage of
the entities within that KS.
3.  When entities have low coverage in a KS, exploiting the
mapping between corresponding KSs’ ontologies is
beneficial.
CONCLUSIONS
•  Explored the task of topic classification of tweets
•  Exploited information in KSs (e.g. DBpedia, Freebase)
using semantic graphs for concepts and properties
surrounding an entity.
•  Presented the importance of considering graph
structures in KSs for the supervised classification of
tweets, by achieving significant improvement over
various state-of-the-art approaches using both single
KSs and Tweets only.
CONTACT US
A.  Elizabeth Cano
•  http://people.kmi.open.ac.uk/cano/
B.  Andrea Varga
•  http://sites.google.com/site/missandreavarga/
C.  Matthew Rowe
•  http://lancs.ac.uk/staff/rowem/
D.  Fabio Ciravegna
•  http://staffwww.dcs.shef.ac.uk/people/F.Ciravegna
E.  Yulan He
•  http://www1.aston.ac.uk/eas/staff/dr-yulan-he


Harnessing Linked Knowledge Sources for Topic Classification in Social Media

  • 1. A. Elizabeth Cano, Andrea VargaŸ, Matthew Rowew, Fabio CiravegnaŸ, and Yulan He° Knowledge Media Institute, The Open University, Milton Keynes Ÿ University of Sheffield, Sheffield w Lancaster University, Lancaster ° Aston University, Birmingham UK. 2013 Harnessing Linked Knowledge Sources for Topic Classification in Social Media
  • 2. INTRODUCTION Social Media Streams - Risk in violent and criminal activities
  • 3. INTRODUCTION Research Questions: o  Can semantic features help in topic classification (TC)? o  Which knowledge source (KS) data and KS taxonomies provide useful information for improving the TC of tweets?
  • 4. OUTLINE • Introduction - Topic Classification (TC) of Microposts - Related Work - State of the art limitations • Proposed Approach • Experiments • Findings • Conclusions
  • 5. INTRODUCTION u  Difficulties of Topic Classification of microposts o  Restricted number of characters o  Irregular and ill-formed words •  Mixing upper and lowercase letter §  Makes it difficult to detect proper nouns, and other part of speech tags. •  Wide variety of language §  E.g., “see u soon” o  Event-dependent emerging jargon • Volatile jargon relevant to particular events §  E.g., “Jan.25” (used during the Egyptian revolution o  High Topical Diversity o  Sparse data
  • 6. INTRODUCTION Social Knowledge Sources (KS) DBpedia* Yago2 Freebase Resources 2.35 million 447million 3.6 million Classes 359 562,312 1,450 Properties 1,820 253,213,842 7,000 *Using dbpedia ontology o  Structured Semantic Web Representation of data •  Maintained by thousand of editors §  E.g DBpedia, derived from Wikipedia §  Freebase •  Evolves and adapts as knowledge changes [Syed et al, 2008] o  Cover a broad range of topics o  Characterise topics with a large number of resources
  • 7. INTRODUCTION Local and External Metadata of a Tweet
  • 8. INTRODUCTION Local and External Metadata of a Tweet NER:CountryNER:Person NER:Person
  • 9. INTRODUCTION Local and External Metadata of a Tweet NER:CountryNER:Person NER:Person <http://dbpedia.org/resource/Barack_Obama <http://dbpedia.org/resource/Egypt <http://dbpedia.org/resource/Hosni_Mubarak
  • 10. PROPOSED APPROACH o  State of the art limitations §  Use of single knowledge sources §  Entities’ metadata is constrained by the used NER service (e.g OpenCalais, Alchemy). o  Our approach §  Exploits multiple knowledge sources. §  Enhances the entity metadata by deriving semantic graphs. §  Leverages the graph structures surrounding entities present in a KS for the TC task. Exploiting Knowledge Sources for the Topic Classification of Microposts
  • 11. OUTLINE • Introduction • Proposed Approach • Semantic Meta-graphs • Weighting Schemas • Enhancing TC with Semantic Features • Experiments • Findings • Conclusions
  • 13. PROPOSED APPROACH Rationale… 1 2 Could be more indicative of War and Conflict
  • 14. PROPOSED APPROACH Rationale… 2 Not necessarily a good indicator of War and Conflict
  • 15. PROPOSED APPROACH Rationale… 1 2 Can the graph structure of existing Knowledge sources provide an abstraction of the use of these entity types for representing a topic ?
  • 16. PROPOSED APPROACH Framework for Topic Classification of Tweets Concept Enrichment DBFBDB-FB RetrieveArticles TW Retrieve Tweets Derive Semantic Features Build Cross-Source Topic Classifier Annotate Tweets 1 Datasets Collection SPARQL query for all resources from a given Topic (e.g. War )
  • 17. PROPOSED APPROACH Framework for Topic Classification of Tweets Concept Enrichment DBFBDB-FB RetrieveArticles TW Retrieve Tweets Derive Semantic Features Build Cross-Source Topic Classifier Annotate Tweets 2 Datasets Enrichment From tweets and articles’ abstracts, extract entities and link them to resources in DBpedia and Freebase.
  • 18. PROPOSED APPROACH Framework for Topic Classification of Tweets Concept Enrichment DBFBDB-FB RetrieveArticles TW Retrieve Tweets Derive Semantic Features Build Cross-Source Topic Classifier Annotate Tweets 2 Datasets Enrichment From tweets and articles’ abstracts, extract entities and link them to resources in DBpedia and Freebase.
  • 19. PROPOSED APPROACH Framework for Topic Classification of Tweets Concept Enrichment DBFBDB-FB RetrieveArticles TW Retrieve Tweets Derive Semantic Features Build Cross-Source Topic Classifier Annotate Tweets 2 Datasets Enrichment From tweets and articles’ abstracts, extract entities and link them to resources in DBpedia and Freebase.
  • 20. PROPOSED APPROACH Framework for Topic Classification of Tweets Concept Enrichment DBFBDB-FB RetrieveArticles TW Retrieve Tweets Derive Semantic Features Build Cross-Source Topic Classifier Annotate Tweets 3 Semantic Features Derivation
  • 21. PROPOSED APPROACH Framework for Topic Classification of Tweets Concept Enrichment DBFBDB-FB RetrieveArticles TW Retrieve Tweets Derive Semantic Features Build Cross-Source Topic Classifier Annotate Tweets 4 Build a Topic Classifier based on Features Derived from Crossed-Sources
  • 22. PROPOSED APPROACH Framework for Topic Classification of Tweets Concept Enrichment DBFBDB-FB RetrieveArticles TW Retrieve Tweets Derive Semantic Features Build Cross-Source Topic Classifier Annotate Tweets 4 Build a Topic Classifier based on Features Derived from Crossed-Sources
  • 23. PROPOSED APPROACH Deriving Semantic Meta-Graphs <dbpedia:Barack_Obama, rdf:type, yago:PresidentOfTheUnitedStates> <dbpedia:Barack_Obama, dbo:birthPlace, dbpedia:Hawaii>
  • 24. PROPOSED APPROACH Deriving Semantic Meta-Graphs <dbpedia:Barack_Obama, rdf:type, yago:PresidentOfTheUnitedStates> <dbpedia:Barack_Obama, dbo:birthPlace, dbpedia:Hawaii>
  • 25. PROPOSED APPROACH Definition 1- Resource Meta-graph Is a sequence of tuples G:=(R,P,C,Y) where •  R, P, C are finite sets whose elements are resources, properties and classes; •  Y is a ternary relation representing a hypergraph with ternary edges. •  Y is a tripartite graph where the vertices are Y ! R " P "C H Y( ) = V, D D = r, p,c{ } r, p,c( ) ! Y{ }
  • 26. PROPOSED APPROACH Resource Meta-graph The meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity. Obama birthPlace author spouse Projecting on Properties Projecting on Classes LivingPeople PresidentOfTheUnitedStates Obama Person Author
  • 27. PROPOSED APPROACH Resource Meta-graph The meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity. Obama birthPlace author spouse Projecting on Properties Projecting on Classes LivingPeople PresidentOfTheUnitedStates Obama Person Author How can we weight these graphs to reveal semantic features characterise Obama in the context of Violence? ? ? ? ? ?? ?
  • 28. PROPOSED APPROACH Weighting Semantic Features Specificity Measures the relative importance of a property to a given class in a KS graph GKS: p ! G e( ) c ! G e( ) specificityKS p,c( ) = pN R(c)( ) N(R(c))
  • 29. PROPOSED APPROACH Weighting Semantic Features Generality Captures the specialisation of a property p to a given class c, by computing the property’s frequency among other semantically related classes R’(c). Where N(R’(c)) is the number of resources whose type is either c or a specialisation of c’s parent classes. generalityKS p,c( ) = N R'(c)( ) pN (R'(c))
  • 30. PROPOSED APPROACH Weighting Semantic Features SG p,c( ) = specificityKS p,c( )! generalityKS p,c( )
  • 31. PROPOSED APPROACH Enhancing Feature Space with Semantic Features Semantic Augmentation (A1) Class Features Property Features Class+ Property Features A1!CF' = F + CF A1!PF' = F + pF A1!C+PF' = F + cF + pF
  • 32. PROPOSED APPROACH Enhancing Feature Space with Semantic Features Semantic Augmentation (A1) Class Features Property Features Class+ Property Features A1!CF' = F + CF A1!PF' = F + pF A1!C+PF' = F + cF + pF F president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt FA1+ P dbpedia:birth, dbpedia:state, …., dbpedia-owl:PopulatedPlace/ populationDensity…. FA1+ C PopulatedPlace, Office_holder, PresidentOfTheUnitedStates, Politician…
  • 33. PROPOSED APPROACH Enhancing Feature Space with Semantic Features Semantic Augmentation with Generalisation (A2) This augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies. In this cases we consider the set of parent classes of c. Parent(c) Features Parent(c) + Property Features A2!CF' = F + parent(c)F A2!C+PF' = F + pF + parent(c)F
  • 34. PROPOSED APPROACH Enhancing Feature Space with Semantic Features Semantic Augmentation with Generalisation (A2) This augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies. In this cases we consider the set of parent classes of c. Parent(c) Features Parent(c)+Property Features A2!CF' = F + parent(c)F A2!C+PF' = F + pF + parent(c)F F president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt FA2+ parent(c) Place, Office_holder, President, Politician…
  • 36. PROPOSED APPROACH Datasets o  Twitter Dataset [Abel et al., 2011] (TW) §  Collected during two months starting on Nov 2010. §  Topically annotated §  Using tweets labelled as “War & Conflict” (War), “Law & Crime” (Cri), “Disaster & Accident” (DisAcc). §  Multilabelled dataset comprising 10,189 Tweets. o  DBpedia (DB) and Freebase (FB) Dataset §  SPARQL queried endpoints for all resources from categories and subcategories of skos:concept of War, Cri, DisAcc. •  DBpedia – 9,465 articles •  Freebase – 16,915 articles
  • 38. PROPOSED APPROACH Experimental Setup A 1.  Use annotated Tweets for training (TW) -  Baseline: Bag of Words (BoW), Bag of Entities (BoE), and Part of Speech tags (PoS). -  Enhance Features using the DBpedia and Freebase graphs. 2.  Train a SVM classifier based on the TW corpus. Trained/ Tested on 80%-20% over five independent runs. 3.  Compute Precision, Recall, and F-measure.
  • 40. PROPOSED APPROACH Experimental Setup B 1.  Use labelled articles from DBpedia (DB) and Freebase (FB) for training -  Baseline: Bag of Words (BoW), Bag of Entities (BoE), and Part of Speech tags (PoS). -  Enhance Features using the DBpedia and Freebase graphs. 2.  Train a SVM classifier based on the DB, FB, DB+FB, DB +FB+TW training corpus and test on TW. Trained/Tested on 80%-20% over five independent runs. 3.  Compute Precision, Recall, and F-measure.
  • 41. PROPOSED APPROACH Results for Training on KS articles, and Testing on TW
  • 42. PROPOSED APPROACH Factors contributing to the performance of a KS graph for TC 1.  Topic-Class Entropy 2.  Entity-Class Entropy 3.  Topic-Class-Property Entropy
  • 43. PROPOSED APPROACH Correlating Entropy metrics with the performance of the cross-source TC classifiers.
  • 44. PROPOSED APPROACH Correlating Entropy metrics with the performance of the cross-source TC classifiers. The correlation indicates that the higher the number of ambiguous entities in a topic within a KS graph, the lower the performance of the TC.
  • 45. FINDINGS 1.  KSs combined with Twitter data provide complementary information for TC of tweets, outperforming both the KS-only approaches and the approach using tweets only. 2.  A KS's performance on TC depends on the coverage of the entities within that KS. 3.  When entities have low coverage in a KS, exploiting the mapping between corresponding KSs’ ontologies is beneficial.
  • 46. CONCLUSIONS •  Explored the task of topic classification of tweets •  Exploited information in KSs (e.g. DBpedia, Freebase) using semantic graphs for concepts and properties surrounding an entity. •  Presented the importance of considering graph structures in KSs for the supervised classification of tweets, by achieving significant improvement over various state-of-the-art approaches using both single KSs and Tweets only.
  • 47. CONTACT US A.  Elizabeth Cano •  http://people.kmi.open.ac.uk/cano/ B.  Andrea Varga •  http://sites.google.com/site/missandreavarga/ C.  Matthew Rowe •  http://lancs.ac.uk/staff/rowem/ D.  Fabio Ciravegna •  http://staffwww.dcs.shef.ac.uk/people/F.Ciravegna E.  Yulan He •  http://www1.aston.ac.uk/eas/staff/dr-yulan-he
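
The generalisation step on slide 34 (A2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `PARENT_CLASSES` table is a hypothetical toy hierarchy rather than data drawn from DBpedia or Freebase, and the function name and the `parent:` feature prefix are assumptions made for illustration. It shows only the idea F' = F + parent(c) features.

```python
# Hypothetical toy class hierarchy (an assumption, not DBpedia/Freebase data).
PARENT_CLASSES = {
    "President": ["Politician", "Person"],
    "Country": ["PopulatedPlace", "Place"],
}


def augment_with_parents(tokens, entity_classes, parent_classes=PARENT_CLASSES):
    """Return the token features F extended with parent-class features.

    tokens: lexical features of the post (F).
    entity_classes: classes c assigned to the entities found in the post.
    """
    features = list(tokens)
    seen = set(features)
    for cls in entity_classes:
        for parent in parent_classes.get(cls, []):
            feature = "parent:" + parent  # prefix keeps class features apart from words
            if feature not in seen:
                seen.add(feature)
                features.append(feature)
    return features


tokens = ["president", "obama", "televised", "statement", "egypt"]
augmented = augment_with_parents(tokens, ["President", "Country"])
print(augmented)
```

In a real pipeline the parent classes would come from the KS ontology's subsumption relation instead of a hard-coded dictionary.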

Editor's Notes

  • #2: I will present work done in collaboration with the universities of Sheffield, Lancaster and Aston. This work was done as part of the Violence Detection project, which investigates different approaches for the detection of violence-related events emerging from social media streams.
  • #3: During the last two years we have witnessed the use of these services to express different emotions within society; these services have become a proxy of information which communicates the social perception of situations regarding, for example, terrorism, social crisis, and racism. Therefore the real-time identification of the topics discussed in these channels could aid in different scenarios, including violence detection and emergency response situations.
  • #14: Our intuition indicates that in the first case, the role of Obama as President of the United States could be more indicative of the topic War and Conflict.
  • #16: These two tweets make reference to the same entity, “President Obama”. However, the context in which the entity is used is different: in the first case, the co-occurrence of Obama, Egypt and Mubarak could be more indicative of the War and Conflict topic, while in the second case the occurrence of President Obama and Michelle is less likely to indicate a war- and conflict-related topic. So we wonder whether the graph structure of existing knowledge sources could help provide an abstraction of the use of these entity types for representing a topic.
  • #28: Our intuition indicates that in the first case, the role of Obama as President of the United States could be more indicative of the topic War and Conflict. How can we weight these graphs so as to reveal which of these features characterise Obama in the context of violence?
  • #29: In order to capture the relative importance of each feature in a semantic meta-graph we propose two different weighting strategies. These are based on the generality and specificity of a feature in a given meta-graph. The weighting models the relative importance of a property p to a given class c, together with the generality of the property in a KS’s graph, where Np is the number of times property p appears in all resources of type c in the KS graph.
  • #35: Here parent(c) denotes the total number of unique parent classes derived from a KS graph.
  • #37: To evaluate the impact of enhancing the feature space with semantic features for the task of topic classification of tweets, we used a large corpus of tweets and two large-coverage KSs, DBpedia and Freebase. The Twitter dataset was derived previously by Abel et al. and comprises tweets collected during two months starting from November 2010. This dataset has been topically annotated.
  • #38: For each of the tweets and each of the articles we performed Lovins stemming and extracted entities using OpenCalais and Zemanta. Then, as described before, we built the semantic meta-graphs from the DBpedia and Freebase KSs. It is important to mention that the Twitter dataset consists of tweets which contain at least one entity.
  • #43: Topic-Class Entropy: low entropy (LE) indicates a focused topic, while high entropy (HE) indicates that the topic is more random in the subjects it discusses. Entity-Class Entropy: LE indicates that a topic is less ambiguous (i.e. its entities belong to fewer classes), while HE indicates high ambiguity at the level of the entities. Topic-Class-Property Entropy: LE indicates that a topic is dominated by few class-properties, while HE reveals high property diversity.
  • #44: The darker and closer to red a cell is, the more correlated the values are. This indicates that as the number of ambiguous entities in a topic increases, the performance of the TC decreases.
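
The entropy measures named on slide 42 and described in note #43 can be illustrated with a small Python sketch. The slides do not give the exact formulas, so this assumes plain Shannon entropy (base 2) over the distribution of classes observed for a topic; the function name and the example class labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of -p * log2(p) over the observed labels.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical class labels of the resources attached to two topics.
focused_topic = ["MilitaryConflict"] * 4
diverse_topic = ["MilitaryConflict", "Person", "Place", "Organisation"]

low = shannon_entropy(focused_topic)    # a single class: focused topic
high = shannon_entropy(diverse_topic)   # classes spread evenly: diverse topic
print(low, high)
```

Under this reading, a topic whose resources all share one class has the lowest possible entropy, while an even spread over many classes gives the highest.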