Stretching the Life of Twitter 
Classifiers with Time-Stamped 
Semantic Graphs 
A. Elizabeth Cano (@pixarelli) 
amparo.cano@open.ac.uk 
Yulan He 
y.he9@aston.ac.uk 
Harith Alani 
h.alani@open.ac.uk 
1
Introduction Social Media Streams 
2
Introduction Representing Topics in 
Dynamic Environments 
Techniques for topic classification of Social Media 
are sensitive to the evolution of topics 
2011: 3 dead in protest in Egypt. Security official vows to ‘deal firmly..#Jan24
Keywords: #Jan24, dead, Egypt, protest, security
2012: Egypt Pres Morsi uses his visit to Tehran to praise the Syrian uprising
Keywords: Egypt, Pres Morsi, Tehran, Syrian, uprising
2013: #Boston bombing suspect “pinned down” on boat in Watertown
Keywords: Boston, bombing, suspect, Watertown
2014: Why Obama needs to rethink his entire ISIS strategy…
Keywords: Obama, strategy, ISIS
3
Introduction 
Challenges 
• Keeping a model up to date requires regular retuning.
• Manual annotation is expensive.
Questions 
• Which feature types provide a more stable 
representation of a topic? 
4
Introduction Previous work 
Using local features 
• Bag of Words (BoW) [Genc et al., 2011]
• BoW + Bag of Entities (BoE) [Vitale et al., 2012]
• BoW + BoE + Part of Speech (PoS) tagging [Munoz et al., 2011] [Varga et al., 2012]
Exploiting the link structure of a Knowledge Source
• Exploiting categories containing entities [Michelson et al., 2010]
• Relating tweets with Wikipedia resources [Milne et al., 2008] [Xu et al., 2011]
• Use of semantic features for topic classification [Cano et al., 2013] [Varga et al., 2014]
5
Introduction Topic Evolution 
[Figure: a topic in the Twitter corpus at epochs t and t+1, shown with its semantic and lexical representations.]
6
Introduction Characterising Topic Changes 
with DBpedia 
Some features remain unchanged; others provide information about past, current or future
contexts (e.g. dbp:UnitedStatesPresidentialCandidates).
[Figure: the graph around dbp:Barack_Obama in the DBpedia 3.6, 3.7 and 3.8 snapshots.
Resources shown: dbp:Hawaii, dbp:Michelle_Obama, dbp:The_Audacity_of_Hope, dbp:Dreams_from_My_Father, dbp:Al-Qaeda, dbp:Budget_Control_Act_of_2011, dbp:United_States_National_Council, dbp:National_Science_and_Techology.
Classes shown: yago:PresidentOfTheUnitedStates, dbo:Person.
Categories shown: category:Community_organisers, category:Columbia_University_Alumni, category:United_States_presidential_candidates,_2012.
Edge labels: rdf:type, rdfs:subClassOf, dbo:birthPlace, dbo:spouse, dbo:author, dbo:leader, skos:subject, dbo:wikiPageWikiLink.]
7
Approach DBpedia Graph Snapshots 
Definition: Time-dependent Resource Meta Graph
A sequence of tuples $G := (R, P, C, Y, f_t)$ where
• $R$, $P$, $C$ are finite sets whose elements are resources, properties and classes;
• $Y \subseteq R \times P \times C$ is a ternary relation representing a hypergraph with ternary edges;
• the hypergraph of $Y$ is a tripartite graph $H(Y) = \langle V, D \rangle$ where the vertices are $V = R \cup P \cup C$ and the edges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$;
• $f_t$ assigns a temporal marker to each ternary edge.
8
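A minimal Python sketch of the definition above, assuming each ternary edge is stored as an (r, p, c) tuple and f_t as a dictionary from edges to a snapshot label. The class name `MetaGraph`, its method names and the two example edges (including the snapshot labels attached to them) are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (resource, property, class)


@dataclass
class MetaGraph:
    """Hypothetical container for G := (R, P, C, Y, f_t)."""
    edges: Set[Edge] = field(default_factory=set)        # the ternary relation Y
    marker: Dict[Edge, str] = field(default_factory=dict)  # f_t: edge -> temporal marker

    def add(self, r: str, p: str, c: str, t: str) -> None:
        """Add the ternary edge (r, p, c) and stamp it with marker t."""
        edge = (r, p, c)
        self.edges.add(edge)
        self.marker[edge] = t

    def resources(self) -> Set[str]:
        return {r for r, _, _ in self.edges}

    def properties(self) -> Set[str]:
        return {p for _, p, _ in self.edges}

    def classes(self) -> Set[str]:
        return {c for _, _, c in self.edges}

    def vertices(self) -> Set[str]:
        """V = R ∪ P ∪ C, the vertex set of the tripartite hypergraph."""
        return self.resources() | self.properties() | self.classes()

    def snapshot(self, t: str) -> Set[Edge]:
        """All ternary edges stamped with marker t."""
        return {e for e in self.edges if self.marker[e] == t}


# Illustrative edges; the snapshot labels are assumptions for the example.
g = MetaGraph()
g.add("dbp:Barack_Obama", "rdf:type", "yago:PresidentOfTheUnitedStates", "3.6")
g.add("dbp:Egypt", "rdf:type", "dbo:Country", "3.7")
```

Storing the marker per edge keeps one graph for all epochs; a snapshot is then just a filter over the edge set.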
Approach Semantic Representation of a 
Tweet 
<dbp:Hosni_Mubarak> 
<dbp:Barack_Obama> <dbp:Egypt> 
<dbp:CNN> 
dbp: http://dbpedia.org/resource/
9
Approach Semantic Representation of a 
Tweet 
Class Features (rdf:type)
<dbp:Hosni_Mubarak> rdf:type <dbo:OfficeHolder>
<dbp:Barack_Obama> rdf:type <yago:NobelPeacePrizeLaureates>
<dbp:Egypt> rdf:type <dbo:Country>
<dbp:CNN> rdf:type <dbo:Broadcaster>
dbo: http://dbpedia.org/ontology/
10
Approach Semantic Representation of a 
Tweet 
Property Features
<dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>
<dbp:Barack_Obama> dbprop:nationality American
<dbp:CNN> dbprop:headquarters <dbp:Atlanta>
<dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>
skos: http://dbpedia.org/resource/Category:
11
Approach Semantic Representation of a 
Tweet 
Category Features (skos)
<dbp:Hosni_Mubarak> dcterms:subject <skos:PresidentsOfEgypt>
<dbp:Barack_Obama> dcterms:subject <skos:Presidents_of_the_United_States>
<dbp:Egypt> dcterms:subject <skos:Arab_republics>
<dbp:CNN> dcterms:subject <skos:English-language_television_stations>
skos: http://dbpedia.org/resource/Category:
12
Approach Semantic Representation of a 
Tweet 
Resource Features
<dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>
<dbp:Barack_Obama> dbprop:commander <dbp:Death_Of_Osama_Bin_Laden>
<dbp:CNN> dbprop:headquarters <dbp:Atlanta>
<dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>
skos: http://dbpedia.org/resource/Category:
13
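A minimal sketch of how the four semantic feature types could be collected for a tweet's entities, assuming the DBpedia triples for each entity are already available as (predicate, object) pairs. The `TRIPLES` dictionary is a hand-written stand-in built from values on the slides, and `semantic_features` is a hypothetical helper rather than the authors' implementation; in particular, treating every predicate as a property feature and every `dbp:` object as a resource feature is an assumption.

```python
from collections import Counter

# Hand-written stand-in for a DBpedia lookup: entity -> [(predicate, object)].
TRIPLES = {
    "dbp:Barack_Obama": [
        ("rdf:type", "yago:NobelPeacePrizeLaureates"),
        ("dcterms:subject", "skos:Presidents_of_the_United_States"),
        ("dbprop:nationality", "American"),
    ],
    "dbp:Egypt": [
        ("rdf:type", "dbo:Country"),
        ("dcterms:subject", "skos:Arab_republics"),
        ("dbprop:languages", "dbp:Egyptian_Arabic"),
    ],
}


def semantic_features(entities):
    """Return one bag of features (a Counter) per semantic feature type."""
    bags = {
        "class": Counter(),
        "category": Counter(),
        "property": Counter(),
        "resource": Counter(),
    }
    for entity in entities:
        for predicate, obj in TRIPLES.get(entity, []):
            if predicate == "rdf:type":
                bags["class"][obj] += 1
            elif predicate == "dcterms:subject":
                bags["category"][obj] += 1
            else:
                # Assumption: the predicate itself is the property feature,
                # and a dbp: object counts as a linked resource feature.
                bags["property"][predicate] += 1
                if obj.startswith("dbp:"):
                    bags["resource"][obj] += 1
    return bags


features = semantic_features(["dbp:Barack_Obama", "dbp:Egypt"])
```

Each bag can then be used on its own or combined with the lexical bag of words when building the classifier's input.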
Approach DBpedia Graph Snapshots 
That is, the meta-graph of entity e is the aggregation of
all resources, properties and classes related to this
entity at time t.
Properties and Resources of <dbp:Barack_Obama> by DBpedia version:
3.6: prop:spouse <MichelleObama>; prop:birthPlace <Hawaii>
3.7: prop:spouse <MichelleObama>; prop:birthPlace <Hawaii>; prop:commander <dbp:Death_Of_Osama_Bin_Laden>
3.8: prop:spouse <MichelleObama>; prop:birthPlace <Hawaii>; prop:wikiPageWikiLink <UnitedStatesPresidentialCandidates>; prop:wikiPageWikiLink <Budget_Control_Act_of_2011>
….
14
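A small sketch of how an entity's meta-graph could be compared across DBpedia snapshots, assuming each snapshot is available as a set of (property, resource) pairs. The contents of `SNAPSHOTS` are abridged from the slide, and the assignment of the newer pairs to versions 3.7 and 3.8 is an assumption made for illustration; the helper names are hypothetical.

```python
# Abridged (property, resource) pairs for dbp:Barack_Obama per DBpedia version.
# Which version introduces which new pair is assumed for this example.
SNAPSHOTS = {
    "3.6": {
        ("prop:spouse", "MichelleObama"),
        ("prop:birthPlace", "Hawaii"),
    },
    "3.7": {
        ("prop:spouse", "MichelleObama"),
        ("prop:birthPlace", "Hawaii"),
        ("prop:commander", "Death_Of_Osama_Bin_Laden"),
    },
    "3.8": {
        ("prop:spouse", "MichelleObama"),
        ("prop:birthPlace", "Hawaii"),
        ("prop:wikiPageWikiLink", "Budget_Control_Act_of_2011"),
    },
}


def stable_features(snapshots):
    """Pairs present in every snapshot, i.e. features that never changed."""
    return set.intersection(*snapshots.values())


def added_features(snapshots, old, new):
    """Pairs present in snapshot `new` but absent from snapshot `old`."""
    return snapshots[new] - snapshots[old]


stable = stable_features(SNAPSHOTS)
new_in_37 = added_features(SNAPSHOTS, "3.6", "3.7")
```

The stable pairs correspond to features that remain unchanged over time, while the differences carry the epoch-specific context.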
Approach Semantic Feature Weighting 
Strategies 
Topic Relevance-based Weighting Strategy: 
Characterise the global relevance of a semantic feature to a 
given topic in DBpedia at a given point in time. 
[Figure: the DBpedia graph and the topic graph within the DBpedia graph.]
15
Approach Semantic Feature Weighting 
Strategies 
Topic Relevance-based Weighting Strategy: 
• Class-based Topic Relevance (ClsW) 
• Property-based Topic Relevance (PropW) 
• Category-based Topic Relevance (CatW) 
• Resource Relevance (ResW) 
16
Approach Semantic Feature Weighting 
Strategies 
Integrating weights into a Tweet representation 
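$$W_x(f)_{DB_t} = \left[ \frac{N_x(f)_{DB_t} + 1}{|F| + \sum_{f' \in F} N_x(f')_{DB_t}} \right] \ast \left[ W_{DB_t}(f) \right]^{1/2}$$
For a semantic feature f in a document x:
• the first factor is the frequency of f in x with Laplace smoothing;
• the second factor is the weight of f derived from the DB_t graph.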
17
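Reading the expression on the previous slide as a Laplace-smoothed frequency multiplied by the square root of the DBpedia-derived weight, a worked example could look as follows. The function name, the toy feature space `F`, the counts and the graph weights are all assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def weighted_feature(f, counts, feature_space, graph_weight):
    """Smoothed frequency of f in a document times sqrt of its graph weight.

    counts        -- N_x(f'): occurrences of each feature in document x
    feature_space -- F: all features of this type
    graph_weight  -- W_DB_t(f): relevance weight derived from the DB_t graph
    """
    total = sum(counts.get(g, 0) for g in feature_space)
    smoothed = (counts.get(f, 0) + 1) / (len(feature_space) + total)
    return smoothed * math.sqrt(graph_weight[f])


# Toy values, assumed for illustration only.
F = ["dbo:Country", "dbo:OfficeHolder", "dbo:Broadcaster"]
counts = {"dbo:Country": 2, "dbo:OfficeHolder": 1}
graph_weight = {"dbo:Country": 0.25, "dbo:OfficeHolder": 0.64, "dbo:Broadcaster": 0.04}

# (2 + 1) / (3 + 3) = 0.5, and sqrt(0.25) = 0.5, so the result is 0.25.
w_country = weighted_feature("dbo:Country", counts, F, graph_weight)
```

The smoothing term gives unseen features such as dbo:Broadcaster a small non-zero weight instead of dropping them.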
Experiments Framework for Twitter Topic 
Classification with DBpedia 
• Do semantic features built from DBpedia graphs aid
cross-epoch topic classification of tweets?
• Which feature type provides a more stable topic
representation over time?
18
Experiments Framework for Twitter Topic 
Classification with DBpedia 
• Microposts: 2010, 2011, 2013
• DBpedia dumps (resources): 3.6, 3.7, 3.8, 3.9
19
Experiments Datasets 
Tweets
Year:    2010     2011    2013
Months:  Nov-Dec  Aug     Sep
Size:    1x10^6   1x10^6  1x10^6
• Each tweet is assigned a topic label from a pool of over 10 categories.
• Violence-related topics: Disaster and Accident (D&A), Law and Crime (L&C), War and Conflict (W&C).
• Manual annotation is performed until there are 1K tweets per year per topic.
• Negative set: 1K tweets per year for topics other than these 3.
• 12K annotated tweets in total.
20
Experiments Framework for Twitter Topic 
Classification with DBpedia 
• Microposts: 2010, 2011, 2013
• DBpedia dumps (resources): 3.6, 3.7, 3.8, 3.9
• Concept Enrichment, e.g. <dbp:Hosni_Mubarak>, <dbp:Barack_Obama>, <dbp:Egypt>, <dbp:CNN>
21
Experiments Framework for Twitter Topic 
Classification with DBpedia 
• Microposts: 2010, 2011, 2013
• DBpedia dumps (resources): 3.6, 3.7, 3.8, 3.9
• Concept Enrichment
• Resource Backtrack Mapping (2010, 2011, 2013)
• Deriving Semantic Graph Snapshots
22
Experiments Framework for Twitter Topic 
Classification with DBpedia 
• Microposts: 2010, 2011, 2013
• DBpedia dumps (resources): 3.6, 3.7, 3.8, 3.9
• Concept Enrichment
• Resource Backtrack Mapping (2010, 2011, 2013)
• Deriving Semantic Graph Snapshots
• DBpedia Topic Relevance-based Feature Weighting
23
Experiments Datasets 
Feature sets are built per topic (W&C, D&A, L&C, NEG) and per year (2010, 2011, 2013):
• Lexical (LEX): BoW
• Semantic: Category, Property, Resource, Class
24
Experiments Framework for Twitter Topic 
Classification with DBpedia 
• Microposts: 2010, 2011, 2013
• DBpedia dumps (resources): 3.6, 3.7, 3.8, 3.9
• Concept Enrichment
• Resource Backtrack Mapping (2010, 2011, 2013)
• Deriving Semantic Graph Snapshots
• DBpedia Topic Relevance-based Feature Weighting
• Topic-labelled Microposts (2010, 2011, 2013)
• Build Topic Classifier
25
Experiments Understanding the Stability of 
a Topic Representation 
Feature sets: Lexical, Semantic, Combined
Same-epoch Scenario: train and test on the same epoch t
26
Experiments Epoch Scenarios 
Same-epoch Scenario (Trained on 2010 - Tested on 2010)
All the experiments reported in our paper were conducted using a 10-fold cross-validation setting.
F1         Disaster_Acc  Law_Crime  War_Conflict
BoW        0.831         0.765      0.844
Category   0.697         0.650      0.744
Property   0.680         0.639      0.720
Resource   0.692         0.637      0.762
Class      0.633         0.583      0.637
27
Experiments Understanding the Stability of 
a Topic Representation 
Feature sets: Lexical, Semantic, Combined
Same-epoch Scenario: train and test on the same epoch t
Cross-epoch Scenario: train on epoch t, test on epoch t+1
28
Experiments Epoch Scenarios 
Cross-epoch Scenario (Trained on 2010 - Tested on X)
Disaster_Acc, F1 per cross-epoch setting (train-test)
           2010-2011  2010-2013  2011-2013  Average
BoW        0.634      0.481      0.261      0.458
Category   0.683      0.539      0.524      0.582
Property   0.665      0.557      0.502      0.574
Resource   0.774      0.544      0.445      0.587
Class      0.691      0.665      0.669      0.675
29
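The Average column of the Disaster_Acc table appears to be the arithmetic mean of the three cross-epoch F1 scores. A short sketch that recomputes it from the per-setting values on the slide; the variable and function names are illustrative.

```python
# Disaster_Acc F1 per cross-epoch setting: 2010-2011, 2010-2013, 2011-2013.
CROSS_EPOCH_F1 = {
    "BoW": [0.634, 0.481, 0.261],
    "Category": [0.683, 0.539, 0.524],
    "Property": [0.665, 0.557, 0.502],
    "Resource": [0.774, 0.544, 0.445],
    "Class": [0.691, 0.665, 0.669],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


averages = {name: mean(scores) for name, scores in CROSS_EPOCH_F1.items()}

# The feature type with the highest mean cross-epoch F1.
most_stable = max(averages, key=averages.get)

for name, value in averages.items():
    print(f"{name:<9} {value:.3f}")
```

Ranking the feature types by this mean is one simple way to compare how well each representation holds up when the test epoch differs from the training epoch.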
Experiments Epoch Scenarios 
Averaged Cross-epoch Scenarios 
F1         Disaster_Acc  Law_Crime  War_Conflict  Average
BoW        0.458         0.620      0.531         0.536
Category   0.582         0.537      0.453         0.55
Property   0.574         0.504      0.506         0.528
Resource   0.587         0.578      0.466         0.544
Class      0.675         0.647      0.664         0.665
30
Conclusions 
• Semantic features are much slower to decay
than lexical features.
• Semantic representations improve performance in
cross-epoch scenarios.
• Class-based features alone achieve, on average, a
gain of 7% over lexical features in cross-epoch
scenarios.
31
Future Work 
• Concept-drift tracking for transfer learning using 
Linked Data sources. 
• Study cross-epoch transfer learning approaches 
using semantic features. 
32
Questions 
ampaeli@gmail.com 
@pixarelli 
33
