Exploiting Text and Network Context for Geolocation of Social Media Users
Afshin Rahimi,♥ Duy Vu,♠ Trevor Cohn,♥ and Timothy Baldwin♥
♥ Department of Computing and Information Systems, ♠ Department of Mathematics and Statistics, The University of Melbourne
OVERVIEW
Task: Find the location of Twitter users based on text and network information.
Previous Shortcoming: No comparison of text-based and network-based models, and no model that uses both.
Datasets: 3 Twitter geolocation datasets:
GeoText, Twitter-US, Twitter-World.
Sample Format: userid, text, mention-list, latitude/longitude
YOU ARE WHERE YOUR WORDS SAY YOU ARE
Usage of "mountain" in the U.S.
TEXT-BASED MODEL (LR)
Logistic regression with l1 regularisation
over k-d tree discretisation of latitude/longitude.
[Figure: map with Longitude on the x-axis and Latitude on the y-axis]
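The k-d tree discretisation named in the text-based model can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `max_size` parameter, the alternating latitude/longitude median split, and the toy coordinates are all hypothetical. The resulting cell index would serve as the class label for the l1-regularised logistic regression, which is not shown here.

```python
from statistics import median


def kd_tree_cells(points, max_size, depth=0):
    """Recursively split (lat, lon) points into cells of at most max_size points.

    Assumption: splits alternate between latitude (even depth) and
    longitude (odd depth) at the median, so dense regions get smaller cells.
    """
    if len(points) <= max_size:
        return [points]
    axis = depth % 2
    cut = median(p[axis] for p in points)
    left = [p for p in points if p[axis] < cut]
    right = [p for p in points if p[axis] >= cut]
    if not left or not right:
        # Simplification: stop when the points cannot be separated on this axis.
        return [points]
    return (kd_tree_cells(left, max_size, depth + 1)
            + kd_tree_cells(right, max_size, depth + 1))


def label_points(cells):
    """Map each point to the index of its cell (the class label)."""
    return {p: i for i, cell in enumerate(cells) for p in cell}


def cell_centres(cells):
    """Assumption: a cell is represented by the per-axis median of its points."""
    return [(median(p[0] for p in c), median(p[1] for p in c)) for c in cells]


# Hypothetical training coordinates (lat, lon), for illustration only.
points = [
    (39.7, -105.0), (39.8, -104.9), (40.0, -105.3),
    (40.7, -74.0), (40.8, -73.9), (40.6, -74.1),
    (34.0, -118.2), (34.1, -118.3),
]
cells = kd_tree_cells(points, max_size=3)
labels = label_points(cells)
centres = cell_centres(cells)
```

A predicted class can then be mapped back to coordinates through `centres`, which is one plausible way to turn a classifier output into a latitude/longitude pair.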
YOU ARE WHERE YOUR FRIENDS ARE
Most of our online interactions are local.
Twitter mention
NETWORK-BASED MODEL (LP)
Label Propagation in @-mention Network:
• Build an @-mention network.
• Initialise the location of training nodes with their
known location.
• Iteratively update non-training nodes’ location
to the median of their neighbours.
• Converges after 10 iterations.
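The steps above can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes an undirected, unweighted mention graph, synchronous updates, and a median taken independently over latitude and longitude. The function names and toy users are hypothetical.

```python
from statistics import median


def build_mention_graph(mentions):
    """Build an undirected graph from {user: [mentioned users]}."""
    graph = {}
    for user, mentioned in mentions.items():
        graph.setdefault(user, set())
        for other in mentioned:
            if other != user:
                graph[user].add(other)
                graph.setdefault(other, set()).add(user)
    return graph


def propagate(graph, known, iterations=10):
    """Label propagation; training nodes in `known` are never updated."""
    locations = dict(known)
    for _ in range(iterations):
        updated = dict(locations)
        for node, neighbours in graph.items():
            if node in known:
                continue
            located = [locations[n] for n in neighbours if n in locations]
            if located:
                # Assumption: per-axis median of the neighbours' coordinates.
                updated[node] = (median(p[0] for p in located),
                                 median(p[1] for p in located))
        locations = updated
    return locations


# Hypothetical users: "a" and "d" are training nodes, "e" mentions nobody.
mentions = {"a": ["b"], "b": ["c"], "d": ["c"], "e": []}
known = {"a": (39.7, -105.0), "d": (39.9, -104.8)}
graph = build_mention_graph(mentions)
locations = propagate(graph, known)
```

In this toy graph the isolated user "e" never receives a location, which illustrates why a purely network-based model cannot place disconnected users.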
NETWORK VERSUS TEXT
• For connected users, Network-based models
are more accurate.
• For disconnected users (about 20% of the
nodes), Text-based models are more accurate.
• Solution: Utilise both text and network information together!
LABEL PROPAGATION OVER TEXT PREDICTIONS
• Initialise training nodes with their known location
and test nodes with their text-based prediction.
• Iteratively update the location of non-training
nodes to the median of their neighbours.
• Converges after 10 iterations.
• Isolated test nodes will keep their text-based
prediction.
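A minimal sketch of the hybrid scheme described above, under the same assumptions as a plain label propagation (undirected graph, synchronous updates, per-axis median). The only change is the initialisation: test nodes start from a text-based prediction. The function name, the adjacency-dict input and the toy values are hypothetical, and the text predictions are simply given rather than produced by a classifier.

```python
from statistics import median


def propagate_over_text(graph, train_locations, text_predictions, iterations=10):
    """Label propagation where test nodes start from their text-based prediction.

    graph: {user: set of neighbouring users}
    train_locations: {user: (lat, lon)}, kept fixed
    text_predictions: {user: (lat, lon)} for test users
    """
    locations = dict(text_predictions)
    locations.update(train_locations)
    for _ in range(iterations):
        updated = dict(locations)
        for node, neighbours in graph.items():
            if node in train_locations:
                continue
            located = [locations[n] for n in neighbours if n in locations]
            if located:
                updated[node] = (median(p[0] for p in located),
                                 median(p[1] for p in located))
            # Nodes without located neighbours keep their current value,
            # so isolated test nodes retain their text-based prediction.
        locations = updated
    return locations


# Hypothetical example: "b" is connected to training user "a", "c" is isolated.
graph = {"a": {"b"}, "b": {"a"}, "c": set()}
train_locations = {"a": (39.7, -105.0)}
text_predictions = {"b": (40.7, -74.0), "c": (34.0, -118.2)}
result = propagate_over_text(graph, train_locations, text_predictions)
```

Here the connected test user "b" is pulled to the location of its only neighbour, while the isolated test user "c" ends with its text-based prediction unchanged.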
DENVER’S TOP FEATURES
RESULTS
State-of-the-art results over all three datasets!
[Figure: bar chart of median error in km (y-axis from 0 to 600) on GeoText, Twitter-US and Twitter-World, comparing the Text-based Method (LR), Network-based Method (LP), Hybrid Method (LP-LR), Wing and Baldridge (2014), and Ahmed et al. (2013)]
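The results are reported as median error in km. One way such a metric could be computed is sketched below. It assumes great-circle (haversine) distance and a mean Earth radius of 6371 km; the poster does not state the distance measure, and the function names and toy coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import median

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def median_error_km(predicted, actual):
    """Median distance between predicted and true locations over test users."""
    return median(haversine_km(predicted[u], actual[u]) for u in actual)


# Hypothetical predictions and gold locations, for illustration only.
actual = {"u1": (39.7, -105.0), "u2": (40.7, -74.0), "u3": (34.0, -118.2)}
predicted = {"u1": (39.7, -105.0), "u2": (40.8, -73.9), "u3": (39.7, -105.0)}
error = median_error_km(predicted, actual)
```

The median is less sensitive than the mean to a few users placed on the wrong side of the country, which suits a task where some predictions are off by thousands of kilometres.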
