ELIS – Multimedia Lab
Fréderic Godin, Viktor Slavkovikj, Wesley De Neve, Benjamin Schrauwen and Rik Van de Walle
Using Topic Models for Twitter Hashtag Recommendation
Multimedia Lab, Ghent University – iMinds, Belgium
Reservoir Lab, Ghent University, Belgium
Image and Video Systems Lab, KAIST, South Korea
Making Sense of Microposts Workshop @ World Wide Web Conference 2013
Introduction (1)
Indexing
Search
Linking
General Topic
Memes
Grouping
Information retrieval
Introduction (2)
±10% of tweets contain a hashtag
3% of the hashtags are used more than 5 times
Indexing
Search
Linking
General Topic
Memes
Grouping
Goal
Suggest keywords that reflect the general topic of a tweet and that could be used as hashtags
Promote hashtags for effective indexing
Allow for effective search of tweets through hashtags
Reduce the use of sparse hashtags
Architectural overview
Tweet → Basic filter → Language identification → Topic distribution → Hashtag suggestion → Hashtagged tweet
Basic filter
Clean up the tweet: remove URLs, special HTML entities, digits, punctuation, the hash character, …
During training:
Remove tweets with just one word
Remove retweets
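
The basic filter could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the regular expressions, the "RT @" retweet heuristic, and the function names clean_tweet and keep_for_training are all assumptions made for the example.

```python
import re

# Assumed patterns; the slides do not specify the exact rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
ENTITY_RE = re.compile(r"&#?\w+;")        # HTML entities such as &amp; or &#39;
NON_LETTER_RE = re.compile(r"[\W\d_]+")   # punctuation, digits, '#', underscores
RETWEET_RE = re.compile(r"\s*RT\s+@")     # assumed heuristic for retweets


def clean_tweet(text):
    """Strip URLs, HTML entities, digits, punctuation and the hash character."""
    text = URL_RE.sub(" ", text)
    text = ENTITY_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())


def keep_for_training(text):
    """During training, drop retweets and tweets with just one word."""
    if RETWEET_RE.match(text):
        return False
    return len(clean_tweet(text).split()) > 1


print(clean_tweet("Please RT!! sign Bernie Sanders petition for the #fiscal cliff! http://t.co/abc"))
```

Because `\W` is Unicode-aware in Python 3, letters from non-English scripts survive the cleaning, which matters here since language identification only happens in the next stage.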
Language identification
Why: We need to build a language-dependent topic model
Goal: Build an unsupervised classifier that discriminates between English and non-English tweets
How: Naive Bayes and the Expectation-Maximization algorithm, with character n-gram features
Result: Evaluation on a test set of 1000 randomly selected tweets

            Lui & Baldwin (LangID.py)   Our algorithm
Precision   97.9%                       97.0%
Recall      91.8%                       97.8%
F1          94.8%                       97.4%
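
As a quick consistency check on the reported figures, the F1 values can be recomputed from precision and recall, assuming F1 here is the usual harmonic mean of the two (the slide does not define it). The function name f1_score is chosen for this sketch only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the slide.
langid_f1 = f1_score(0.979, 0.918)
ours_f1 = f1_score(0.970, 0.978)

print(f"LangID.py F1: {langid_f1:.1%}")
print(f"Our algorithm F1: {ours_f1:.1%}")
```

Both recomputed values agree with the reported 94.8% and 97.4% after rounding to one decimal.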
Calculating the topic distribution
Idea: Find the general topic(s) of a tweet
How: Latent Dirichlet Allocation, which finds the topic distribution in an unsupervised manner
Training: 1.8 million tweets pre-filtered on 4000 keywords; 200 topics, α=0.1, β=0.1
Example: “Please RT!! sign Bernie Sanders petition for the fiscal cliff! http://..”

Topic:         0     1     2     3    …    57    …   199
Probability: [0.1 ; 0.0 ; 0.0 ; 0.0 ; … ; 0.8 ; … ; 0.05]

Topic 57:
1. Fiscal
2. Political
3. President
…
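
To make the idea of a per-tweet topic distribution concrete, here is a toy collapsed Gibbs sampler for Latent Dirichlet Allocation. It is a sketch under assumptions: the slides do not say which inference method or library was used, the three-tweet corpus is invented, and only the hyperparameters α=0.1 and β=0.1 come from the slide (the real model used 200 topics, not 2).

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.1, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]           # topic counts per document
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics                         # words assigned per topic
    assignments = []

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        assignments.append(zs)
        for w, z in zip(doc, zs):
            update(d, w, z, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)        # remove current assignment
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + len(vocab) * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                update(d, w, z, +1)

    # Smoothed topic distribution per document.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [sorted(vocab, key=lambda w: -topic_word[k][w])[:3]
                 for k in range(num_topics)]
    return theta, top_words


# Invented toy corpus of already-filtered tweets.
toy_docs = [
    "sign petition fiscal cliff president".split(),
    "fiscal policy president political".split(),
    "school period today school time".split(),
]
theta, top_words = lda_gibbs(toy_docs, num_topics=2)
print(theta)
print(top_words)
```

Each row of theta plays the role of the vector shown in the example above: one probability per topic, summing to 1.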
Hashtag suggestion (1)
Idea: Suggest a number of hashtags based on the topic distribution of the tweet
How: Sample the topic distribution and suggest the top-ranked keywords
Example:

Tweet                                                                       Suggested keywords
Yay, we got sixth period today                                              school, business, light, time, period
Please RT!! sign Bernie Sanders petition for the fiscal cliff! http://..    fiscal, political, traffic, president, policy
comfort, elegance, prettiness                                               little, good, love, relationship, god
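
One way to read "sample the topic distribution and suggest the top-ranked keywords" is sketched below: repeatedly draw a topic in proportion to its probability and take that topic's next-best unused keyword. The slides do not spell out the exact procedure, so this loop, the function name suggest_hashtags, and the keyword lists are assumptions for illustration.

```python
import random


def suggest_hashtags(topic_dist, topic_keywords, n=5, seed=None):
    """Draw topics by probability and collect their top-ranked keywords."""
    rng = random.Random(seed)
    next_rank = [0] * len(topic_dist)   # next unused keyword rank per topic
    suggestions = []
    while len(suggestions) < n:
        # Topics that can still be drawn and still have keywords left.
        candidates = [
            k for k, p in enumerate(topic_dist)
            if p > 0 and next_rank[k] < len(topic_keywords[k])
        ]
        if not candidates:
            break
        k = rng.choices(candidates, weights=[topic_dist[c] for c in candidates])[0]
        word = topic_keywords[k][next_rank[k]]
        next_rank[k] += 1
        if word not in suggestions:
            suggestions.append(word)
    return suggestions


# Hypothetical two-topic model; keyword lists are ranked best-first.
keywords = [
    ["school", "period", "time"],
    ["fiscal", "political", "president"],
]
print(suggest_hashtags([0.2, 0.8], keywords, n=5, seed=42))
```

Sampling, rather than always taking the single most probable topic, lets a tweet that mixes topics receive keywords from more than one of them.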
Hashtag suggestion (2)
[Figure: percentage of tweets (%) versus number of correctly suggested hashtags (0 to 10), shown for 5 and for 10 suggested hashtags. Evaluation of 100 tweets.]
Conclusions and Future Work
We built a hashtag recommendation system:
Suggests general keywords
Unsupervised
In the future:
Use more context information: semantic web, social graph, …
Adopt a hybrid approach between general and specific hashtags
#Questions @frederic_godin

Editor's Notes

  • #3: Footer: Micropost -> Microposts
  • #5: … and that could be used … Allow for effective search of tweets (through hashtags)
  • #8: Remove the full stops. Language dependent -> Language-dependent. Why? -> Why (for reasons of consistency)
  • #9: Those 4000 keywords are used to get some meaningful tweets. Otherwise the set was too big for training the algorithm. If you take a smaller sample than 4 days, then again you have too few coherent tweets to train the model. Those keywords don’t become the most important keyword within a topic. Ex. keyword: president. The topic was the fiscal cliff and political problems.
  • #10: Maybe clarify how you sample the topic distribution? On the previous slide, maybe also clarify how you selected the topics?
  • #12: an hashtag -> a hashtag. social graph -> social graph, … To suggest general keywords -> Suggests general keywords. Future work: other techniques to determine topics? Bayesian inference, deep learning, … ;-)?