#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, Vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
Using Hashtag Graph-Based Topic Model to Connect Semantically-Related Words
without Co-Occurrence in Microblogs
ABSTRACT:
In this paper, we introduce a new topic model to understand the chaotic microblogging
environment by using hashtag graphs. Inferring topics on Twitter has become a vital but challenging
task in many important applications. The shortness and informality of tweets lead to extremely
sparse vector representations with a large vocabulary. This causes conventional topic models
(e.g., Latent Dirichlet Allocation and Latent Semantic Analysis) to fail to learn high-quality topic
structures. Tweets often appear with rich user-generated hashtags. The hashtags make
tweets semi-structured and semantically related to each other. Since hashtags are used
as keywords in tweets to mark messages or to form conversations, they provide an additional
path to connect semantically related words. Treating tweets as semi-structured texts,
we propose a novel topic model, the Hashtag Graph-based Topic Model (HGTM), to
discover topics of tweets. By utilizing hashtag relation information in hashtag graphs, HGTM is
able to discover semantic relations between words even if the words do not co-occur within a
specific tweet. In this way, HGTM alleviates the sparsity problem. Our investigation
illustrates that user-contributed hashtags can serve as weakly supervised information for
topic modeling, and that the relations between hashtags can reveal latent semantic relations
between words. We evaluate the effectiveness of HGTM on tweet (hashtag) clustering and hashtag
classification problems. Experiments on two real-world tweet data sets show that HGTM has a
strong capability to handle the sparseness and noise problems in tweets. Furthermore, HGTM can
discover more distinct and coherent topics than the state-of-the-art baselines.
EXISTING SYSTEM:
• Although traditional methods have achieved success in uncovering topics for normal
documents (e.g., news articles, technical papers), the characteristics of tweets bring new
challenges and opportunities for these methods.
• Several methods have been proposed to tackle the severe noise and lack-of-context
problems in tweets. One intuitive method is to aggregate tweets into a long document.
• Hong et al. aggregated tweets by the same user, the same word, or the same hashtag.
• Mehrotra et al. investigated different hashtag-based pooling schemes as a preprocessing
step for LDA.
• Weng et al. introduced a “pseudo-document” by collecting tweets from the same author.
• Yan et al. clustered tweets using non-negative matrix factorization.
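The aggregation idea above can be sketched in a few lines. The following is a minimal illustration of hashtag pooling only, not the procedure of any of the cited authors: the class name `HashtagPooling`, the whitespace tokenization, and the lowercasing of hashtags are all assumptions made for the example. Tweets without hashtags are simply left out of every pool here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: merges tweets that share a hashtag into one pseudo-document. */
final class HashtagPooling {

    /** Returns the distinct lowercase hashtags of a tweet (assumes whitespace tokenization). */
    static Set<String> hashtags(String tweet) {
        Set<String> tags = new LinkedHashSet<>();
        for (String token : tweet.trim().split("\\s+")) {
            if (token.length() > 1 && token.startsWith("#")) {
                tags.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tags;
    }

    /** Groups tweets by hashtag; a tweet with several hashtags joins several pools. */
    static Map<String, List<String>> pool(List<String> tweets) {
        Map<String, List<String>> pools = new LinkedHashMap<>();
        for (String tweet : tweets) {
            for (String tag : hashtags(tweet)) {
                pools.computeIfAbsent(tag, k -> new ArrayList<>()).add(tweet);
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "goal by messi #worldcup",
                "great final #WorldCup #football",
                "no tags here");
        // Each pool can be joined into one long document and passed to a topic model.
        pool(tweets).forEach((tag, docs) ->
                System.out.println(tag + " -> " + String.join(" ", docs)));
    }
}
```

Each pooled document is longer than a single tweet, which is why pooling eases sparsity, but the pools remain flat text: relations between the hashtags themselves are not used.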
DISADVANTAGES OF EXISTING SYSTEM:
• Compared with normal texts, tweets usually contain only a few words.
• The use of informal language enlarges the dictionary.
• Existing methods treat tweets as flat text and ignore the tag-related information contained
in Twitter data. ATM (Author-Topic Model) leverages tag information only through a uniform
distribution over tags, and ignores the potential tag relations that are vitally helpful for
building latent semantic relationships between words.
PROPOSED SYSTEM:
• We construct different kinds of hashtag graphs based on statistical information about hashtag
occurrence, gathered in a crowdsourcing manner that requires no human effort such
as labeling. Based on these hashtag graphs, we propose a novel framework, the Hashtag
Graph-based Topic Model (HGTM).
• The basic idea of HGTM is to project tweets into a coherent semantic space by using
latent variables via user-contributed hashtags.
• HGTM provides a robust way to model noisy and sparse tweets. It differs from
traditional topic models, which normally consider only content information and ignore
the explicit and potential semantic connections via noisy hashtags.
• HGTM is a probabilistic generative model that incorporates such weakly supervised
information based on a weighted hashtag graph. The model links tweets via both explicit
and potential tweet-hashtag relationships, so that hashtag relationships can connect
semantically related words with or without co-occurrence, which alleviates the severe
sparsity and noise problems in short texts.
• In this paper, we extend the work and further explore the influence of different hashtag
graph construction methods and discuss HGTM in more detail, including a time
complexity analysis and an analysis of the key process of hashtag assignment.
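As a rough illustration of the weighted hashtag graph and the "potential" tweet-hashtag relationship described above, the sketch below builds an undirected graph whose edge weight is the number of tweets in which two hashtags appear together. This weighting is only one plausible construction chosen for the example (the text notes that several construction methods are explored), the class `HashtagGraph` and its methods are hypothetical names, and the code does not implement HGTM's generative model or its inference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical weighted hashtag graph built from hashtag co-occurrence counts. */
final class HashtagGraph {

    // weights.get(a).get(b) = number of tweets containing both hashtag a and hashtag b
    private final Map<String, Map<String, Integer>> weights = new HashMap<>();

    /** Adds one tweet, given as its set of distinct hashtags. */
    void addTweet(Set<String> tags) {
        List<String> list = new ArrayList<>(tags);
        for (int i = 0; i < list.size(); i++) {
            for (int j = i + 1; j < list.size(); j++) {
                bump(list.get(i), list.get(j));
                bump(list.get(j), list.get(i)); // keep the graph symmetric
            }
        }
    }

    private void bump(String from, String to) {
        weights.computeIfAbsent(from, k -> new HashMap<>()).merge(to, 1, Integer::sum);
    }

    /** Edge weight between two hashtags, or 0 if they never co-occurred. */
    int weight(String a, String b) {
        return weights.getOrDefault(a, Collections.emptyMap()).getOrDefault(b, 0);
    }

    /** Hashtags a tweet does not contain but that neighbor one of its explicit hashtags. */
    Set<String> potentialHashtags(Set<String> explicitTags) {
        Set<String> result = new TreeSet<>();
        for (String tag : explicitTags) {
            result.addAll(weights.getOrDefault(tag, Collections.emptyMap()).keySet());
        }
        result.removeAll(explicitTags);
        return result;
    }

    public static void main(String[] args) {
        HashtagGraph graph = new HashtagGraph();
        graph.addTweet(new HashSet<>(Arrays.asList("#nba", "#lakers")));
        graph.addTweet(new HashSet<>(Arrays.asList("#nba", "#lakers", "#kobe")));
        System.out.println(graph.weight("#nba", "#lakers"));
        System.out.println(graph.potentialHashtags(Collections.singleton("#kobe")));
    }
}
```

In this toy setting, a tweet tagged only `#kobe` gains `#lakers` and `#nba` as potential hashtags, so its words can be related to words in tweets carrying those tags even though the tweets share no word and no explicit hashtag.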
ADVANTAGES OF PROPOSED SYSTEM:
• We evaluate HGTM on two real-world Twitter data sets to understand different kinds of
hashtag graphs and how HGTM performs on a range of tweet mining tasks such as
clustering, classification, and topic quality evaluation.
• Compared to the state-of-the-art methods, HGTM shows the ability to handle the
sparseness and noise problems in mining tweets by exploiting both explicit and potential
relations between hashtags and tweets.
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
System : Pentium Dual Core
Hard Disk : 120 GB
Monitor : 15" LED
Input Devices : Keyboard, Mouse
RAM : 1 GB
SOFTWARE REQUIREMENTS:
Operating System : Windows 7
Coding Language : Java/J2EE
Tool : NetBeans 7.2.1
Database : MySQL
