This study applies topic modeling with the Latent Dirichlet Allocation (LDA) algorithm to COVID-19-related tweets to identify trends and insights from social media during the pandemic. It examines the effect of pooling tweets by hashtag as a way to improve topic coherence scores, and compares regular LDA with LDA using collapsed Gibbs sampling. The dataset comprises nearly 1 million tweets, gathered and cleaned through a comprehensive data collection and preprocessing process. The research highlights the usefulness of social media data in domains such as public health and policymaking.
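Hashtag pooling can be illustrated with a minimal sketch. The summary does not specify the pooling scheme, so the choices here are assumptions: the function name `pool_by_hashtag` is hypothetical, hashtags are matched case-insensitively, a tweet with several hashtags joins every matching pool, and tweets without hashtags are kept as separate documents.

```python
import re
from collections import defaultdict

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets):
    """Merge tweets that share a hashtag into one pseudo-document per hashtag.

    Returns (pooled_docs, unpooled), where pooled_docs maps a lowercased
    hashtag to the concatenated text of its tweets, and unpooled lists the
    tweets that carry no hashtag.
    """
    pools = defaultdict(list)
    unpooled = []
    for text in tweets:
        # Lowercase so that #COVID19 and #covid19 land in the same pool.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if not tags:
            unpooled.append(text)
            continue
        for tag in tags:
            pools[tag].append(text)
    pooled_docs = {tag: " ".join(texts) for tag, texts in pools.items()}
    return pooled_docs, unpooled
```

Each pooled pseudo-document would then be passed to the topic model in place of the individual short tweets, which gives LDA longer documents with more word co-occurrence to work with.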