Analyzing social media to characterize HIV at-risk populations among MSM in San Diego
Narendran Thangarajan (1), Dr. Nella Green (3), Dr. Amarnath Gupta (2), Dr. Susan Little (3), Dr. Nadir Weibel (1)
(1) Department of CSE, UC San Diego; (2) San Diego Supercomputer Center; (3) School of Medicine, UC San Diego
naren@ucsd.edu
Digital Epidemiology
This research is funded by the Frontier of Innovative Scholars Program (FISP), UCSD, and the Center for AIDS Research (CFAR), UCSD.
35 MILLION people living with HIV/AIDS worldwide.
1.2 MILLION people living with HIV/AIDS in the US.
660,000 total deaths caused by AIDS in the US.
78% of the new infections in 2010 were among MSM.
California (along with Florida) had the highest number of HIV diagnoses in 2013.
Interesting recent trend - Proliferation of social networks and
real-time communication capabilities.
“Just treated an HIV-infected person from location X. We should probably conduct a PrEP intervention at X.”
“We should deploy peer education in location Y; most of our patients are from there.”
Problem
Ineffective prevention strategies: 50,000 new HIV infections each year.
Solution
Characterize and identify HIV at-risk MSM populations by studying user sentiments and behaviors on social networks.
2008: Ginsberg et al. published “Detecting influenza epidemics using search engine query data” in Nature.
2012: Salathé et al. published “Digital Epidemiology” in PLoS Computational Biology.
2014: Sean D. Young et al., “Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes”, Preventive Medicine (Elsevier).
2014: Dr. Elizabeth Murnane, “Unraveling Abstinence and Relapse: Smoking Cessation Reflected in Social Media”, CHI 2014.
2015: This work.
Method
I. Data collection, classification and refinement
• Tweets are collected in real time through the Twitter Streaming API. Twitter’s “filter hose” is used to collect tweets from San Diego County.
• Each tweet is cleaned by removing stop words and punctuation and converting the text to lower case.
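The cleaning step in the second bullet can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the stop-word list is a small made-up sample (the poster does not say which list was used), and `clean_tweet` is a hypothetical helper name.

```python
import string

# Assumption: a tiny illustrative stop-word list; the poster does not name the one it used.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "at", "on", "for"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Lower-case a tweet, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    return [token for token in no_punct.split() if token not in STOP_WORDS]


print(clean_tweet("The weather in San Diego is GREAT today!"))
# ['weather', 'san', 'diego', 'great', 'today']
```

Note that stripping punctuation also removes "@" and "#", so mentions and hashtags become plain words; a pipeline that needs them would have to extract them before this step.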
III. Migration from raw Twitter data to social network graph
II. Improving the accuracy of HIV risk tweet classification using machine learning
To improve the accuracy of HIV risk tweet classification, we evaluated two linear classifiers, Support Vector Machines (SVM) and Logistic Regression, with different sets of features.
Feature Set              SVM       Logistic Regression
Bag of Words             15.73%    15.72%
Stop Word Removal        12.9%     12.98%
Domain Specific Terms    11.37%    7.42%
Tweeter information      17.12%    15.23%
Error rates using different linear classifiers
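To make the comparison above concrete, here is a small, dependency-free sketch of the two linear classifiers trained on bag-of-words features. It is an illustration under stated assumptions, not the authors' implementation: the poster does not say which library, optimizer, or hyperparameters were used, the function names are hypothetical, and the four labelled examples are made up. Both models share one stochastic gradient descent loop and differ only in the loss (hinge for the SVM, logistic for logistic regression).

```python
import math
from collections import Counter


def bag_of_words(tokens, vocab):
    """Map a token list to a count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocab]


def train_linear(X, y, loss, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b by stochastic gradient descent; labels are -1 or +1.

    loss="hinge" gives a linear SVM, loss="logistic" gives logistic regression.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if loss == "hinge":
                grad = -yi if margin < 1.0 else 0.0
            else:
                # Derivative of log(1 + exp(-margin)); the clamp avoids overflow.
                grad = -yi / (1.0 + math.exp(min(margin, 50.0)))
            w = [wj - lr * (grad * xj + reg * wj) for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b


def error_rate(w, b, X, y):
    """Fraction of examples whose predicted sign differs from the label."""
    wrong = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        predicted = 1 if score >= 0.0 else -1
        wrong += predicted != yi
    return wrong / len(y)


# Made-up toy data (+1 = risk-related, -1 = not); not taken from the study.
train = [
    (["smoking", "ice", "tonight"], 1),
    (["ice", "cold", "soda"], -1),
    (["looking", "for", "meth"], 1),
    (["cold", "weather", "today"], -1),
]
vocab = sorted({tok for tokens, _ in train for tok in tokens})
X = [bag_of_words(tokens, vocab) for tokens, _ in train]
y = [label for _, label in train]

for loss in ("hinge", "logistic"):
    w, b = train_linear(X, y, loss)
    print(loss, "training error:", error_rate(w, b, X, y))
```

The printed numbers are training errors on four toy examples and are not comparable to the error rates in the table, which the poster reports for its own data and feature sets.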
• The property graph model was adopted as the data model for the HIV at-risk MSM Twitter social network.
• 7 node types and 9 edge types were identified as shown.
• Ontologies (shown in green) are used to infer indirect relationships between entities. For instance, they allow us to query for users who post tweets related to meth and sex venues.
• The resulting graph was materialized in the Neo4j graph database.
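The idea of a property graph with ontology-based inference can be sketched in plain Python as below. This is an in-memory illustration only, not the Neo4j schema from the poster: the node labels (`User`, `Tweet`, `Term`, `Concept`), the edge types (`POSTED`, `CONTAINS`, `IS_A`), and the example terms are all assumptions made for this sketch, since the poster's 7 node types and 9 edge types appear only in a figure.

```python
from collections import defaultdict


class PropertyGraph:
    """Minimal property graph: labelled nodes with properties, typed directed edges."""

    def __init__(self):
        self.nodes = {}                     # node_id -> {"label": ..., other properties}
        self.out_edges = defaultdict(list)  # node_id -> [(edge_type, target_id), ...]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, src, edge_type, dst):
        self.out_edges[src].append((edge_type, dst))

    def neighbors(self, src, edge_type):
        return [dst for etype, dst in self.out_edges[src] if etype == edge_type]


def concepts_for_tweet(graph, tweet_id):
    """Terms contained in a tweet plus every ontology ancestor reached via IS_A."""
    found = set()
    stack = graph.neighbors(tweet_id, "CONTAINS")
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(graph.neighbors(node, "IS_A"))
    return found


def users_posting_about(graph, *concepts):
    """Users whose tweets, taken together, touch every requested concept."""
    wanted = set(concepts)
    matches = []
    for node_id, props in graph.nodes.items():
        if props["label"] != "User":
            continue
        seen = set()
        for tweet_id in graph.neighbors(node_id, "POSTED"):
            seen |= concepts_for_tweet(graph, tweet_id)
        if wanted <= seen:
            matches.append(node_id)
    return sorted(matches)


# Hypothetical example data.
g = PropertyGraph()
for user in ("u1", "u2"):
    g.add_node(user, "User")
for tweet in ("t1", "t2", "t3"):
    g.add_node(tweet, "Tweet")
for term in ("ice", "bathhouse"):
    g.add_node(term, "Term")
for concept in ("meth", "Drug", "SexVenues"):
    g.add_node(concept, "Concept")

g.add_edge("u1", "POSTED", "t1")
g.add_edge("u1", "POSTED", "t2")
g.add_edge("u2", "POSTED", "t3")
g.add_edge("t1", "CONTAINS", "ice")
g.add_edge("t2", "CONTAINS", "bathhouse")
g.add_edge("t3", "CONTAINS", "ice")
g.add_edge("ice", "IS_A", "meth")        # ontology: slang term -> drug
g.add_edge("meth", "IS_A", "Drug")       # ontology: drug -> risk category
g.add_edge("bathhouse", "IS_A", "SexVenues")

print(users_posting_about(g, "meth", "SexVenues"))  # ['u1']
```

Neither tweet mentions "meth" or "SexVenues" literally; the match for `u1` comes entirely from following the `IS_A` edges, which is the kind of indirect relationship the ontology bullet describes.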
Analysis
Results obtained using EDA queries
Exploratory Data Analysis (EDA) queries helped us understand hidden patterns in the HIV at-risk social network.
Results
Querying the social graph to identify interesting communication structures
Currently, we have a queryable HIV at-risk Twitter network graph.
Proximity: How close are drug bucket users to other homosexual bucket users in terms of hop count?
Topics of interest: What are the main topics in the discussions among people who are at a one-hop following distance from their subgraph’s hubs?
Conversations: How many conversations are happening among drug bucket users alone, among sex bucket users alone, and across drug bucket users and sex bucket users?
Preferences: Identify the two drug bucket users who are most consulted by homosexual people.
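The proximity question above boils down to a shortest-path search. The sketch below shows the underlying idea with a breadth-first search over a directed "follows" relation; in Neo4j the same question would typically be written as a Cypher shortest-path query instead. The adjacency dictionary, the user names, and the choice to treat follow edges as directed are assumptions made for illustration.

```python
from collections import deque


def hop_count(follows, source, targets):
    """Fewest follow-hops from source to any user in targets, or None if unreachable."""
    targets = set(targets)
    if source in targets:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in follows.get(node, ()):
            if nxt in seen:
                continue
            if nxt in targets:
                return dist + 1
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


# Hypothetical follow graph: each key follows the users in its list.
follows = {"d1": ["x"], "x": ["h1"], "d2": ["h2"], "d3": []}
homosexual_bucket = {"h1", "h2"}

for user in ("d1", "d2", "d3"):
    print(user, hop_count(follows, user, homosexual_bucket))
# d1 2
# d2 1
# d3 None
```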
Future
Current status and future work
Risk buckets: (0) Drug, (1) Homosexual, (2) STI, (3) Sex, (4) SexVenues
The HIV at-risk MSM social network, coupled with the real-world HIV transmission network inferred using phylodynamics from SD PIC, will help us understand whether the actual sexual network can be reconstructed from the social network.
Ultimately, this social network could predict an individual’s future HIV transmission risk, enabling us to prevent transmission in real time.
• Each tweet is classified as an HIV risk tweet if it falls into one of the five HIV risk categories: Drug, SexVenues, Sex, Homosexual, and Sexually Transmitted Infections (STI).
• Classified tweets are refined further using exclusion and inclusion lists of co-occurring words, e.g. “ice cold” doesn’t refer to meth (a drug commonly called “ice”).
• After obtaining a refined set of HIV risk tweets, the relevant metadata (such as the tweeters and the mentioned users) was fetched using Twitter’s public APIs.
• Retweet and reply chains were pulled in recursively to ensure that the original tweet and the corresponding tweeter were part of the resulting social network graph.
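The steps in these bullets (keyword classification, refinement with co-occurrence lists, and following retweet and reply chains back to the original tweet) can be sketched together as below. Everything specific here is an assumption made for illustration: the term lists are not the study's lexicon, the rule that an inclusion list requires a supporting word is one plausible reading of "inclusion lists", `parent_id` is a hypothetical field name, and an in-memory dictionary stands in for calls to Twitter's public APIs. The chain is followed with a loop rather than literal recursion, which gives the same result without any recursion-depth concern.

```python
# Illustrative term lists only; not the lexicon used in the study.
RISK_TERMS = {"Drug": {"meth", "ice"}, "STI": {"syphilis"}}
EXCLUDE = {"ice": {"cold", "cream"}}     # co-occurring words that cancel a match
INCLUDE = {"ice": {"smoke", "smoking"}}  # co-occurring words required for a match


def classify(tokens):
    """Return the sorted risk categories whose terms survive refinement."""
    present = set(tokens)
    categories = set()
    for category, terms in RISK_TERMS.items():
        for term in terms & present:
            others = present - {term}
            if EXCLUDE.get(term, set()) & others:
                continue  # e.g. "ice cold" is not about meth
            required = INCLUDE.get(term)
            if required and not (required & others):
                continue  # ambiguous term without supporting context
            categories.add(category)
    return sorted(categories)


def collect_chain(tweet_id, fetch_tweet):
    """Follow parent links (retweet or reply) back to the original tweet."""
    chain, seen = [], set()
    while tweet_id is not None and tweet_id not in seen:
        seen.add(tweet_id)
        tweet = fetch_tweet(tweet_id)
        if tweet is None:
            break  # parent deleted or not retrievable
        chain.append(tweet)
        tweet_id = tweet.get("parent_id")
    return chain


# Hypothetical in-memory store standing in for Twitter API lookups.
STORE = {
    1: {"id": 1, "user": "alice", "tokens": ["smoking", "ice"], "parent_id": None},
    2: {"id": 2, "user": "bob", "tokens": ["ice", "cold", "soda"], "parent_id": 1},
    3: {"id": 3, "user": "carol", "tokens": ["syphilis", "test"], "parent_id": 2},
}

for tweet in STORE.values():
    categories = classify(tweet["tokens"])
    if categories:
        users = [t["user"] for t in collect_chain(tweet["id"], STORE.get)]
        print(tweet["id"], categories, users)
# 1 ['Drug'] ['alice']
# 3 ['STI'] ['carol', 'bob', 'alice']
```

Tweet 2 is dropped by the exclusion list, yet its author still enters the graph because tweet 3 replies to it; this mirrors the poster's point that chains are pulled in so the original tweet and tweeter are not lost.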
Figure panels: most active time of the day; most active day of the week; power-law distribution of tweets; length of HIV risk tweets; tweet distribution across risk buckets; most co-occurring risk categories.
• IRB approval and recruitment: Currently, we are collecting Twitter handles of people in the HIV transmission network and of those at risk of acquiring HIV. This enables us to compare the structural similarities between the sexual network and the Twitter social network.
• Interactive data visualizations of the evolving HIV at-risk social network, to decipher underlying patterns in the evolution of the network structure and the corresponding changes in social network analysis (SNA) metrics.
• A computational model that captures the behavior of an HIV at-risk user on Twitter.
• Collaboration with Harvard to identify change-points in the social
network structure.
