NAACL Tutorial
Social Media Predictive Analytics
Svitlana Volkova1, Benjamin Van Durme1,2,
David Yarowsky1 and Yoram Bachrach3
1Center for Language and Speech Processing,
Johns Hopkins University,
2Human Language Technology Center of Excellence,
3Microsoft Research Cambridge
Tutorial Schedule
Part I: Theoretical Session (2:00 – 4:30pm)
Batch Prediction
Online Inference
Coffee Break (3:30 – 4:00pm)
Dynamic Learning and Prediction
Part II: Practice Session (4:30 – 5:30pm)
Code and Data
Tutorial Materials
• Slides:
– http://www.cs.jhu.edu/~svitlana/slides.pptx
• Code and Data:
– https://bitbucket.org/svolkova/queryingtwitter
– https://bitbucket.org/svolkova/attribute
– https://bitbucket.org/svolkova/psycho-demographics
• References:
– http://www.cs.jhu.edu/~svitlana/references.pdf
Social Media Obsession
Diverse
Billions of messages
Millions of users
What do they
think and feel?
Where do
they go?
What are their
demographics and
personality?
What do
they like?
What do
they buy?
First: a comment on privacy and ethics…
Why is language in social media so
interesting?
• Very Short – 140 chars
• Lexically divergent
• Abbreviated
• Multilingual
Why is language in social media so
challenging?
• Data drift
• User activeness => generalization
• Topical sparsity => relationship, politics
• Dynamic streaming nature
DEMO
Predictive Analytics Services
• Social Network Prediction –
https://apps.facebook.com/snpredictionapp/
• Twitter Psycho-Demographic Profile and Affect Inference –
http://twitterpredictor.cloudapp.net (pswd: twitpredMSR2014)
• myPersonality Project – http://mypersonality.org/wiki/doku.php
• You Are What You Like – http://youarewhatyoulike.com/
• Psycho-demographic trait predictions –
http://applymagicsauce.com/
• IBM Personality – https://watson-pi-demo.mybluemix.net
• World Well Being Project – http://wwbp.org
Applications: Retail
Personalized marketing
• Detecting opinions and emotions
users express about products or
services within targeted
populations
Personalized
recommendations and search
• Making recommendations based on
user emotions, demographics and
personality
Applications: Advertising
Online targeted advertising
• Targeting ads based on
predicted user demographics
• Matching the emotional tone
the user expects
• Deliver ads fast
• Deliver ads to a true crowd
Applications: Polling
Real-time live polling
• Mining political opinions
• Voting predictions within certain demographics
Large-scale passive polling
• Passive polling regarding products and services
Applications: Health
Large-scale real-time healthcare analytics
• Identifying smokers, drug addicts, healthy eaters,
people into sports (Paul and Dredze 2011)
• Monitoring flu trends, food poisoning, chronic
illnesses (Culotta et al. 2015)
Applications: HR
Recruitment and human resource
management
• Estimating emotional stability and
personality of potential and
current employees
• Measuring the overall well-being of
employees, e.g., life satisfaction,
happiness (Schwartz et al. 2013;
Volkova et al. 2015)
• Monitoring depression and stress levels
(Coppersmith et al. 2014)
User Attribute Prediction Task
Political Preference
Rao et al., 2010; Conover et al., 2011;
Pennacchiotti and Popescu, 2011;
Zamal et al., 2012; Cohen and Ruths,
2013; Volkova et al., 2014
Communications
Gender
Garera and Yarowsky, 2009; Rao et
al., 2010; Burger et al., 2011; Van
Durme, 2012; Zamal et al., 2012;
Bergsma and Van Durme, 2013
Age
Rao et al., 2010; Zamal et al., 2012;
Cohen and Ruths, 2013; Nguyen et al.,
2011, 2013; Sap et al., 2014
AAAI 2015 Demo (joint work with Microsoft Research)
Income, Education Level, Ethnicity, Life Satisfaction, Optimism,
Personality, Showing Off, Self-Promoting
Tweets Revealing User Attributes
Supervised Models
Classification: binary (SVM) – gender, age, political, ethnicity
• Goswami et al. 2009; Rao et al. 2010; Burger et al. 2011; Mislove et al.
2012; Nguyen et al. 2011, 2013;
• Pennacchiotti and Popescu 2011; Conover et al. 2011; Filippova et al.
2012; Van Durme 2012; Bergsma et al. 2012, 2013; Bergsma and Van
Durme 2013;
• Zamal et al. 2012; Ciot et al. 2013; Cohen and Ruths 2013;
• Schwartz et al. 2013; Sap et al. 2014; Kern et al. 2014;
Golbeck et al. 2011; Kosinski et al. 2013;
• Volkova et al. 2014; Volkova et al. 2015.
Unsupervised and Generative Models
• name morphology for gender & ethnicity prediction – Rao et al. 2011;
• large-scale clustering – Bergsma et al. 2013; Culotta et al. 2015;
• demographic language variation – Eisenstein et al. 2010; O'Connor et
al. 2010; Eisenstein et al. 2014.
*Rely on more than lexical features, e.g., network, streaming
Existing Approaches ~1K Tweets*
Does an average Twitter user produce
thousands of tweets?
*Rao et al., 2010; Conover et al., 2011; Pennacchiotti and
Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen
et al., 2013
Tweets as a
document
How Active are Twitter Users?
Attributed Social Network
User Local Neighborhoods a.k.a. Social Circles
Approaches
Static (Batch) Prediction
• Offline training, offline predictions
• + Neighbor content

Streaming (Online) Inference
• Offline training + online predictions over time
• Exploring 6 types of neighborhoods

Dynamic (Iterative) Learning and Prediction
• Online predictions, relying on neighbors
• + Iterative re-training, active learning, rationale annotation

Challenges addressed: topical sparsity, data drift, streaming nature, model generalization
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
How to get data? Twitter API
• Twitter API: https://dev.twitter.com/overview/api
• Twitter API Status: https://dev.twitter.com/overview/status
• Twitter API Rate Limits:
https://dev.twitter.com/rest/public/rate-limits
Querying Twitter API
• Twitter Developer Account => access key and token
https://dev.twitter.com/oauth/overview/application-
owner-access-tokens
from twython import Twython
twitter = Twython(APP_KEY, APP_SECRET,
OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
I. Access the 1% Twitter Firehose and sample from it
II. Query Twitter API to get:
 user timelines (up to 3200 tweets) from userIDs
 tweet json objects from tweetIDs
 lists of friendIDs (5000 per query) from userIDs
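The per-call limits above can be handled by batching IDs before querying. A minimal sketch, assuming a hypothetical `chunk_ids` helper and a batch size of 100 tweetIDs per statuses/lookup request:

```python
# Sketch: batch IDs so each Twitter API call stays within its limit
# (e.g., 100 tweetIDs per statuses/lookup request).

def chunk_ids(ids, batch_size=100):
    """Split a list of IDs into batches of at most batch_size."""
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

tweet_ids = list(range(250))      # stand-in for real tweetIDs
batches = chunk_ids(tweet_ids)
print(len(batches))               # 3 batches: 100 + 100 + 50

# Each batch then becomes one API call, e.g. with Twython:
# for batch in batches:
#     tweets = twitter.lookup_status(id=','.join(map(str, batch)))
```

Batching this way also makes it easy to sleep between calls when a rate-limit window is exhausted.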
JSON Objects
MongoDB: http://docs.mongodb.org/manual/tutorial/getting-started/
Add predictions: sentiment,
attributes, emotions
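Before adding predictions, the relevant fields are pulled out of each tweet's JSON object. A minimal sketch with the stdlib `json` module (the sample object and the `record` layout are illustrative; field names follow the Twitter v1.1 tweet object):

```python
# Sketch: extract the fields used downstream (text, time, friend count)
# from a tweet JSON object before storing it, e.g., in MongoDB.
import json

raw = '''{"text": "Happy 21st birthday to me!",
          "created_at": "Mon Jun 01 12:00:00 +0000 2015",
          "user": {"screen_name": "example_user", "friends_count": 42}}'''

tweet = json.loads(raw)
record = {
    "text": tweet["text"],
    "time": tweet["created_at"],
    "friends": tweet["user"]["friends_count"],
}
print(record["friends"])  # 42
# The MongoDB insert would then be: db.tweets.insert_one(record)
```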
How to get labeled data?
• Supervised classification in a new domain:
– Labeled data ≈ ground truth
– Costly and time consuming to get!
• Ways to get ≈"ground truth" annotations:
 Fun psychological tests (voluntary): myPersonality project
 Profile info: Facebook, e.g., relationship, gender, age (but sparse for Twitter)
 Self-reports: "I am a republican…" (Volkova et al. 2013), "Happy
##th/st/nd/rd birthday to me" (Zamal et al. 2012), "I have been diagnosed
with …" (Coppersmith et al. 2014), "I am a writer …" (Beller et al. 2014)
 Distant supervision: following Obama vs. Romney (Zamal et al. 2012),
emotion hashtags (Mohammad et al. 2014), user name (Burger et al. 2011)
 Crowdsourcing: subjective perceived annotations (Volkova et al. 2015),
rationales (Bergsma et al. 2013; Volkova et al. 2014, 2015)
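Self-reports like the birthday pattern above can be matched with a simple regular expression. A hedged sketch (the `BIRTHDAY` pattern and `self_reported_age` helper are illustrative, not the exact rules used in the cited papers):

```python
# Sketch: distant supervision from the "Happy ##th/st/nd/rd birthday
# to me" self-report used for age labeling (Zamal et al. 2012).
import re

BIRTHDAY = re.compile(
    r"happy\s+(\d{1,2})(?:th|st|nd|rd)\s+birthday\s+to\s+me", re.I)

def self_reported_age(tweet):
    """Return the self-reported age if the tweet matches, else None."""
    m = BIRTHDAY.search(tweet)
    return int(m.group(1)) if m else None

print(self_reported_age("Happy 21st birthday to me!!"))  # 21
print(self_reported_age("Happy birthday, Ann!"))         # None
```

A real pipeline would add filters for retweets and quoted text, since those re-broadcast someone else's self-report.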
[Diagram: attribute model Φ_A(u), user sets U_L and U_P; Twitter social graph with neighbor types: friend, follower, @mention, reply, retweet, hashtag]
retweet
I. Candidate-Centric (distant supervision): 1,031 users
II. Geo-Centric (self-reports): 270 users
III. Politically Active (distant supervision)*: 371 users (Dem; Rep)
IV. Age (self-reports)*: 387 users (18–23; 23–25)
V. Gender (name)*: 384 users (Male; Female)
Balanced datasets
*Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013
Code, data and trained models for gender, age, political preference prediction: http://www.cs.jhu.edu/~svitlana/
10 - 20 neighbors
of 6 types per user
What types of neighbors lead to the best
attribute predictions?
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Classification Model
• Logistic regression = max entropy = log-linear models
– Maps discrete inputs w to a binary output y
• Other options: SVM, NB

w_i ∈ {0, 1}, y ∈ {M, F}

Labeled users (training), feature vectors over the vocabulary:

           hair  eat  cool  work  …  xbox
Female       1    1    0     0    …    0
Male         0    1    0     1    …    1
Male         0    0    1     1    …    1

Test user feature vector:

?            0    1    0     0    …    1

http://scikit-learn.org/stable/modules/generated/
sklearn.linear_model.LogisticRegression.html
Features (I)
• Lexical:
– normalized counts/binary ngrams (Goswami et al. 2010; Rao et al.
2010; Pennacchiotti and Popescu 2011; Nguyen et al. 2013; Ciot et al.
2013; Van Durme 2012; Kern et al. 2014; Volkova et al. 2014;
Volkova and Van Durme 2015)
– class-based highly predictive (Bergsma and Van Durme 2013),
rationales (Volkova and Yarowsky 2014); character-based
(Peersman et al. 2011), stems, co-stems, lemmas (Zamal et al.
2012; Cohen et al. 2014)
• Socio-linguistic, syntactic and stylistic:
– syntax and style (Schler et al. 2006; Cheng et al. 2011), smiles,
excitement, emoticons and psycho-linguistic (Rao et al. 2010;
Marquardt et al. 2014; Kokkos et al. 2014; Hovy 2015)
– lexicon features (Sap et al. 2014); linguistic inquiry and word
count (LIWC) (Mukherjee et al. 2010; Fink et al. 2012)
Features (II)
• Communication behavior: response/retweet/tweet frequency,
retweeting tendency (Conover et al. 2011; Golbeck et al. 2011;
Pennacchiotti and Popescu 2011; Preoţiuc-Pietro et al. 2015)
• Network structure: follower-following ratio, neighborhood size,
in/out degree, degree of connectivity (Bamman et al. 2012;
Filippova 2012; Zamal et al. 2012; Culotta et al. 2015)
• Other: likes (Bachrach et al. 2012; Kosinski et al. 2014), name or
census (Burger et al. 2011; Liu and Ruths 2013), links/images
(Rosenthal and McKeown 2011)
• Topics: word embeddings, LDA topics, word clusters
(Preoţiuc-Pietro et al. 2015)

           hair  eat  cool  work  …  xbox  RT   neigh  images  …
Female       1    1    0     0    …    0   0.3    30    0.5    …
Batch Experiments
• Log-linear word unigram models:
(I) Users vs. (II) Neighbors and (III) User-Neighbor
• Evaluate different neighborhood types:
– varying neighborhood size n=[1, 2, 5, 10] and
content amount t=[5, 10, 15, 25, 50, 100, 200]
– 10-fold cross validation with 100 random
restarts for every n and t parameter combination
F = argmax_a P(A = a | T)
User Model
F_u = D if 1 / (1 + e^(−θ·f_u)) ≥ 0.5, R otherwise.
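The user model's decision rule is a thresholded sigmoid over the user's feature vector. A minimal pure-Python sketch (the weights θ here are toy values, not a trained model):

```python
# Sketch of the thresholded log-linear decision rule:
# predict D if sigmoid(theta · f_u) >= 0.5, else R.
import math

def predict(theta, f_u):
    """Return (label, probability) for one user's feature vector."""
    score = sum(t * f for t, f in zip(theta, f_u))
    p = 1.0 / (1.0 + math.exp(-score))
    return ("D" if p >= 0.5 else "R"), p

theta = [1.2, -0.7, 0.3]              # toy weights, not a trained model
label, p = predict(theta, [1, 0, 1])  # score = 1.5, sigmoid(1.5) ≈ 0.82
print(label, round(p, 2))             # D 0.82
```

The same rule reappears below for the neighbor and joint user-neighbor models; only the feature vector changes.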
Train Graph / Test Graph
v_i: t: "Washington Post Columnist…" → f_t(v_i) = [w_1 = 1, w_2 = 1, …, w_n = 0]
v_j: t: "…Ron Paul not a fan of Chris Christie" → f_t(v_j) = [w_1 = 0, w_2 = 0, …, w_n = 0]
v_k (?): t: "We're watching you House @GOP" → f_t(v_k) = [w_1 = 1, w_2 = 1, …, w_n = 0]
Neighbor Model
HLTCOE Text Meeting, June 09 2014
Train Graph / Test Graph
N(v_i): t: "Obama: I'd defend @MajorCBS" → f_t(N(v_i)) = [w_1, w_2, …, w_n]
N(v_j): t: "The Lyin King #RepMovies" → f_t(N(v_j)) = [w_1, w_2, …, w_n]
N(v_k) (?): t: "@FoxNews: WATCH LIVE" → f_t(N(v_k)) = [w_1, w_2, …, w_n]

F_N(u) = D if 1 / (1 + e^(−θ·f_N(u))) ≥ 0.5, R otherwise.
Joint User-Neighbor Model
Train Graph / Test Graph
v_i: t: "Washington Post Columnist… / Obama: I'd defend @MajorCBS" → f_t(v_i + N(v_i)) = [w_1, w_2, …, w_n]
v_j: t: "…Ron Paul not a fan of Christie / The Lyin King #RepublicanMovies" → f_t(v_j + N(v_j)) = [w_1, w_2, …, w_n]
v_k (?): t: "@FoxNews: WATCH LIVE / We're watching you House @GOP" → f_t(v_k + N(v_k)) = [w_1, w_2, …, w_n]

Learning on user and neighbor features jointly (not prefixing features)

F_(u+N(u)) = D if 1 / (1 + e^(−θ·f_(u+N(u)))) ≥ 0.5, R otherwise.
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Gender Prediction
[Figures: accuracy vs. tweets per user for user-only models (unigram, binary, bigram, trigram, UserOnlyZLR); heatmaps of accuracy over neighborhood size and tweets for retweet, usermention and friend neighborhoods]
User: 0.82; User-Neighbor: 0.73; Neighbor: 0.63
Lexical Markers for Gender
Gender Prediction Quality
Approach Users Tweets Features Accuracy
Rao et al., 2010 1K 405 BOW+socioling 0.72
Burger et al., 2011 184K 22 username, BOW 0.92
Zamal et al., 2012 384 10K neighbor BOW 0.80
Bergsma et al., 2013 33.8K − BOW, clusters 0.90
JHU models 383 200/2K BOW user/neigh 0.82/0.73
• This is not a direct comparison (Twitter data-sharing
restrictions prevent using identical datasets)
• Poor generalization: different datasets = different
sampling and annotation biases
Age Prediction
[Figures: accuracy vs. tweets per user for user-only models; heatmaps of accuracy over neighborhood size and tweets for follower, friend and retweet neighborhoods]
User: 0.77; User-Neighbor: 0.77; Neighbor: 0.72 (classes: 18–23 vs. 23–25)
Lexical Markers for Age
Age Prediction Quality
Approach Users Tweets Groups Features Accuracy
Rao et al., 2010 2K 1183 <=30; >30 BOW+socioling 0.74
Zamal et al., 2012 386 10K 18–23; 23–25 neighbor BOW 0.80
JHU models 381 200/2K 18–23; 23–25 BOW/neighbors 0.77/0.74
• This is not a direct comparison!
• Performance varies across age groups
• Sampling and annotation biases
Political Preference
[Figures: accuracy vs. tweets per user for user-only models; heatmaps of accuracy over neighborhood size and tweets for friend, retweet and usermention neighborhoods]
User: 0.89; User-Neighbor: 0.92; Neighbor: 0.91
Lexical Markers for Political
Preferences
Model Generalization
• Political preference classification is not easy!
• Topical sparsity: average users rarely tweet about politics
[Bar chart] Accuracy by graph (User / Neighbor / User-Neighbor):
Geo-centric: 0.57 / 0.67 / 0.69
Cand-centric: 0.72 / 0.75 / 0.87
Active: 0.89 / 0.91 / 0.92
Political Preference Prediction Quality

Politically Active Users (sampling/annotation bias):
Approach Users Tweets Features Accuracy
Bergsma et al., 2013 400 5K BOW, clusters 0.82
Pennacchiotti 2011 10.3K − BOW, network 0.89
Conover et al., 2011 1K 1K BOW, network 0.95
Zamal et al., 2012 400 1K neighbor BOW 0.91
JHU active 371 200 BOW user/neigh 0.89/0.92

Random/Average Users:
JHU cand-centric 1,051 200 BOW user/neigh 0.72/0.75
JHU geo-centric 270 200 BOW user/neigh 0.57/0.67
Cohen et al., 2013 262 1K BOW, network 0.68
Querying more neighbors with fewer
tweets is better than querying more
tweets from the existing neighbors
(given limited Twitter API calls)
Optimizing Twitter API Calls
Cand-Centric Graph: Friend Circle
?
Summary: Static Prediction
• Features: Binary (political) vs. count-based features (age, gender)
• Homophily: “neighbors give you away” => users with no content
• Attribute assortativity: similarity with neighbors depends on
attribute types
• Content from more neighbors per user >> additional content from
the existing neighbors
• Generalization of the classifiers
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Iterative Bayesian Predictions

Time: t_0 → t_1 → t_2 → … → t_(k-1) → t_k
P_t1(R | T_t1) = 0.52, P_t2(R | T_t2) = 0.62, P_t(k-1)(R | T_t(k-1)) = 0.65, P_tk(R | T_tk) = 0.77

Posterior from class prior and per-tweet likelihoods:

P(a = R | T) = P(a = R) ∏_k P(t_k | a = R) / [ P(a = R) ∏_k P(t_k | a = R) + P(a = D) ∏_k P(t_k | a = D) ]
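The iterative update above can be sketched in a few lines: each arriving tweet multiplies the running posterior by its class-conditional likelihood, then the result is renormalized over {R, D}. The per-tweet likelihoods below are toy numbers; a real system would take them from the per-tweet classifier:

```python
# Sketch: iterative Bayesian posterior updates over a tweet stream.

def update(posterior, likelihoods):
    """One Bayesian step: posterior[a] *= P(t_k | a), then normalize."""
    new = {a: posterior[a] * likelihoods[a] for a in posterior}
    z = sum(new.values())
    return {a: p / z for a, p in new.items()}

belief = {"R": 0.5, "D": 0.5}            # uniform class prior
stream = [{"R": 0.6, "D": 0.4},          # toy P(t_k | a) per tweet
          {"R": 0.7, "D": 0.3},
          {"R": 0.55, "D": 0.45}]
for lik in stream:
    belief = update(belief, lik)
print(round(belief["R"], 2))             # 0.81: confidence grows per tweet
```

Normalizing after every step keeps the running product from underflowing; log-space sums are the other standard fix for long streams.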
Cand-Centric Graph: Posterior Updates
[Figures: p(Republican | T) over the tweet stream T, rising from 0.5 toward 1.0 for one example user and falling toward 0.0 for another]
Cand-Centric: Prediction Time (1)
[Figures: number of users classified correctly over time (weeks), user stream vs. joint user-neighbor stream, Dem vs. Rep]
Prediction confidence: 0.95 vs. 0.75
Democrats are easier to predict than Republicans
Cand-Centric Graph: Prediction Time (2)
How much time does it take to classify 100 users with
75% confidence?
Compare: User Stream vs. Joint User-Neighbor Stream
[Bar chart, weeks on log scale, for the cand-centric, geo-centric and active graphs]
Batch vs. Online Performance
[Bar charts] Accuracy:
Batch – User: Cand 0.72, Geo 0.57, Active 0.75; Neighbor: Cand 0.75, Geo 0.67, Active 0.86
Online – User Stream: Cand 0.99, Geo 0.84, Active 0.89; User-Neighbor Stream: Cand 0.99, Geo 0.88, Active 0.99
Summary: Online Inference
• Homophily: Neighborhood content is useful*
• Lessons learned from batch predictions:
– Age: user-follower or user-mention joint stream
– Gender: user-friend joint stream
– Political: user-mention and user-retweet joint stream
• Streaming models >> batch models
• Activeness: tweeting frequency matters a lot!
• Generalization of the classifiers: data sampling and
annotation biases
*Pennacchiotti and Popescu, 2011a, 2011b; Conover et al., 2011a, 2011b;
Golbeck et al., 2011; Zamal et al., 2012; Volkova et al., 2014
Part I Outline
I. Batch Prediction
i. How to collect and annotate data?
ii. What models and features to use?
iii. Which neighbors are the most predictive?
II. Online Inference
i. How to predict from a stream?
III. Dynamic (Iterative) Learning and Prediction
i. How to learn and predict on the fly?
Iterative Batch Learning
[Diagram: labeled and unlabeled users over time t_0, t_1, t_2, …, t_k;
P_t1(R | t_1) = 0.52 → P_tk(R | t_1…t_m) = 0.77]
 Iterative Batch Retraining (IB)
 Iterative Batch with Rationale Filtering (IBR)
Active Learning
[Diagram: labeled and unlabeled users; user u and neighbors n_i ∈ N with features Φ(u, t_1), Φ(n, t_1) over time 1-Jan-2011 … 1-Dec-2011;
P_t0(R | t_1) = 0.5 → P_t1(R | t_1…t_5) = 0.55 → P_t(k-1)(R | t_1…t_100) = 0.77 > θ]
 Active Without Oracle (AWOO)
 Active With Rationale Filtering (AWR)
 Active With Oracle (AWO)
Annotator Rationales
Rationales are explicitly highlighted ngrams in tweets that best
justify why the annotators made their labeling decisions
(related: feature norms in psychology, feature sparsity)
Bergsma and Van Durme, 2013; Volkova and Yarowsky, 2014; Volkova and Van Durme, 2015
Alternative: Rationale Weighting
• Annotator rationales for gender, age and political:
http://www.cs.jhu.edu/~svitlana/rationales.html
• Multiple languages: English, Spanish
• Portable to other languages
Improving Gender Prediction of Social Media Users via Weighted Annotator Rationales. Svitlana
Volkova and David Yarowsky. NIPS Workshop on Personalization: Methods and Applications 2014.
Performance Metrics
• Accuracy over time:
• Find optimal models:
– Data stream type (user, friend, user + friend)
– Time (more correctly classified users faster)
– Prediction quality (better accuracy over time)
A_(θ,t) = #correctly classified / #above threshold = (T_D + T_R) / (D + R)
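The metric above can be computed directly: among users whose posterior confidence exceeds the threshold θ, what fraction is classified correctly? A minimal sketch (the `accuracy_at` helper and the triple format are illustrative):

```python
# Sketch: accuracy over the users whose prediction confidence
# exceeds threshold theta at a given point in time.

def accuracy_at(users, theta=0.75):
    """users: list of (confidence, predicted, true) triples."""
    above = [(pred, true) for conf, pred, true in users if conf >= theta]
    if not above:
        return 0.0
    correct = sum(1 for pred, true in above if pred == true)
    return correct / len(above)

users = [(0.90, "R", "R"), (0.80, "D", "R"),
         (0.95, "D", "D"), (0.60, "R", "D")]
print(accuracy_at(users))   # 2 of the 3 above-threshold users are correct
```

Note the denominator counts only above-threshold users, so raising θ trades coverage for precision, exactly the tension the recommendations below exploit.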
Results: Iterative Batch Learning
[Figures: accuracy and number of correctly classified users over time (Mar–Sep), user vs. user + friend streams]
IB: higher recall; IBR: higher precision
Time: # correctly classified users increases over time (IB faster, IBR slower)
Data stream selection: user + friend stream > user stream
Results: Active Learning
[Figures: accuracy and number of correctly classified users over time (Mar–Sep) for IB/IBR and AWOO/AWR models, user and user + friend streams]
AWOO: higher recall; AWR: higher precision
Time: unlike IB/IBR models, AWOO/AWR models classify
more users correctly faster (by Mar) but then plateau
batch < active; user + friend > user

Results: Model Quality
Active with Oracle Annotations
[Figures: cumulative requests to the oracle and correctly classified users over time (Feb–Dec), user vs. user + friend streams; thousands of tweets in training; the oracle is 100% correct]
Summary: Dynamic Learning and
Prediction
• Active learning > iterative batch
• N, UN > U: “neighbors give you away”
• Higher confidence => higher precision,
lower confidence => higher recall (as expected)
• Rationales significantly improve results
Practical Recommendations:
Models for Targeted Advertising
• Data stream (user, friend or joint): User + Friend > User
• Time (correctly classified users faster): models without
rationale filtering (IB, AWOO); lower confidence threshold 0.55
• Prediction quality (better accuracy over time): models with
rationale filtering (IBR, AWR); higher confidence threshold 0.95
Recap: Why are these models good?
• Models streaming nature of social media
• Limited user content => take advantage of
neighbor content
• Actively learn from crowdsourced rationales
• Learn on the fly => data drift
• Predict from multiple streams => topical sparsity
• Flexible extendable framework:
– More features: word embeddings, interests, profile info,
tweeting behavior
Software Requirements
• Python: https://www.python.org/downloads/ python -V
• Pip: https://pip.pypa.io/en/latest/installing.html
python get-pip.py
• Twython: https://pypi.python.org/pypi/twython/
pip install twython
• matplotlib 1.3.1:
http://sourceforge.net/projects/matplotlib/files/matplotlib/
• numpy 1.8.0: http://sourceforge.net/projects/numpy/files/NumPy/
• scipy 0.13: http://sourceforge.net/projects/scipy/files/scipy/
• scikit-learn 0.14.1: http://sourceforge.net/projects/scikit-learn/files/
python -c "import sklearn; print(sklearn.__version__)"
python -c "import numpy; print(numpy.version.version)"
python -c "import scipy; print(scipy.version.version)"
python -c "import matplotlib; print(matplotlib.__version__)"
Part II. Practice Session Outline
• Details on data collection and annotation
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference
– Tweet-based: emotions
– User-based: psycho-demographic attributes
• Python examples for online inference
– Bayesian updates from multiple data streams
JHU: Data Overview and Annotation
Scheme
[Diagram: neighbor types: friend, follower, @mention, reply, retweet, hashtag]
Political Preferences:
– Candidate-Centric = 1,031 users (follow candidates)
– Geo-Centric = 270 users (self-reports in DE, MD, VA)
– Politically Active* = 371 users (active & follow cand)
Age (self-reports)*: 387 users
Gender (name)*: 384 users
10 - 20 neighbors of each of 6 types
Details on Twitter data collection:
http://www.cs.jhu.edu/~svitlana/data/data_collection.pdf
*Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013
Links to Download JHU Attribute Data
• What does the data look like?
– graph_type.neighbor_type.tsv, e.g., cand-centric.follower.tsv
• JHU gender and age:
http://www.cs.jhu.edu/~svitlana/data/graph_gender_age.tar.gz
• JHU politically active*:
http://www.cs.jhu.edu/~svitlana/data/graph_zlr.tar.gz
• JHU candidate-centric:
http://www.cs.jhu.edu/~svitlana/data/graph_cand.tar.gz
• JHU geo-centric:
http://www.cs.jhu.edu/~svitlana/data/geo_cand.tar.gz
Code to query Twitter API
• Repo: https://bitbucket.org/svolkova/queryingtwitter
– get lists of friends/followers for a user
– 200 recent tweets for k randomly sampled
retweeted or mentioned users
– tweets for a list of userIDs
[Pipeline: userIDs/tweetIDs → Twitter API → JSON objects → extract text fields, time, #friends → tweet collection]
Part II. Practice Session Outline
• Data and annotation schema description
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference:
– Tweet-based: emotions
– User-based: psycho-demographic attributes
• Python examples for streaming inference:
– Bayesian updates from multiple data streams
MSR: Psycho-Demographic Annotations
via Crowdsourcing
• 5K profiles annotated by a trusted crowd ($6/hour, with quality control)
[Bar chart: Cohen's kappa on a 2% random sample, per attribute: intelligence, relationship, religion, political, education, optimism, income, life satisfaction, age, children, gender, ethnicity]
• Attribute models Φ_A(u): trained on the 5K labeled users (U_L), applied to millions (U_P)
MSR: Emotion Annotations via
Distant Supervision
Hashtags for Ekman's 6 emotions (Mohammad et al. '14)
+ emotion-synonym hashtags
Part II. Practice Session
• Data and annotation schema description
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference:
– Tweet-based: emotions
– User-based: psycho-demographic attributes
• Python examples for streaming inference:
– Bayesian updates from multiple data streams
How to get MSR models and code?
https://bitbucket.org/svolkova/psycho-demographics
1. Load models for 15 psycho-demographic attributes + emotions
2. Extract features from input tweets
3. Apply pre-trained models to make predictions for input tweets
Predictive Models
Supervised text classification
Log-linear models
User-based:
• Lexical: normalized binary/count-based ngrams
• Affect: emotions, sentiments
Tweet-based:
• BOW + negation, stylistic (+0.3 F1)
• Socio-linguistic and stylistic:
• Elongations: Yaay, woooow
• Capitalization: COOL; mixed punctuation: ???!!!
• Hashtags and emoticons
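The stylistic cues listed above are easy to detect with regular expressions. A hedged sketch (the patterns in `stylistic_features` are illustrative, not the exact ones used in the tutorial models):

```python
# Sketch: socio-linguistic/stylistic feature extraction for one tweet
# (elongations, capitalization, mixed punctuation, hashtags, emoticons).
import re

def stylistic_features(tweet):
    return {
        "elongation": bool(re.search(r"(\w)\1{2,}", tweet)),   # woooow
        "all_caps": bool(re.search(r"\b[A-Z]{3,}\b", tweet)),  # COOL
        "mixed_punct": bool(re.search(r"[?!]{2,}", tweet)),    # ???!!!
        "hashtags": len(re.findall(r"#\w+", tweet)),
        "emoticons": len(re.findall(r"[:;]-?[)(DP]", tweet)),
    }

print(stylistic_features("Yaay woooow COOL ???!!! #happy :)"))
```

These binary/count features would then be appended to the BOW vector before training the log-linear model.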
F(u) = a_0 if 1 / (1 + e^(−θ·f)) ≥ 0.5, a_1 otherwise.
Tweet-based: Emotion Prediction
6 classes: joy, sadness, fear, surprise, disgust, anger
[Bar chart, F1 score (higher is better): sadness 0.62, surprise 0.64, fear 0.77, joy 0.79, anger 0.80, disgust 0.92]
F1 = 0.78 (Roberts '12: 0.67, Qadir '13: 0.53, Mohammad '14: 0.49)
User-Based: Attribute Prediction
[Bar charts, ROC AUC per attribute]
Chart 1: religion 0.63, relationship 0.63, age 0.66, political 0.72, children 0.72, optimism 0.72, life satisfaction 0.72, income 0.73, intelligence 0.75, education 0.77, gender 0.90, race 0.93
Chart 2 (EmoSentOut, EmoSentDiff, above + lexical): relationship 0.74, religion 0.74, children 0.80, political 0.82, age 0.83, intelligence 0.83, optimism 0.83, life satisfaction 0.84, income 0.85, education 0.88, gender 0.95, ethnicity 0.97; gains over BOW from +0.04 to +0.17

F(u) = a_0 if 1 / (1 + e^(−θ·f)) ≥ 0.5, a_1 otherwise.
Predicting Demographics from User
Outgoing Emotions and Opinions
[Heatmap: emotion and opinion features (disgust, negative, sadness, fear, sentiment score, anger, surprise, emotion score, joy, positive, neutral) vs. predicted attributes (intelligence, race, education, income, gender, age, children, political, religion, relationship, optimism, life satisfaction), column z-scores]
AUC ROC per attribute ranges 0.58–0.76; for 1/3 of attributes AUC >= 75%
Example contrasts: satisfied vs. dissatisfied, optimist vs. pessimist, no kids, below 25 y.o., female vs. male
How to get JHU models and code?
 Ex1: Train and test batch models
 Ex2: Train a model from a training file and save it
 Ex3: Predict an attribute using a pre-trained model and plot
iterative updates
 Ex4: Predict and plot iterative updates for multiple attributes
using pre-trained models from a single communication stream
 Ex5: Predict and plot iterative updates for multiple attributes
from multiple communication streams
https://bitbucket.org/svolkova/attribute
Ex1. Train/Test Batch Models
• Run as e.g., for gender:
• Customize features and model type/parameters:
Accuracy
Ex2. Save Pre-trained Models
• Run as e.g., age:
• Customize features (process.py), model type and
parameters (predict.py)
Part II. Practice Session
• Data and annotation schema description
– JHU: gender, age and political preferences
– MSR: emotions, opinions and psycho-demographics
• Python examples for static inference:
– User-based: psycho-demographics
– Tweet-based: emotions, opinions
• Python examples for streaming inference:
– Bayesian updates from multiple data streams
Recap: Iterative Bayesian Updates

Time: t_0 → t_1 → t_2 → … → t_(k-1) → t_k
P_t1(R | T_t1) = 0.52, P_t2(R | T_t2) = 0.62, P_t(k-1)(R | T_t(k-1)) = 0.65, P_tk(R | T_tk) = 0.77

Posterior from class prior and per-tweet likelihoods:

P(a = R | T) = P(a = R) ∏_k P(t_k | a = R) / [ P(a = R) ∏_k P(t_k | a = R) + P(a = D) ∏_k P(t_k | a = D) ]
Ex3. Iterative Updates for a Single
Attribute from a Single Stream
Ex4. Iterative Updates for Multiple
Attributes from a Single Stream
Steps:
1. Loading Models
2. Processing data
3. Setting up train/test priors
4. Making Predictions
5. Plotting results
Joint User-Neighbor Streams
[Diagram: friend, follower, @mention, reply, retweet, hashtag streams]
Ex5. Iterative Updates for Multiple
Attributes from Joint Streams
Questions?
http://www.cs.jhu.edu/~svitlana/
svitlana@jhu.edu
References: http://www.cs.jhu.edu/~svitlana/references.pdf
Slides: http://www.cs.jhu.edu/~svitlana/slides.pptx

NAACL Tutorial
Social Media Predictive Analytics

  • 1. NAACL Tutorial Social Media Predictive Analytics Svitlana Volkova1, Benjamin Van Durme1,2, David Yarowsky1 and Yoram Bachrach3 1Center for Language and Speech Processing, Johns Hopkins University, 2Human Language Technology Center of Excellence, 3Microsoft Research Cambridge
  • 2. Tutorial Schedule Part I: Theoretical Session (2:00 – 4:30pm) Batch Prediction Online Inference Coffee Break (3:30 – 4:00pm) Dynamic Learning and Prediction Part II: Practice Session (4:30 – 5:30pm) Code and Data
  • 3. Tutorial Materials • Slides: – http://www.cs.jhu.edu/~svitlana/slides.pptx • Code and Data: – https://bitbucket.org/svolkova/queryingtwitter – https://bitbucket.org/svolkova/attribute – https://bitbucket.org/svolkova/psycho-demographics • References: – http://www.cs.jhu.edu/~svitlana/references.pdf
  • 4. Social Media Obsession Diverse Billions of messages Millions of users
  • 5. What do they think and feel? Where do they go? What is their demographics and personality? What do they like? What do they buy?
  • 6. First: a comment on privacy and ethics…
  • 7. Why is language in social media so interesting? • Very Short – 140 chars • Lexically divergent • Abbreviated • Multilingual
  • 8. Why is language in social media so challenging? • Data drift • User activeness => generalization • Topical sparsity => relationship, politics • Dynamic streaming nature
  • 10. Predictive Analytics Services • Social Network Prediction – https://apps.facebook.com/snpredictionapp/ • Twitter Psycho-Demographic Profile and Affect Inference – http://twitterpredictor.cloudapp.net (pswd: twitpredMSR2014) • My personality Project – http://mypersonality.org/wiki/doku.php • You Are What You Like – http://youarewhatyoulike.com/ • Psycho-demographic trait predictions – http://applymagicsauce.com/ • IBM Personality – https://watson-pi-demo.mybluemix.net • World Well Being Project – http://wwbp.org
  • 11. Applications: Retail Personalized marketing • Detecting opinions and emotions users express about products or services within targeted populations Personalized recommendations and search • Making recommendations based on user emotions, demographics and personality
  • 12. Applications: Advertising Online targeted advertising • Targeting ads based on predicted user demographics • Matching the emotional tone the user expects Deliver adds fast Deliver adds to a true crowd vs. vs. vs.
  • 13. Applications: Polling Real-time live polling • Mining political opinions • Voting predictions within certain demographics Large-scale passive polling • Passive poling regarding products and services vs.
  • 14. Applications: Health Large-scale real-time healthcare analytics • Identifying smokers, drug addicts, healthy eaters, people into sports (Paul and Dredze 2011) • Monitoring flue-trends, food poisonings, chronic illnesses (Culotta et. al. 2015)
  • 15. Applications: HR Recruitment and human resource management • Estimating emotional stability and personality of the potential and current employees • Measuring the overall well-being of the employees e.g., life satisfaction, happiness (Schwartz et. al. 2013; Volkova et. al., 2015) • Monitor depression and stress level (Coppersmith et. al. 2014)
  • 16. User Attribute Prediction Task Political Preference Rao et al., 2010; Conover et al., 2011, Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013; Volkova et. al, 2014 . . . Communications Gender Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013 Age Rao et al., 2010; Zamal et al., 2012; Cohen and Ruth, 2013; Nguyen et al., 2011, 2013; Sap et al., 2014 … … … … AAAI 2015 Demo (joint work with Microsoft Research) Income, Education Level, Ethnicity, Life Satisfaction, Optimism, Personality, Showing Off, Self-Promoting
  • 17. Tweets Revealing User Attributes ? ? ? ?
  • 18. Supervised Models Classification: binary (SVM) – gender, age, political, ethnicity • Goswami et. al., 2009; Rao et al. 2010; Burger et al. 2011; Mislove et al. 2012; Nguyen et al. 2011; Nguyen et al. 2013; • Pennacchiotti and Popescu 2011; Connover et. al. 2011; Filippova et. al. 2012; Van Durme 2012; Bergsma et. al. 2012, 2013; Bergsma and Van Durme 2013; • Zamal et al. 2012; Ciot et. al. 2013; Cohen and Ruths 2013; • Schwartz et. al. 2013; Sap et. al., 2014; Kern et. al., 2014; Schwartz et. al. 2013; Golbeck et. al. 2011; Kosinski et. al. 2013; • Volkova et. al. 2014; Volkova et al. 2015. Unsupervised and Generative Models • name morphology for gender & ethnicity prediction - Rao et al. 2011; • large-scale clustering - Bergsma et. al. 2013; Culotta et. al. 2015; • demographic language variations - Eisenstein et al. 2010; O’Connor et al. 2010; Eisenstein et. al. 2014. *Rely on more than lexical features e.g., network, streaming
  • 19. Existing Approaches ~1K Tweets* …. … …. … …. … …. … …. … …. … …. … …. … Does an average Twitter user produce thousands of tweets? *Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013 Tweets as a document
  • 20. How Active are Twitter Users?
  • 21. Attributed Social Network User Local Neighborhoods a.k.a. Social Circles
  • 22. Approaches Static (Batch) Prediction Streaming (Online) Inference Dynamic (Iterative) Learning and Prediction • Offline training • Offline predictions + Neighbor content • Offline training + Online predictions over time • Exploring 6 types of neighborhoods • Online predictions • Relying on neighbors + Iterative re-training + Active learning + Rationale annotation Topical sparsity Data drift Streaming nature Model generalization
  • 23. Part I Outline I. Batch Prediction i. How to collect and annotate data? ii. What models and features to use? iii. Which neighbors are the most predictive? II. Online Inference i. How to predict from a stream? I. Dynamic (Iterative) Learning and Prediction i. How to learn and predict on the fly?
  • 24. Part I Outline I. Batch Prediction i. How to collect and annotate data? ii. What models and features to use? iii. Which neighbors are the most predictive? II. Online Inference i. How to predict from a stream? I. Dynamic (Iterative) Learning and Prediction i. How to learn and predict on the fly?
  • 25. How to get data? Twitter API • Twitter API: https://dev.twitter.com/overview/api • Twitter API Status:https://dev.twitter.com/overview/status • Twitter API Rate Limits: https://dev.twitter.com/rest/public/rate-limits
  • 26. Querying Twitter API • Twitter Developer Account => access key and token https://dev.twitter.com/oauth/overview/application- owner-access-tokens twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET) I. Access 1% Twitter Firehouse and sample from it II. Query Twitter API to get:  user timelines (up to 3200 tweets) from userIDs  tweet json objects from tweetIDs  lists of friendIDs (5000 per query) from userIDs
  • 28. How to get labeled data? • Supervised classification in a new domain: – Labeled data ≈ ground truth – Costly and time consuming to get! • Ways to get ≈“ground truth” annotations:  Fun psychological tests (voluntarily): myPersonality project  Profile info: Facebook e.g., relationship, gender, age but sparse for Twitter  Self reports: “I am a republican…” (Volkova et al. 2013), “Happy ##th/st/nd/rd birthday to me” (Zamal et. al. 2012), “I have been diagnosed with …” (Coppersmith et. al. 2014), “I am a writer …” (Beller at. al., 2014)  Distant supervision: following Obama vs. Romney (Zamal et. al. 2012), emotion hashtags (Mohammad et. al, 2014), user name (Burger et. al., 2011)  Crowdsourcing: subjective perceived annotations (Volkova et. al.2015), rationales (Bergsma et. al., 2013, Volkova et. al, 2014; 2015) Attribute Model ΦA(u) UL UP
  • 29. Twitter Social Graph friend hashtag reply @mentionfollower retweet I. Candidate-Centric (distant supervision) 1,031 users II. Geo-Centric (self-reports) 270 users III. Politically Active (distant supervision)* 371 users (Dem; Rep) IV. Age (self-reports)* 387 users (18 – 23; 23 - 25) V. Gender (name)* 384 users (Male; Female) Balanced datasets *Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013 Code, data and trained models for gender, age, political preference prediction http://www.cs.jhu.edu/~svitlana/ 10 - 20 neighbors of 6 types per user What types of neighbors lead to the best attribute predictions?
  • 30. Part I Outline I. Batch Prediction i. How to collect and annotate data? ii. What models and features to use? iii. Which neighbors are the most predictive? II. Online Inference i. How to predict from a stream? I. Dynamic (Iterative) Learning and Prediction i. How to learn and predict on the fly?
  • 31. Classification Model • Logistic regression = max entropy = log linear models – Map discrete inputs w to binary output y • Other options: SVM, NB wi = 0,1{ } y = M,F{ } hair eat co ol wor k … xbo x Femal e 1 1 0 0 … 0 Male 0 1 0 1 … 1 Male 0 0 1 1 … 1 http://scikit-learn.org/stable/modules/generated/ sklearn.linear_model.LogisticRegression.html Labeledusers (Training) Vocabulary size hair eat co ol wor k … xbo x ? 0 1 0 0 … 1 Feature vector Test user
  • 32. Features (I) • Lexical: – normalized counts/binary ngrams (Goswami el. al. 2010; Rao et. al. 2010; Pennacchiotti and Popescu 2011; Ngyen et. al. 2013; Ciot et. al. 2013; Van Durme 2012; Kern et. al. 2014; Volkova et. al. 2014; Volkova and Van Durme 2015) – class-based highly predictive (Bergsma and Van Durme 2013), rationales (Volkova and Yarowsky 2014); character-based (Peersman et. al. 2011), stems, co-stems, lemmas (Zamal et. al. 2012; Cohen et. al. 2014) • Socio-linguistic, syntactic and stylistic: – syntax and style (Shler et. al. 2006; Cheng at. al., 2011), smiles, excitement, emoticons and psycho-linguistic (Rao et. al. 2010; Marquardt et. al. 2014; Kokkos et. sl. 2014; Hovy 2015) – lexicon features (Sap et. al. 2014); linguistic inquiry and word count (LIWC) (Mukherjee et. al. 2010; Fink et. al. 2012)
  • 33. Features (II) • Communication behavior: response/retweet/tweet frequency, retweeting tendency (Connover et. al. 2011; Golbeck et. al. 2011; Pennacchiotti and Popescu 2011; Preotic at. al. 2015) • Network structure: follower-following ratio, neighborhood size, in/out degree, degree of connectivity (Bamman et. al. 2012; Filippova 2012; Zamal et. al. 2012, Culotta et. al. 2015) • Other: likes (Bachrach et. al. 2012; Kosinski et. al. 2014), name or census (Burger et. al. 2011; Liu and Ruths 2013), links/images (Rosenthal and McKeown 2011) • Topics: word embeddings, LDA topics, word clusters (Preotic at. al. 2015) hair eat coo l wor k … xbo x RT neig h image s …. Female 1 1 0 0 … 0 0.3 30 0.5 ….
  • 34. Batch Experiments • Log-linear word unigram models: (I) Users vs. (II) Neighbors and (III) User-Neighbor • Evaluate different neighborhood types: – varying neighborhood size n=[1, 2, 5, 10] and content amount t=[5, 10, 15, 25, 50, 100, 200] – 10-fold cross validation with 100 random restarts for every n and t parameter combination F = argmaxa P A = a T( )
  • 35. User Model Fu = D if 1 1+e-q f u ³ 0.5 R otherwise. ì í ï î ï Train Graph vi Test Graph t :…Ron Paul not a fan of Chris Christie ft vj : w1 = 0,w2 = 0,…,wn = 0[ ] t : Washington Post Columnist… ft vi : w1 =1,w2 =1,…,wn = 0[ ] vj t : We're watching you House @GOP ft vk = w1 =1,w2 =1,…,wn = 0[ ] vk -?
  • 36. Neighbor Model HLTCOE Text Meeting, June 09 2014 Train Graph vi Test Graph t :Obama: I'd defend @MajorCBS ft N vi( ) = w1,w2,…,wn[ ] vj t :@FoxNews: WATCH LIVE ft N vk( ) = w1,w2,…,wn[ ] t : The Lyin King #RepMovies ft N vj( ) = w1,w2,…,wn[ ] vk -? F N u( ) = D if 1 1+e-q f N u( ) ³ 0.5 R otherwise. ì í ï î ï
  • 37. Joint User-Neighbor Model Train Graph vi Test Graph t :…Ron Paul not a fan of Christie The Lyin King #RepublicanMovies ft vj +N vj( ) = w1,w2,…,wn[ ] t : Washington Post Columnist… Obama: I'd defend @MajorCBS ft vi+N vi( ) = w1,w2,…,wn[ ] vj t :@FoxNews: WATCH LIVE We're watching you House @GOP ft vk +N vk( ) = w1,w2,…,wn[ ] vk -? Learning on user and neighbor features jointly (not prefixing features) F u+N u( ) = D if 1 1+e-q f u+N u( ) ³ 0.5 R otherwise. ì í ï î ï
  • 38. Part I Outline I. Batch Prediction i. How to collect and annotate data? ii. What models and features to use? iii. Which neighbors are the most predictive? II. Online Inference i. How to predict from a stream? I. Dynamic (Iterative) Learning and Prediction i. How to learn and predict on the fly?
  • 39. Gender Prediction ? 5 10 15 20 0 50 100 150 200 Tweets Neighbors retweet.counts usermention.counts 0.73 5 10 20 50 100 500 0.500.600.700.80 Tweets Per User Accuracy Useruni Userbin User bi Usertri UserOnlyZLR Useruni Userbin User bi Usertri UserOnlyZLR Useruni Userbin User bi Usertri UserOnlyZLR Useruni Userbin User bi Usertri UserOnlyZLR 5 10 15 20 0 50 100 150 200 Tweets Neighbors friend.counts usermention.binary Neighbor: 0.63 User-Neigh: 0.73 User: 0.82 40
  • 41. Gender Prediction Quality Approach Users Tweets Features Accuracy Rao et al., 2010 1K 405 BOW+socioling 0.72 Burger et al., 2011 184K 22 username, BOW 0.92 Zamal et al., 2012 384 10K neighbor BOW 0.80 Bergsma et al., 2013 33.8K − BOW, clusters 0.90 JHU models 383 200/2K BOW user/neigh 0.82/0.73 • This is not a direct comparison => Twitter data sharing restrictions • Poor generalization: different datasets = different sampling and annotation biases
  • 42. Age Prediction 5 10 15 20 0 50 100 150 200 Tweets Neighbors follower.counts friend.counts 5 10 15 20 0 50 100 150 200 Tweets Neighbors friend.counts retweet.counts 5 10 20 50 100 500 0.500.600.70 Tweets Per User Accuracy Useruni Userbin User bi Usertri UserOnlyZLR Useruni Userbin User bi Usertri UserOnlyZLR Useruni Userbin User bi Usertri UserOnlyZLR Useruni Userbin User bi Usertri UserOnlyZLR ? User: 0.77 Neighbor: 0.72 User-Neigh: 0.77 18 – 23 23 – 25
  • 44. Age Prediction Quality ? Approach Users Tweets Groups Features Accuracy Rao et al., 2010 2K 1183 <=30; > 30 BOW+socioling 0.74 Zamal et al., 2012 386 10K 18 – 23; 23 - 25 neighbor BOW 0.80 JHU models 381 200/2K 18 – 23; 23 - 25 BOW/neighbors 0.77/0.74 • This is not a direct comparison! • Performance for different age groups • Sampling and annotation biases
  • 45. Political Preference 5 10 15 20 0 50 100 150 200 Tweets Neighbors friend.counts retweet.binary 5 10 15 20 0 50 100 150 200 Tweets Neighbors friend.counts retweet.binary usermention.binary 5 10 20 50 100 500 0.550.650.750.85 Tweets Per User Accuracy Useruni Userbin User bi Usertri UserOnlyZLR Useruni Userbin User bi Usertri UserOnlyZLR Useruni Userbin User bi Usertri UserOnlyZLR Useruni Userbin User bi Usertri UserOnlyZLR ? 0.91 User: 0.89 User-Neigh: 0.92 Neighbor: 0.91
  • 46. Lexical Markers for Political Preferences
  • 47. Model Generalization • Political preference classification is not easy! • Topical sparsity: average users rarely tweet about politics 0.57 0.67 0.690.72 0.75 0.870.89 0.91 0.92 0.00 0.20 0.40 0.60 0.80 1.00 User Neighbor User-Neighbor Accuracy Geo-centric Cand-centric Active
  • 48. Approach Users Tweets Features Accuracy Bergsma et al., 2013 400 5K BOW, clusters 0.82 Pennacchiiotti 2011 10.3K − BOW, network 0.89 Conover et al., 2011 1K 1K BOW, network 0.95 Zamal et al., 2012 400 1K neighbor BOW 0.91 JHU active 371 200 BOW user/neigh 0.89/0.92 JHU cand centric 1,051 200 BOW user/neigh 0.72/0.75 Political Preference Prediction Quality JHU geo-centric 270 200 BOW user/neigh 0.57/0.67 Cohen et al., 2013 262 1K BOW, network 0.68 Politically Active Users (sampling/annotation bias) Random /Average Users
  • 49. Querying more neighbors with less tweets is better than querying more tweets from the existing neighbors Limited Twitter API Calls
  • 50. Optimizing Twitter API Calls Cand-Centric Graph: Friend Circle ?
  • 51. Optimizing Twitter API Calls Cand-Centric Graph: Friend Circle ?
  • 52. Optimizing Twitter API Calls Cand-Centric Graph: Friend Circle ?
  • 53. Optimizing Twitter API Calls Cand-Centric Graph: Friend Circle ?
  • 54. Summary: Static Prediction • Features: Binary (political) vs. count-based features (age, gender) • Homophily: “neighbors give you away” => users with no content • Attribute assortativity: similarity with neighbors depends on attribute types • Content from more neighbors per user >> additional content from the existing neighbors • Generalization of the classifiers FollowerFriend Retweet Mention MentionFriend N UN
  • 55. Part I Outline I. Batch Prediction i. How to collect and annotate data? ii. What models and features to use? iii. Which neighbors are the most predictive? II. Online Inference i. How to predict from a stream? I. Dynamic (Iterative) Learning and Prediction i. How to learn and predict on the fly?
  • 56. Iterative Bayesian Predictions Time t1 t2 tk… Pt1 R Tt1( )= 0.52 Ptk R Ttk( )= 0.77 tk-1 Pt2 R Tt2( )= 0.62 ? P a = R T( )= P tk a = R( )×P a = R( ) k Õ P tk a = R( )×P a = R( ) k Õ + P tk a = D( )×P a = D( ) k Õ Pt2 R Ttk-1( )= 0.65 t0 ? Class prior Likelihood Posterior
  • 57. Cand-Centric Graph: Posterior Updates 0.5 0.6 0.7 0.8 0.9 1.0 0 20 40 60 p(Republican|T) 0.3 0.4 0.5 blican|T) t2 ? t0 … Time t1 tk-1 t2 ? t0 … Time t1 tk-1 0.5 0.6 0 20 40 60 0.0 0.1 0.2 0.3 0.4 0.5 0 20 40 60 Tweet Stream (T) p(Republican|T)
  • 58. Cand-Centric: Prediction Time (1) 300 400 500 0 1 2 3 4 5 Time in Weeks Users User-Neighbor 300 400 500 0 5 10 15 Time in Weeks Users 0.75 0.95 User Stream Dem Rep Prediction confidence: 0.95 vs. 0.75 Democrats are easier to predict than republicans Dem Rep Usersclassified correctly
  • 59. Cand-Centric Graph: Prediction Time (2) 0.02 12 20 0.01 19 8.9 0.002 1.2 3.2 0.001 3.5 1.1 0.001 0.01 0.1 1 10 100 Weeks(logscale) How much time does it take to classify 100 users with 75% confidence? Compare: User Stream vs. Joint User-Neighbor Stream Cand-centric Geo-Centric Active 60
  • 60. Batch vs. Online Performance 0.99 0.84 0.89 0.99 0.88 0.99 0.0 0.2 0.4 0.6 0.8 1.0 Cand Geo Active User Stream User-Neighbor Stream 0.72 0.57 0.75 0.75 0.67 0.86 0.0 0.2 0.4 0.6 0.8 1.0 Cand Geo Active Accuracy User Batch Neighbor Batch ? 61
  • 61. Summary: Online Inference • Homophily: Neighborhood content is useful* • Lessons learned from batch predictions: – Age: user-follower or user-mention joint stream – Gender: user-friend joint stream – Political: user-mention and user-retweet joint stream • Streaming models >> batch models • Activeness: tweeting frequency matters a lot! • Generalization of the classifiers: data sampling and annotation biases *Pennacchiotti and Popescu, 2011a, 2001b; Conover et al., 2011a, 2001b; Golbeck et al., 2011; Zamal et al., 2012; Volkova et. al., 2014
  • 62. Part I Outline I. Batch Prediction i. How to collect and annotate data? ii. What models and features to use? iii. Which neighbors are the most predictive? II. Online Inference i. How to predict from a stream? I. Dynamic (Iterative) Learning and Prediction i. How to learn and predict on the fly?
  • 63. Iterative Batch Learning Time R D ? ? t1 t0 t1 tkt2 … t1 LabeledUnlabeled t1 t1 Pt1 R t1( )= 0.52 Ptk R t1…tm( )= 0.77  Iterative Batch Retraining (IB)  Iterative Batch with Rationale Filtering (IBR) ? tm… tm t2 … t2 … tm t2 …
  • 64. Active Learning LabeledUnlabeled F u,t1( ) F n,t1( ) 1-Jan-2011 1-Feb-2011 1-Nov-2011 1-Dec-2011 Time … … t0 t1 tk-1 tk u ni Î N Pt0 R t1( )= 0.5 Pt1 R t1…t5( )= 0.55 Ptk-1 R t1…t100( )= 0.77 >q  Active Without Oracle (AWOO)  Active With Rationale Filtering (AWR)  Active With Oracle (AWO)
  • 65. Annotator Rationales Rationales are explicitly highlighted ngrams in tweets that best justified why the annotators made their labeling decisions feature norms (psychology), feature sparsity Bergsma and Van Durme, 2013; Volkova and Yarowsky, 2014; Volkova and Van Durme, 2015
  • 66. Alternative: Rationale Weighting • Annotator rationales for gender, age and political: http://www.cs.jhu.edu/~svitlana/rationales.html • Multiple languages: English, Spanish • Portable to other languages Improving Gender Prediction of Social Media Users via Weighted Annotator Rationales. Svitlana Volkova and David Yarowsky. NIPS Workshop on Personalization: Methods and Applications 2014.
  • 67. Performance Metrics • Accuracy over time: • Find optimal models: – Data steam type (user, friend, user + friend) – Time (more correctly classified users faster) – Prediction quality (better accuracy over time) Aq,t = #correctly classified #above threshold = TD+TR D+ R
  • 68. Results: Iterative Batch Learning 0.0 0.2 0.4 0.6 0.8 1.0 50 100 150 200 250 300 Mar Jun Sep Accuracy Correctlyclassified user user + friend 0.0 0.2 0.4 0.6 0.8 1.0 50 100 150 200 250 300 Mar Jun Sep Accuracy Correctlyclassified user user + friend IB: higher recall IBR: higher precision Time: # correctly classified users increases over time IB faster, IBR slower Data stream selection: User + friend stream > user stream
  • 69. Results: Active Learning AWOO: higher recall AWR: higher precision Time: Unlike IB/IBR models, AWOO/AWR models classify more users correctly faster (in Mar) but then plateaus 0.0 0.2 0.4 0.6 0.8 1.0 50 100 150 200 250 300 Mar Jun Sep Accuracy Correctlyclassified user user + friend 0.0 0.2 0.4 0.6 0.8 1.0 50 100 150 200 250 300 Mar Jun Sep Accuracy Correctlyclassified user user + friend
  • 70. 0.5 0.6 0.7 0.8 0.9 1.0 Mar Jun Sep Accuracy IB: user IBR: user 0.5 0.6 0.7 0.8 0.9 1.0 Mar Jun Sep Accuracy AWOO: user AWR: user 0.5 0.6 0.7 0.8 0.9 1.0 Mar Jun Sep Accuracy IBR: user + friend IB: user + friend 0.5 0.6 0.7 0.8 0.9 1.0 Mar Jun Sep Accuracy AWR: user + friend AWOO: user + friend batch < active user+friend>user Results: Model Quality
  • 71. Active with Oracle Annotations 50 160 182 198 213 234 50 125 200 275 350 Feb Apr Jun Aug Oct Dec Cumul.requests toOracle Users in training for user only model user user + friend 1.0 1.7 16.8 34.0 30.5 63.9 47.9 103.0 71.3 157.4 122.9 271.0 50 100 150 200 250 Feb Apr Jun Aug Oct Dec Correctlyclassified user friend Oracle is 100% correct Thousands of tweets in training
  • 72. Summary: Dynamic Learning and Prediction • Active learning > iterative batch • N, UN > U: “neighbors give you away” • Higher confidence => higher precision, lower confidence => higher recall (as expected) • Rationales significantly improve results
  • 73. Practical Recommendations: Models for Targeted Advertising Prediction quality (better accuracy over time) Time (correctly classified users faster) Data steam (user, friend or joint) Models with rationale filtering IBR, AWR Higher confidence threshold 0.95 Models without rationale filtering IB, AWOO Lower confidence threshold 0.55 User + Friend > User
  • 74. Recap: Why these models are good? • Models streaming nature of social media • Limited user content => take advantage of neighbor content • Actively learn from crowdsourced rationales • Learn on the fly => data drift • Predict from multiple streams => topical sparsity • Flexible extendable framework: – More features: word embeddings, interests, profile info, tweeting behavior
  • 75. Software Requirements • Python: https://www.python.org/downloads/ python –V • Pip: https://pip.pypa.io/en/latest/installing.html python get-pip.py • Twython: https://pypi.python.org/pypi/twython/ pip install twython • matplotlib 1.3.1: http://sourceforge.net/projects/matplotlib/files/matplotlib/ • numpy 1.8.0: http://sourceforge.net/projects/numpy/files/NumPy/ • scipy 0.13: http://sourceforge.net/projects/scipy/files/scipy/ • scikit-learn 0.14.1: http://sourceforge.net/projects/scikit-learn/files/ python -c "import sklearn; print sklearn.__version__" python -c "import numpy; print numpy.version.version" python -c "import scipy; print scipy.version.version" python -c "import matplotlib; print matplotlib.__version__"
  • 76. Part II. Practice Session Outline • Details on data collection and annotation – JHU: gender, age and political preferences – MSR: emotions, opinions and psycho-demographics • Python examples for static inference – Tweet-based: emotions – User-based: psycho-demographic attributes • Python examples for online inference – Bayesian updates from multiple data streams
  • 77. Part II. Practice Session Outline • Details on data collection and annotation – JHU: gender, age and political preferences – MSR: emotions, opinions and psycho-demographics • Python examples for static inference – Tweet-based: emotions – User-based: psycho-demographic attributes • Python examples for online inference – Bayesian updates from multiple data streams
  • 78. JHU: Data Overview and Annotation Scheme friend hashtag reply @mentionfollower retweet Political Preferences: – Candidate-Centric = 1,031 users (follow candidates) – Geo-Centric = 270 users (self-reports in DE, MD, VA) – Politically Active* = 371 users (active & follow cand) Age (self-reports)*: 387 users Gender (name)*: 384 users 10 - 20 neighbors of each of 6 types Details on Twitter data collection: http://www.cs.jhu.edu/~svitlana/data/data_collection.pdf *Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013 Explain relationships
  • 79. Links to Download JHU Attribute Data • How does the data look like? – graph_type.neighbor_type.tsv e.g., cand-centric.follower.tsv • JHU gender and age: http://www.cs.jhu.edu/~svitlana/data/graph_gender_age.tar.gz • JHU politically active*: http://www.cs.jhu.edu/~svitlana/data/graph_zlr.tar.gz • JHU candidate- centric:http://www.cs.jhu.edu/~svitlana/data/graph_cand.tar.gz • JHU geo- centric:http://www.cs.jhu.edu/~svitlana/data/geo_cand.tar.gz
  • 80. Code to query Twitter API • Repo: https://bitbucket.org/svolkova/queryingtwitter – get lists of friends/followers for a user – 200 recent tweets for k randomly sampled retweeted or mentioned users – tweets for a list of userIDs JSON Objects Extract text fields time, #friends Tweet Collection userIDs/tweetIDs
  • 81. Part II. Practice Session Outline • Data and annotation schema description – JHU: gender, age and political preferences – MSR: emotions, opinions and psycho-demographics • Python examples for static inference: – Tweet-based: emotions – User-based: psycho-demographic attributes • Python examples for streaming inference: – Bayesian updates from multiple data streams
  • 82. MSR: Psycho-Demographic Annotations via Crowdsourcing 5K profiles 0.0 0.5 1.0 Intelligence Relationship Religion Political Education Optimism Income Life… Age Children Gender Ethnicity Cohen's Kappa (2% random sample) Attribute Models ΦA(u) UL UP 5K Millions! Trusted crowd $6/hour quality control
  • 83. MSR: Emotion Annotations via Distant Supervision. Tweets are labeled with the 6 Ekman emotions via hashtags (Mohammad et al., 2014), extended with emotion-synonym hashtags.
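The hashtag-based distant supervision above can be sketched as follows. The seed hashtag lists here are illustrative assumptions (the tutorial follows Mohammad et al.'s hashtag sets plus synonyms); only tweets ending in a seed hashtag are labeled, and the label hashtag is stripped so it cannot leak into the features.

```python
# Hypothetical seed hashtags per Ekman emotion (illustrative only).
EMOTION_HASHTAGS = {
    "joy": {"#joy", "#happy"},
    "sadness": {"#sadness", "#sad"},
    "fear": {"#fear", "#scared"},
    "anger": {"#anger", "#angry"},
    "surprise": {"#surprise", "#surprised"},
    "disgust": {"#disgust", "#disgusted"},
}

def distant_label(tweet):
    """Label a tweet with an emotion if it ends in a seed hashtag,
    and strip that hashtag from the text."""
    tokens = tweet.strip().split()
    if not tokens:
        return None, tweet
    last = tokens[-1].lower()
    for emotion, tags in EMOTION_HASHTAGS.items():
        if last in tags:
            return emotion, " ".join(tokens[:-1])
    return None, tweet

label, text = distant_label("stuck in traffic again #angry")
```

Restricting to trailing hashtags mirrors the cleaning step mentioned in the speaker notes (short tweets removed, hashtags only at the end, English only).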
  • 84. Part II. Practice Session • Data and annotation schema description – JHU: gender, age and political preferences – MSR: emotions, opinions and psycho-demographics • Python examples for static inference: – Tweet-based: emotions – User-based: psycho-demographic attributes • Python examples for streaming inference: – Bayesian updates from multiple data streams
  • 85. How to get MSR models and code? https://bitbucket.org/svolkova/psycho-demographics 1. Load models for 15 psycho-demographic attributes + emotions 2. Extract features from input tweets 3. Apply pre-trained models to make predictions for input tweets
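The three steps above (load model, extract features, predict) can be sketched with scikit-learn. This is not the repo's actual code: the training texts, labels, and the choice of binary unigram features are toy assumptions, and a real run would unpickle the repo's pre-trained models instead of fitting one in place.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Step 1 in the repo would be loading a pickled pre-trained model;
# here we fit a toy one so the sketch is self-contained.
train_texts = ["love shopping with my girls",
               "watching the game with the guys"]
train_labels = ["female", "male"]

# Step 2: lexical features (normalized binary ngrams, unigrams here).
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(train_texts)
model = LogisticRegression().fit(X, train_labels)

# Step 3: apply the model to new input tweets.
new_tweets = ["shopping with my girls today"]
predictions = model.predict(vectorizer.transform(new_tweets))
```

The repo's models cover 15 psycho-demographic attributes plus emotions; each would be applied to the same extracted feature vectors.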
  • 86. Predictive Models. Supervised text classification with log-linear models. User-based features: • Lexical: normalized binary/count-based ngrams • Affect: emotions, sentiments. Tweet-based features: • BOW + negation + stylistic (+0.3 F1) • Socio-linguistic and stylistic cues: elongations (Yaay, woooow), capitalization (COOL), mixed punctuation (???!!!), hashtags and emoticons. Decision rule: F(u) = a0 if 1/(1 + e^{-θ·φ}) ≥ 0.5, a1 otherwise.
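The socio-linguistic and stylistic cues listed above can be extracted with a few regular expressions. A minimal sketch (the exact patterns, e.g. requiring three repeated letters for an elongation, are assumptions, not the tutorial's exact definitions):

```python
import re

def stylistic_features(tweet):
    """Binary stylistic cues: elongations, all-caps words,
    mixed punctuation, hashtags, emoticons."""
    return {
        "has_elongation": bool(re.search(r"(\w)\1{2,}", tweet)),   # woooow
        "has_all_caps": bool(re.search(r"\b[A-Z]{3,}\b", tweet)),  # COOL
        "has_mixed_punct": bool(re.search(r"[?!]{2,}", tweet)),    # ???!!!
        "has_hashtag": "#" in tweet,
        "has_emoticon": bool(re.search(r"[:;]-?[)(DP]", tweet)),   # :) ;-D
    }

feats = stylistic_features("woooow this is COOL ???!!! #win :)")
```

These binary indicators would be appended to the BOW feature vector before training the log-linear model.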
  • 87. Tweet-Based: Emotion Prediction. 6 classes: joy, sadness, fear, surprise, disgust, anger. F1 scores (higher is better): sadness 0.62, surprise 0.64, fear 0.77, joy 0.79, anger 0.80, disgust 0.92; overall F1 = 0.78 (vs. Roberts'12: 0.67, Qadir'13: 0.53, Mohammad'14: 0.49)
  • 88. User-Based: Attribute Prediction (ROC AUC, higher is better). BOW baseline: religion 0.63, relationship 0.63, age 0.66, political 0.72, children 0.72, optimism 0.72, life satisfaction 0.72, income 0.73, intelligence 0.75, education 0.77, gender 0.90, race 0.93. With emotion/sentiment features (EmoSentOut, EmoSentDiff) plus lexical features: relationship 0.74, religion 0.74, children 0.80, political 0.82, age 0.83, intelligence 0.83, optimism 0.83, life satisfaction 0.84, income 0.85, education 0.88, gender 0.95, ethnicity 0.97, i.e. gains of +0.04 to +0.17 AUC over BOW. Same decision rule: F(u) = a0 if 1/(1 + e^{-θ·φ}) ≥ 0.5, a1 otherwise.
  • 89. Predicting Demographics from User Outgoing Emotions and Opinions. Heatmap (column z-scores) of emotion and opinion features (joy, sadness, fear, anger, surprise, disgust, positive, negative, neutral, emotion and sentiment scores) against predicted attributes (intelligence, race, education, income, gender, age, children, political, religion, relationship, optimism, life satisfaction). Per-attribute AUC ROC ranges from 0.58 to 0.76; 1/3 of the attributes reach AUC >= 75% from these features alone (satisfied vs. dissatisfied, optimist vs. pessimist, no kids, below 25 y.o., female vs. male).
  • 90. How to get JHU models and code?  Ex1: Train and test batch models  Ex2: Train a model from a training file and save it  Ex3: Predict an attribute using a pre-trained model and plot iterative updates  Ex4: Predict and plot iterative updates for multiple attributes using pre-trained models from a single communication stream  Ex5: Predict and plot iterative updates for multiple attributes from multiple communication streams https://bitbucket.org/svolkova/attribute
  • 91. Ex1. Train/Test Batch Models • Run as shown, e.g., for gender • Customize features and model type/parameters • Reports accuracy
  • 92. Ex2. Save Pre-trained Models • Run as shown, e.g., for age • Customize features (process.py) and model type/parameters (predict.py)
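Saving a pre-trained model so the later streaming examples can reload it can be sketched with a scikit-learn `Pipeline` and `pickle`. The file name, training texts, and age labels below are toy assumptions, not the repo's actual data or file layout.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = ["prom was amazing", "my mortgage payment is due"]
ages = ["below25", "above25"]

# Train once; the Pipeline bundles the feature extractor with the
# classifier so both are serialized together.
pipeline = Pipeline([("feats", CountVectorizer()),
                     ("clf", LogisticRegression())]).fit(texts, ages)

path = os.path.join(tempfile.gettempdir(), "age.model")
with open(path, "wb") as f:
    pickle.dump(pipeline, f)

# Later, in another script (e.g., Ex3-Ex5), reload and predict:
with open(path, "rb") as f:
    reloaded = pickle.load(f)
pred = reloaded.predict(["prom pictures are up"])[0]
```

Bundling the vectorizer in the pickle avoids feature-index mismatches between training and prediction time.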
  • 93. Part II. Practice Session • Data and annotation schema description – JHU: gender, age and political preferences – MSR: emotions, opinions and psycho-demographics • Python examples for static inference: – User-based: psycho-demographics – Tweet-based: emotions, opinions • Python examples for streaming inference: – Bayesian updates from multiple data streams
  • 94. Recap: Iterative Bayesian Updates. As tweets arrive at times t1, t2, ..., tk, the posterior belief that the user is Republican grows, e.g.: P_t1(R | T_t1) = 0.52, P_t2(R | T_t2) = 0.62, ..., P_tk-1(R | T_tk-1) = 0.65, P_tk(R | T_tk) = 0.77. The posterior is the product of per-tweet likelihoods and the class prior, normalized over both classes: P(a=R | T) = Π_k P(t_k | a=R) · P(a=R) / [ Π_k P(t_k | a=R) · P(a=R) + Π_k P(t_k | a=D) · P(a=D) ]
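The update rule on the slide can be implemented in a few lines. A minimal sketch, where the per-tweet likelihood values are made-up numbers for illustration:

```python
def bayesian_update(prior_r, likelihoods_r, likelihoods_d):
    """Iterative posterior P(a=R | t1..tk): fold each tweet's class
    likelihoods into the running products and renormalize after
    every tweet, returning the posterior trajectory over time."""
    score_r, score_d = prior_r, 1.0 - prior_r
    posteriors = []
    for lr, ld in zip(likelihoods_r, likelihoods_d):
        score_r *= lr   # running product of P(t_k | a=R), times prior
        score_d *= ld   # running product of P(t_k | a=D), times prior
        posteriors.append(score_r / (score_r + score_d))
    return posteriors

# Each tweet slightly favours R, so belief drifts upward over time:
traj = bayesian_update(0.5, [0.6, 0.6, 0.6], [0.4, 0.4, 0.4])
```

In practice the products are usually accumulated as log-likelihood sums to avoid numerical underflow over long tweet streams.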
  • 95. Ex3. Iterative Updates for a Single Attribute from a Single Stream
  • 96. Ex4. Iterative Updates for Multiple Attributes from a Single Stream Steps: 1. Loading Models 2. Processing data 3. Setting up train/test priors 4. Making Predictions 5. Plotting results
  • 97. Joint User-Neighbor Streams. Neighbor stream types: friend, follower, @mention, reply, retweet, hashtag.
  • 98. Ex5. Iterative Updates for Multiple Attributes from Joint Streams: per-stream likelihoods are combined into a single posterior.
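Combining several communication streams uses the same Bayesian update as before, only with more evidence per step. A minimal sketch with made-up likelihood values for a user stream and a friend stream:

```python
def joint_stream_posterior(prior_r, streams_r, streams_d):
    """Fold the likelihoods from every stream (user tweets, friend
    tweets, retweets, ...) into one posterior P(a=R): the update rule
    is unchanged, each stream simply contributes more factors."""
    score_r, score_d = prior_r, 1.0 - prior_r
    for stream_r, stream_d in zip(streams_r, streams_d):
        for lr, ld in zip(stream_r, stream_d):
            score_r *= lr
            score_d *= ld
    return score_r / (score_r + score_d)

# Likelihoods P(tweet | a=R) and P(tweet | a=D) from two streams:
user_r, user_d = [0.6, 0.7], [0.4, 0.3]    # the user's own tweets
friend_r, friend_d = [0.55], [0.45]        # a friend's tweets
p = joint_stream_posterior(0.5, [user_r, friend_r], [user_d, friend_d])
```

For multiple attributes, the same combination would be run once per attribute with that attribute's pre-trained likelihood model.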

Editor's Notes

  • #5: University of Cambridge: You Are What You Like - http://youarewhatyoulike.com/ Psycho-demographic trait prediction engine - http://applymagicsauce.com/
  • #7: stress explicitly that there may be rules and regulations governing how you use different sorts of social media data the audience should know that privacy concerns are a thing they should keep in mind
  • #8: not enough evidence per user
  • #9: not enough evidence per user annotation cost
  • #10: http://twitterpredictor.cloudapp.net
  • #13: Reaching your crowd fast and easy!
  • #14: http://mashable.com/2015/02/16/1-800-flowers-valentines-day/
  • #17: What is social media predictive analytics?
  • #18: The signal that can reveal user attributes is in their language
  • #21: Limitations of the existing “unrealistic and unfair” models trained on thousands of tweets!
  • #34: Add income paper!
  • #35: Add income paper!
  • #43: Different datasets. But if the Twitter users are average, the models should generalize
  • #44: Slide about: regression, different age buckets (18 – 23, <25 or >25)
  • #47: Also run the experiments with prefixed features but got lower performance compared to these experiments where we treat user and neighbor features equally
  • #49: Mention data sampling and annotation biases again!
  • #56: When we rely on neighbor content the prediction accuracy varies across the attributes=> assortativity level
  • #58: Talk about the unbalanced priors and test/train class prior differences
  • #59: As time goes by and we get more evidence (tweets) our belief about a user being R or D improves
  • #60: Dashed line = 75%
  • #61: Dotted line = 75%
  • #62: Add HMM comparison slide
  • #67: Early months => huge boost compared to later months. User tweets are more relevant than UN and N. Mention other ways to incorporate rationales! Scalability to other languages!
  • #69: Threshold: 0.95; 0.75; 0.5
  • #72: Accuracy IBR > IB, AWR > AWOO IBR: higher precision, IB: higher recall
  • #73: Upper bound for classification: UN > N > U. UN and N are better but require more requests to Turk (more money) from the very beginning. Tradeoff between faster predictions with more resources vs. a longer wait to save money
  • #74: Always use a joint stream (explore network content). Deliver ads faster: AWOO, IB. Deliver ads to a more accurate crowd: AWR. Models without rationale filtering require more computation (high-dimensional feature vectors)
  • #76: What we can and can't do! Can predict, but want to predict better. Moving from binary to multiclass is very hard. Changing attributes
  • #77: Scikit-learn: http://scikit-learn.org/stable/install.html pip install -U numpy scipy matplotlib scikit-learn python -c "import sklearn; print sklearn.__version__" python -c "import numpy; print numpy.version.version" python -c "import scipy; print scipy.version.version"
  • #78: loading models, extracting features, making predictions
  • #79: loading models, extracting features, making predictions
  • #80: Explain social relationship types
  • #83: loading models, extracting features, making predictions
  • #84: Quality control!!!
  • #85: Hashtag-based labeling was previously used successfully for sentiment; for sentiment we used 2 other existing datasets. No no-emotion class! Cleaning: remove short tweets, hashtags only at the end, English only
  • #86: loading models, extracting features, making predictions
  • #88: Add income paper!
  • #89: For emotion detection, we focus on 6 Ekman emotions: joy, sadness, fear, surprise, anger and disgust. We learn the models using a more advanced feature set in addition to lexical word ngram features, such as stylistic cues: elongated words, capitalization, mixed punctuation, hashtags and emoticons. We also take into account clause-level negation.
  • #90: Report gender, political preference results for Volkova’s data (Zamal)
  • #91: Pessimists: Anger and Negative Optimists, Satisfied with Life: Joy and Positive Liberal: Disgust, Negative Conservative: Fear Older, No Kids: Sadness and Negative With Kids: Fear Female: Joy, Positive, Anger and Sadness Male: Neutral
  • #93: If you want user-based, just parse tsv files differently
  • #94: Tweet-based models for attribute predictions
  • #95: loading models, extracting features, making predictions
  • #96: Talk about the unbalanced priors and test/train class prior differences