From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
Brendan O’Connor†   Ramnath Balasubramanyan†   Bryan R. Routledge§   Noah A. Smith†
brenocon@cs.cmu.edu   rbalasub@cs.cmu.edu   routledge@cmu.edu   nasmith@cs.cmu.edu
† School of Computer Science, Carnegie Mellon University
§ Tepper School of Business, Carnegie Mellon University


Abstract
We connect measures of public opinion obtained from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate with sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling.

Introduction
If we want to know, say, the extent to which the U.S. population likes or dislikes Barack Obama, an obvious thing to do is to ask a random sample of people (i.e., conduct a poll). Survey and polling methodology, extensively developed through the 20th century (Krosnick, Judd, and Wittenbrink 2005), provides numerous tools and techniques for representative public opinion measurement.
   With the dramatic rise of text-based social media, millions of people broadcast their thoughts and opinions on a great variety of topics. Can we analyze publicly available data to infer population attitudes in the same manner that public opinion pollsters query a population? If so, then mining public opinion from freely available text content could be a faster and less expensive alternative to traditional polls. (A standard telephone poll of one thousand respondents easily costs tens of thousands of dollars to run.) Such analysis would also permit us to consider a greater variety of polling questions, limited only by the scope of topics and opinions people broadcast. Extracting public opinion from social media text provides a challenging and rich context to explore computational models of natural language, motivating new research in computational linguistics.
   In this paper, we connect measures of public opinion derived from polls with sentiment measured from analysis of text from the popular microblogging site Twitter. We explicitly track textual sentiment in microblog messages through time and compare it to contemporaneous polling data. In this preliminary work, summary statistics derived from extremely simple text analysis techniques are demonstrated to correlate with polling data on consumer confidence and political opinion, and can also predict future movements in the polls. We find that temporal smoothing is critical to a successful model.

Data
We begin by discussing the data used in this study: Twitter for the text data, and public opinion surveys from multiple polling organizations.

Twitter Corpus
Twitter is a popular microblogging service in which users post messages that are very short: at most 140 characters, averaging 11 words per message. It is convenient for research because there are a very large number of messages, many of which are publicly available, and obtaining them is technically simple compared to scraping blogs from the web.
   We use 1 billion Twitter messages posted over the years 2008 and 2009, collected by querying the Twitter API,1 as well as archiving the “Gardenhose” real-time stream. This comprises a roughly uniform sample of public messages, in the range of 100,000 to 7 million messages per day. (The primary source of variation is growth of Twitter itself; its message volume increased by a factor of 50 over this two-year time period.)
   Most Twitter users appear to live in the U.S., but we made no systematic attempt to identify user locations or even message language, though our analysis technique should largely ignore non-English messages, since it relies on English keywords and sentiment words.
   This text sample likely has further issues; for example, the demographics and communication habits of the Twitter user population probably changed over this time period, a shift that should be adjusted for, given our aim of measuring attitudes in the general population. There are clear opportunities for better preprocessing and stratified sampling to exploit these data.
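Because daily volume grew roughly fiftyfold over the collection period, raw keyword counts largely track Twitter's own growth rather than opinion. A minimal sketch of one common remedy, assuming messages are available as (date, text) pairs: divide each day's keyword count by that day's total message count. The helper name `daily_keyword_fraction`, the input format, and the lowercase whitespace tokenization are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from datetime import date


def daily_keyword_fraction(messages, keyword):
    """Return {day: fraction of that day's messages containing `keyword`}.

    Hypothetical helper: `messages` is an iterable of (date, text) pairs.
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    totals = Counter()  # all messages seen per day
    hits = Counter()    # messages mentioning the keyword per day
    kw = keyword.lower()
    for day, text in messages:
        totals[day] += 1
        if kw in text.lower().split():
            hits[day] += 1
    # Dividing by the daily total removes the effect of overall volume growth.
    return {day: hits[day] / totals[day] for day in sorted(totals)}


if __name__ == "__main__":
    sample = [
        (date(2008, 1, 1), "the economy looks bad"),
        (date(2008, 1, 1), "lunch time"),
        (date(2009, 12, 1), "economy news today"),
        (date(2009, 12, 1), "hello"),
        (date(2009, 12, 1), "good morning"),
        (date(2009, 12, 1), "new phone"),
    ]
    print(daily_keyword_fraction(sample, "economy"))
```

In the toy sample each day has exactly one message containing “economy”, yet the fraction drops from 0.5 to 0.25 because the later day has twice as many messages. Punctuation is not stripped, so “economy!” would not match; a real pipeline would need a proper tokenizer.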
Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
1 This scraping effort was conducted by Brendan Meeder.


To appear in: Proceedings of the International AAAI Conference on Weblogs and Social Media, Washington, DC, May 2010.
Public Opinion Polls
We consider several measures of consumer confidence and political opinion, all obtained from telephone surveys of participants selected through random-digit dialing, a standard technique in traditional polling (Chang and Krosnick 2003).
   Consumer confidence refers to how optimistic the public feels, collectively, about the health of the economy and their personal finances. It is thought that high consumer confidence leads to more consumer spending; this line of argument is often cited in the popular media and by policymakers (Greenspan 2002), and further relationships with economic activity have been studied (Ludvigson 2004; Wilcox 2007). Knowing the public’s consumer confidence is of great utility for economic policymaking as well as business planning.

[Figure 1: Monthly Michigan ICS and daily Gallup consumer confidence poll. Panels: “Gallup Econ. Conf.” and “Michigan ICS”; y-axis: Index; x-axis: 2008-01 to 2009-11.]

   Two well-known surveys that measure U.S. consumer confidence are the Consumer Confidence Index from the Conference Board and the Index of Consumer Sentiment (ICS) from the Reuters/University of Michigan Surveys of Consumers.2 We use the latter, as it is more extensively studied in economics, having been conducted since the 1950s. The ICS is derived from answers to five questions administered monthly in telephone interviews with a nationally representative sample of several hundred people; responses are
combined into the index score. Two of the questions, for example, are:
  “We are interested in how people are getting along financially these days. Would you say that you (and your family living there) are better off or worse off financially than you were a year ago?”
  “Now turning to business conditions in the country as a whole—do you think that during the next twelve months we’ll have good times financially, or bad times, or what?”
We also use another poll, the Gallup Organization’s “Economic Confidence” index,3 which is derived from answers to two questions that ask interviewees to rate the overall economic health of the country. It addresses only a subset of the issues incorporated into the ICS. We are interested in it because, unlike the ICS, it is administered daily (reported as three-day rolling averages). Frequent polling data are more convenient for our comparison, since we have fine-grained, daily Twitter data, but only over a two-year period. Both datasets are shown in Figure 1.
   For political opinion, we use two sets of polls. The first is Gallup’s daily tracking poll for the presidential job approval rating for Barack Obama over the course of 2009, which is reported as 3-day rolling averages.4 These data are shown in Figure 2.
   The second is a set of tracking polls during the 2008 U.S. presidential election cycle, asking potential voters whether they would vote for Barack Obama or John McCain. Many different organizations administered them throughout 2008; we use a compilation provided by Pollster.com, consisting of 491 data points from 46 different polls.5 The data are shown in Figure 3.

2 Downloaded from http://www.sca.isr.umich.edu/.
3 Downloaded from http://www.gallup.com/poll/122840/gallup-daily-economic-indexes.aspx.
4 Downloaded from http://www.gallup.com/poll/113980/Gallup-Daily-Obama-Job-Approval.aspx.
5 Downloaded from http://www.pollster.com/polls/us/08-us-pres-ge-mvo.php

Text Analysis
From text, we are interested in assessing the population’s aggregate opinion on a topic. The task naturally breaks down into two subproblems:
1. Message retrieval: identify messages relating to the topic.
2. Opinion estimation: determine whether these messages express positive or negative opinions or news about the topic.
If there is enough training data, this could be formulated as a topic-sentiment model (Mei et al. 2007), in which the topics and sentiment of documents are jointly inferred. Our dataset, however, is asymmetric, with millions of text messages per day (and millions of distinct vocabulary items) but only a few hundred polling data points in each problem. This makes it challenging to estimate a useful model over the vocabulary and messages. The signal-to-noise ratio is low, as is typical of information retrieval problems: we are only interested in information contained in a small fraction of all messages.
   We therefore opt to use a transparent, deterministic approach based on prior linguistic knowledge, counting instances of positive-sentiment and negative-sentiment words in the context of a topic keyword.
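The two subproblems and the word-counting approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `POSITIVE` and `NEGATIVE` word sets are tiny hypothetical placeholders for a real sentiment lexicon, a message counts as on-topic if it contains the keyword as a whitespace-separated token, and each day is summarized as the ratio of positive-word to negative-word topic messages, then smoothed with a trailing k-day moving average. The function names and both summary choices are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicons standing in for a real sentiment word list.
POSITIVE = {"good", "great", "hope", "win"}
NEGATIVE = {"bad", "worse", "fear", "lose"}


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an illustrative assumption)."""
    return text.lower().split()


def daily_sentiment_ratio(messages, topic):
    """Return {day: positive topic messages / negative topic messages}.

    `messages` is an iterable of (day, text) pairs; `day` can be any sortable key.
    Days without any negative topic message are skipped (no division by zero).
    """
    topic = topic.lower()
    pos = defaultdict(int)
    neg = defaultdict(int)
    for day, text in messages:
        tokens = set(tokenize(text))
        if topic not in tokens:   # 1. message retrieval: keep on-topic messages
            continue
        if tokens & POSITIVE:     # 2. opinion estimation: count sentiment words
            pos[day] += 1
        if tokens & NEGATIVE:
            neg[day] += 1
    return {day: pos[day] / neg[day] for day in sorted(neg)}


def trailing_moving_average(values, k):
    """Average each value with the k-1 values before it; None until k values exist."""
    if k < 1:
        raise ValueError("k must be at least 1")
    out = []
    for i in range(len(values)):
        if i + 1 < k:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - k : i + 1]) / k)
    return out


if __name__ == "__main__":
    msgs = [
        (1, "obama speech was great"),
        (1, "obama news is bad"),
        (2, "obama gives me hope"),
        (2, "obama is great"),
        (2, "obama bad day"),
        (2, "good lunch"),
        (3, "obama will win"),
        (3, "fear about obama"),
        (3, "obama worse than expected"),
    ]
    ratios = daily_sentiment_ratio(msgs, "obama")
    print(ratios)
    print(trailing_moving_average([ratios[d] for d in sorted(ratios)], k=2))
```

On these toy messages the daily ratios are 1.0, 2.0, and 0.5, and a 2-day trailing average gives None, 1.5, 1.25. A message with both positive and negative words counts toward both totals, and days with no negative messages are dropped rather than dividing by zero; both are simplifications a fuller treatment would revisit.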


[Figure 2: 2009 presidential job approval (Barack Obama). Y-axis: Approve−Disapprove percentage difference; x-axis: 2009-01 to 2009-12.]

[Figure: panels titled “obama”, “mccain”, “economy”, “jobs”, and “job”; x-axis: 2008 to 2010.]

[Figure: scatter of individual data points; y-axis: Percent Support for Candidate.]
                                                                                                                                                       qq q
                                                                                                                                                         q q qq
                                                                                                                                                          q q qq
                                                                                                                                                                 q
                                                                                                                                                                                                                                     2008       2009   2010
                                         q        q q                           q           q              q   q      q      qq    q                              qq
                                                                                       q          q q q q q           q q
                                                                                                      qqq          q                  q    q
                                                                      qq                q               q       q
                                                                       q                             q q q
                                                               q                                  q
                                                                                                   q
                                 30




                                                                                                                                           q
                                                                                                                                                                                                  Figure 4: Fraction of Twitter messages containing various
                                                                                                                                                                                                  topic keywords, per day.
                                      2008−02

                                                   2008−03

                                                                       2008−04

                                                                                      2008−05

                                                                                                   2008−06

                                                                                                                       2008−07

                                                                                                                                      2008−08

                                                                                                                                                    2008−09

                                                                                                                                                                        2008−10

                                                                                                                                                                                        2008−11




Figure 3: 2008 presidential elections, Obama vs. McCain (blue and red). Each poll provides separate Obama and McCain percentages (one blue and one red point); lines are 7-day rolling averages.

Message Retrieval
We only use messages containing a topic keyword, manually specified for each poll:

• For consumer confidence, we use economy, job, and jobs.

• For presidential approval, we use obama.

• For elections, we use obama and mccain.

   Each topic subset contained around 0.1–0.5% of all messages on a given day, though with occasional spikes, as seen in Figure 4. These appear to be driven by news events. All terms have a weekly cyclical structure, occurring more frequently on weekdays, especially in the middle of the week, than on weekends. (In the figure, this is most apparent for the term job, since it has fewer spikes.) Nonetheless, these fractions are small. In the earliest and smallest part of our dataset, the topic samples sometimes contain just several hundred messages per day; but by late 2008, there are thousands of messages per day for most datasets.

Opinion Estimation
We derive day-to-day sentiment scores by counting positive and negative messages. Positive and negative words are defined by the subjectivity lexicon from OpinionFinder, a word list containing about 1,600 and 1,200 words marked as positive and negative, respectively (Wilson, Wiebe, and Hoffmann 2005).6 We do not use the lexicon's distinctions between weak and strong words.
   A message is defined as positive if it contains any positive word, and negative if it contains any negative word. (This allows for messages to be both positive and negative.) This gives results similar to simply counting positive and negative words on a given day, since Twitter messages are so short (about 11 words).
   We define the sentiment score x_t on day t as the ratio of positive versus negative messages on the topic, counting from that day's messages:

$$
x_t = \frac{\mathrm{count}_t(\text{pos. word} \wedge \text{topic word})}{\mathrm{count}_t(\text{neg. word} \wedge \text{topic word})}
    = \frac{p(\text{pos. word} \mid \text{topic word}, t)}{p(\text{neg. word} \mid \text{topic word}, t)} \tag{1}
$$

where the likelihoods are estimated as relative frequencies.
   We performed a casual inspection of the detected messages and found many examples of falsely detected sentiment. For example, the lexicon has the noun will as a weak positive word, but since we do not use a part-of-speech tagger, this causes thousands of false positives when it matches the verb sense of will.7 Furthermore, recall is certainly very low, since the lexicon is designed for well-written standard English, but many messages on Twitter are written in an informal social media dialect of English, with different and alternatively spelled words, and emoticons as potentially useful signals. Creating a more comprehensive lexicon with distributional similarity techniques could improve the system; Velikovich et al. (2010) find that such a web-derived lexicon substantially improves a lexicon-based sentiment classifier.

   6 Available at http://www.cs.pitt.edu/mpqa.
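The retrieval and scoring steps above (topic-keyword filtering, then the ratio in Equation 1) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the tiny POSITIVE and NEGATIVE word sets are hypothetical stand-ins for the OpinionFinder lexicon, and the regular-expression tokenizer is an assumption. As in the text, no part-of-speech tagging is done, so a lexicon word such as will would match in every sense. Days with no negative topic message are skipped, because x_t is undefined there.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins: the real OpinionFinder lexicon has about 1,600
# positive and 1,200 negative entries; these few words are illustrative only.
POSITIVE = {"good", "great", "hope", "win"}
NEGATIVE = {"bad", "lost", "worse", "crisis"}
# Topic keywords for consumer confidence, as listed under Message Retrieval.
TOPIC = {"economy", "job", "jobs"}


def tokens(text):
    """Lowercased word tokens (a crude tokenizer assumed for this sketch)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def daily_sentiment_ratio(messages):
    """messages: iterable of (day, text) pairs.

    Returns {day: x_t}: the number of topic messages containing any positive
    word divided by the number containing any negative word. One message may
    count as both positive and negative. Days with no negative topic message
    are omitted because the ratio is undefined there.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for day, text in messages:
        words = tokens(text)
        if not words & TOPIC:  # message retrieval: keep topic messages only
            continue
        if words & POSITIVE:
            pos[day] += 1
        if words & NEGATIVE:
            neg[day] += 1
    # neg only holds days with at least one negative message, so n > 0.
    return {day: pos[day] / n for day, n in neg.items()}
```

For example, a day with one positive and two negative economy/jobs messages yields x_t = 1/2 = 0.5; messages without a topic keyword are ignored entirely.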
Comparison to Related Work
The sentiment analysis literature often focuses on analyzing individual documents, or portions thereof (for a review, see Pang and Lee, 2008). Our problem is related to work on sentiment information retrieval, such as the TREC Blog Track competitions that have challenged systems to find and classify blog posts containing opinions on a given topic (Ounis, MacDonald, and Soboroff 2008).
   The sentiment feature we consider, presence or absence of sentiment words in a message, is one of the most basic ones used in the literature. If we view this system in the traditional light (as subjectivity and polarity detection for individual messages), it makes many errors, like all natural language processing systems. However, we are only interested in aggregate sentiment. A high error rate merely implies the sentiment detector is a noisy measurement instrument. With a fairly large number of measurements, these errors will cancel out relative to the quantity we are interested in estimating, aggregate public opinion.8 Furthermore, as Hopkins and King (2010) demonstrate, it can actually be inaccurate to naïvely use standard text analysis techniques, which are usually designed to optimize per-document classification accuracy, when the goal is to assess aggregate population proportions.
   Several prior studies have estimated and made use of aggregated text sentiment. The informal study by Lindsay (2008) focuses on lexical induction in building a sentiment classifier for a proprietary dataset of Facebook wall posts (a web conversation/microblog medium broadly similar to Twitter), and demonstrates correlations to several polls conducted during part of the 2008 presidential election. We are unaware of other research validating text analysis against traditional opinion polls, though a number of companies offer text sentiment analysis essentially for this purpose (e.g., Nielsen Buzzmetrics). Several other studies use time series of either aggregate text sentiment or good vs. bad news, including analyzing stock behavior based on text from blogs (Gilbert and Karahalios 2010), news articles (Lavrenko et al. 2000; Koppel and Shtrimberg 2004), and investor message boards (Antweiler and Frank 2004; Das and Chen 2007). Dodds and Danforth (2009) use an emotion word counting technique for purely exploratory analysis of several corpora.

Figure 5: Moving average MA_t of the sentiment ratio for jobs, under different windows k ∈ {1, 7, 30}: no smoothing (gray), past week (magenta), and past month (blue). The unsmoothed version spikes as high as 10; those values are omitted for space. [Plot: sentiment ratio from 2008-01 to 2009-11.]

Moving Average Aggregate Sentiment
Day-to-day, the sentiment ratio is volatile, much more so than most polls.9 Just as in the topic volume plots (Figure 4), the sentiment ratio rapidly rises and falls each day. In order to derive a more consistent signal, and following the same methodology used in public opinion polling, we smooth the sentiment ratio with one of the simplest possible temporal smoothing techniques, a moving average over a window of the past k days:

$$
\mathrm{MA}_t = \frac{1}{k}\left(x_{t-k+1} + x_{t-k+2} + \cdots + x_t\right)
$$

   Smoothing is a critical issue. It causes the sentiment ratio to respond more slowly to recent changes, thus forcing consistent behavior to appear over longer periods of time. Too much smoothing, of course, makes it impossible to see fine-grained changes to aggregate sentiment. See Figure 5 for an illustration of different smoothing windows for the jobs topic.

Correlation Analysis: Is text sentiment a leading indicator of polls?
Figure 6 shows the jobs sentiment ratio compared to the two different measures of consumer confidence, Gallup Daily and Michigan ICS. It is apparent that the sentiment ratio captures the broad trends in the survey data. With 15-day smoothing, it is reasonably correlated with Gallup at r = 73.1%. The most glaring difference is a region of high positive sentiment in May–June 2008. But otherwise, the sentiment ratio seems to pick up on the downward slide of consumer confidence through 2008, and the rebound in February/March of 2009.

   7 We tried manually removing this and several other frequently mismatching words, but it had little effect.
   8 There is an issue if errors correlate with variables relevant to public opinion; for example, if certain demographics speak in dialects that are harder to analyze.
   9 That the reported poll results are less volatile does not imply that they are more accurate reflections of true population opinion than the text.
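The trailing moving average MA_t from the Moving Average Aggregate Sentiment section can be written as a direct transcription of its formula. This sketch assumes the daily ratios x_t are already in a list ordered by day with no missing days (the text does not say how gaps are handled), and it marks the first k-1 days as None, since a full window of k past days is not yet available there. The function name is hypothetical.

```python
def moving_average(xs, k):
    """Trailing moving average: MA_t = (x_{t-k+1} + ... + x_t) / k.

    xs: daily sentiment ratios ordered by day (assumed gap-free).
    Returns a list aligned with xs; entries before a full window exists
    (the first k-1 days) are None.
    """
    if k < 1:
        raise ValueError("window k must be at least 1")
    return [
        None if t < k - 1 else sum(xs[t - k + 1 : t + 1]) / k
        for t in range(len(xs))
    ]
```

With k = 1 this returns the unsmoothed series, and k = 7 and k = 30 correspond to the past-week and past-month windows in Figure 5. A running sum would make this O(n) instead of O(nk), but for windows of a week or a month the direct form is easier to check against the formula.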


[Plot residue: sentiment ratio (y-axis, 1.5 to 4.0) with series labeled "k=15, lead=0" and "k=30, lead=50"; correlation against Gallup (y-axis, 0.4 to 0.9) with regions labeled "Text leads poll" and "Poll leads text", and series for k=7, k=15, and k=30.]
                               −20
  Gallup Economic Confidence




                                                     Index
                                                                                              −90        −50          −10          30 50 70 90
                               −30




                                                                                                               Text lead / poll lag
                               −40
                               −50




                                                                                       0.8
                               −60




                                                                                       0.6
                                                                Corr. against ICS

                                                                                       0.4




                                                     Index
                               75




                                                                                       0.2
  Michigan ICS

                               70




                                                                                       0.0
                               65




                                                                                                                           k=30
                               60




                                                                                       −0.2




                                                                                                                           k=60
                               55




                                                                                              −90        −50          −10          30 50 70 90
                                     2008−01
                                     2008−02
                                     2008−03
                                     2008−04
                                     2008−05
                                     2008−06
                                     2008−07
                                     2008−08
                                     2008−09
                                     2008−10
                                     2008−11
                                     2008−12
                                     2009−01
                                     2009−02
                                     2009−03
                                     2009−04
                                     2009−05
                                     2009−06
                                     2009−07
                                     2009−08
                                     2009−09
                                     2009−10
                                     2009−11




                                                                                                               Text lead / poll lag


                                                             Figure 7: Cross-correlation plots: sensitivity to lead and lag
Figure 6: Sentiment ratio and consumer confidence surveys.    for different smoothing windows. L > 0 means the text
Sentiment information captures broad trends in the survey    window completely precedes the poll, and L < −k means
data.                                                        the poll precedes the text. (The window straddles the poll
                                                             for L < −k < 0.) The L = −k positions are marked on
                                                             each curve. The two parameter settings shown in Figure 6
                                                             are highlighted with boxes.




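Each curve in Figure 7 comes from correlating every poll value with the k-day text window that ends L days earlier, over a grid of L values. The following is a minimal NumPy sketch under stated assumptions, not the authors' code: both series sit on one daily calendar (missing poll days as NaN), the text series has no gaps, and the window is averaged rather than summed, which rescales the slope a but leaves the correlation r unchanged. The function names are hypothetical.

```python
import numpy as np


def trailing_mean(x, k):
    """k-day trailing moving average: entry t averages x[t-k+1], ..., x[t].

    The first k-1 entries are NaN because their window would start before day 0.
    Assumes x has no NaN gaps (a NaN would propagate through the running sum).
    """
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out[k - 1:] = (csum[k:] - csum[:-k]) / k
    return out


def lagged_corr(x, y, k, L):
    """Pearson r between poll y[t+L] and the k-day text window ending on day t.

    L > 0: the window ends before the poll day (text leads).
    L <= -k: the window starts after the poll day (poll leads).
    Otherwise the poll day falls inside the window.
    """
    text_ma = trailing_mean(x, k)
    y = np.asarray(y, dtype=float)
    n = len(text_ma)
    t = np.arange(n)
    t = t[(t + L >= 0) & (t + L < n)]          # keep shifts that stay in range
    text, poll = text_ma[t], y[t + L]
    ok = ~np.isnan(text) & ~np.isnan(poll)     # drop incomplete windows and missing polls
    return float(np.corrcoef(text[ok], poll[ok])[0, 1])


def cross_correlation_curve(x, y, k, lags):
    """One curve of a Figure 7-style plot: r as a function of the lag L."""
    return {L: lagged_corr(x, y, k, L) for L in lags}
```

For example, `cross_correlation_curve(x, gallup, 30, range(-90, 91, 5))` would trace one curve, assuming `x` and `gallup` (a hypothetical name) are such daily arrays. A running sum keeps each window O(1), so a full grid of lags stays cheap.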
When consumer confidence changes, can this first be seen in the text sentiment measure, or in polls? If text sentiment responds faster to news events, a sentiment measure may be useful for economic researchers and policymakers. We can test this by looking at leading versions of text sentiment.

First note that the text-poll correlation reported above is the goodness-of-fit metric for fitting slope and bias parameters a, b in a one-variable linear least-squares model:

$$ y_t = b + a \sum_{j=0}^{k-1} x_{t-j} + \epsilon_t $$

for poll outcomes $y_t$, daily sentiment ratios $x_j$, Gaussian noise $\epsilon_t$, and a fixed hyperparameter k. A poll outcome is compared to the k-day text sentiment window that ends on the same day as the poll.

We introduce a lag hyperparameter L into the model, so the poll is compared against the text window ending L days before the poll outcome:

$$ y_{t+L} = b + a \sum_{j=0}^{k-1} x_{t-j} + \epsilon_t $$

Graphically, this is equivalent to taking one of the text sentiment lines in Figure 6, shifting it to the right by L days, and then examining its correlation against the consumer confidence polls below it.

Polls are typically administered over an interval. The ICS is reported once per month (at the end of the month), and Gallup is reported for 3-day windows. We always consider the last day of the poll’s window to be the poll date, which is the earliest possible day that the information could actually be used. Therefore, we would expect both daily measures, Gallup and text sentiment, to always lead ICS, since it measures phenomena occurring over the previous month.

The sensitivity of text-poll correlation to the smoothing window and lag parameters (k, L) is shown in Figure 7. The regions corresponding to text preceding or following the poll are marked. Correlation is higher when text leads the poll than when the poll leads the text, so text seems to be a leading indicator. Gallup correlations fall off faster for poll-leads-text than for text-leads-poll, and the ICS has similar properties.

If text and polls moved at random relative to each other, these cross-correlation curves would stay close to 0. The fact that they have peaks at all strongly suggests that the text sentiment measure captures information related to the polls.

Also note that more smoothing increases the correlation: for Gallup, 7-, 15-, and 30-day windows peak at r = 71.6%, 76.3%, and 79.4% respectively. The 7-day and 15-day windows have two local correlation peaks, corresponding to shifts that give alternate alignments of two different humps against the Gallup data, but the better-correlating 30-day window smooths over these entirely. Furthermore, for the ICS, a 60-day window often achieves higher correlation than the 30-day window. These facts imply that the text sentiment information is volatile and, if polls are believed to be a gold standard, that it is best used to detect long-term trends.

It is also interesting to consider ICS a gold standard and compare its correlations with Gallup and with text sentiment. ICS and Gallup are correlated (best correlation is r = 86.4% if Gallup is given its own smoothing and alignment at k = 30, L = 20), which supports the hypothesis that they are measuring similar things, and that Gallup is a leading indicator for ICS. Fixed to 30-day smoothing, the sentiment ratio only achieves r = 63.5% under the optimal lead L = 50, so it is a weaker indicator than Gallup.

Finally, we also experimented with sentiment ratios for the terms job and economy, which both correlate very poorly with the Gallup poll: 10% and 7% respectively (with the default k = 15, L = 0).¹⁰

This is a cautionary note on the common practice of stemming words, which in information retrieval can have mixed effects on performance (Manning, Raghavan, and Schütze 2008, ch. 2). Here, stemming would have conflated job and jobs, severely degrading results.

¹⁰ We inspected some of the matching messages to try to understand this result, but since the sentiment detector is very noisy at the message level, it was difficult to understand what was happening.

Forecasting Analysis

As a further validation, we can evaluate the model in a rolling forecast setting, by testing how well the text-based model can predict future values of the poll. For a lag L and a target forecast date t + L, we train the model only on historical data through day t − 1, then predict using the window ending on day t. The lag parameter L is the number of days into the future for which the forecasts are made. We repeat this model fit and prediction procedure for most days. (We cannot forecast early days in the data, since L + k initial days are necessary to cover the start of the text sentiment window, plus at least several days for training.)

[Figure 8 plot. Top: Gallup Economic Confidence, showing the Gallup poll, text forecasts (lead=30), and poll self-forecasts (lead=30). Bottom: the text coefficient (Text coef.). Monthly axis from 2008-01 to 2009-11.]
Figure 8: Rolling text-based forecasts (above), and the text sentiment ($MA_t$) coefficients a for each of the text forecasting models over time (below).

Results are shown in Figure 8. Forecasts for one month in the future (that is, using past text from 44 through 30 days before the target date) achieve 77.5% correlation. This is slightly worse than a baseline that predicts the poll from its lagged self ($y_{t+L} \approx b_0 + b_1 y_t$), which has r = 80.4%. Adding the sentiment score to historical poll information as a bivariate model ($y_{t+L} \approx b_0 + b_1 y_t + a\,MA_{t..t-k+1}$) yields a very small improvement (r = 81.0%).
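The rolling comparison of the three models (text only, poll self-forecast, and the bivariate combination) can be sketched with ordinary least squares. This is a minimal NumPy illustration, not the authors' implementation. It assumes one reading of "historical data through day t − 1": a training pair (window ending on day s, poll on day s + L) is used only if that poll was already observed by day t − 1, that is, s + L ≤ t − 1. It also records the text slope a of each refit, the quantity plotted in the lower panel of Figure 8. The function names and the `min_train` threshold are hypothetical.

```python
import numpy as np


def trailing_mean(x, k):
    """k-day trailing moving average (NaN for the first k-1 days); assumes no gaps in x."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out[k - 1:] = (csum[k:] - csum[:-k]) / k
    return out


def rolling_forecasts(x, y, k, L, min_train=10):
    """Rolling L-day-ahead forecasts of poll y from text sentiment x.

    For each forecast origin t, every model is refit on pairs (s, s+L) with
    s + L <= t - 1, then applied to the inputs available on day t.
    Returns {"text", "self", "both", "text_coef"}, each a dict keyed by target day t+L.
    """
    if L < 0:
        raise ValueError("L must be non-negative for forecasting")
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    ma = trailing_mean(x, k)
    out = {"text": {}, "self": {}, "both": {}, "text_coef": {}}
    for t in range(len(y) - L):
        target = t + L
        if np.isnan(y[target]) or np.isnan(ma[t]) or np.isnan(y[t]):
            continue
        s = np.arange(max(t - L, 0))  # all s with s + L <= t - 1
        # One shared training set, so the three models are compared like for like.
        s = s[~np.isnan(ma[s]) & ~np.isnan(y[s]) & ~np.isnan(y[s + L])]
        if len(s) < min_train:
            continue
        ones = np.ones(len(s))
        designs = {  # (training design matrix, inputs known on day t)
            "text": (np.column_stack([ones, ma[s]]), [1.0, ma[t]]),
            "self": (np.column_stack([ones, y[s]]), [1.0, y[t]]),
            "both": (np.column_stack([ones, y[s], ma[s]]), [1.0, y[t], ma[t]]),
        }
        for name, (X, x_new) in designs.items():
            coef, *_ = np.linalg.lstsq(X, y[s + L], rcond=None)
            out[name][target] = float(np.dot(x_new, coef))
            if name == "text":
                out["text_coef"][target] = float(coef[1])  # slope a of this refit
    return out


def forecast_corr(preds, y):
    """Correlation between forecasts and the realized poll values on the same days."""
    days = sorted(preds)
    return float(np.corrcoef([preds[d] for d in days], [y[d] for d in days])[0, 1])
```

Comparing `forecast_corr(out["text"], y)`, `forecast_corr(out["self"], y)`, and `forecast_corr(out["both"], y)` gives the kind of three-way comparison reported above; whether it matches the paper's numbers depends on the training-window reading assumed here.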
Inspecting the rolling forecasts and the text model coefficient a is revealing. In 2008 and early 2009, text sentiment is a poor predictor of consumer confidence; for example, it fails to reflect a hump in the polls in August and September 2008. The model learns a coefficient near zero (even negative) and makes predictions similar to the poll’s self-predictions, which is possible since the poll’s most recent values are absorbed into the bias term of the text-only model. However, starting in mid-2009, text sentiment becomes a much better predictor, as it captures the general rise in consumer confidence starting then (see Figure 6). This suggests that qualitatively different phenomena are being captured by the text sentiment measure at different times. From the perspective of time series modeling, future work should investigate techniques for deciding the importance of different historical signals and time periods, such as vector autoregressions (e.g., Hamilton 1994).
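The paragraph above only names vector autoregressions as a direction for future work, so what follows is a hedged illustration of that model family, not anything fit in this study. A VAR(p) treats the poll and the smoothed text sentiment as one vector series and regresses each component on p lags of both, so the fitted lag matrices indicate how much each historical signal contributes (Hamilton 1994 gives the standard treatment). The sketch estimates it equation by equation with ordinary least squares in NumPy and assumes a gap-free daily grid; `fit_var` and `var_forecast` are hypothetical names.

```python
import numpy as np


def fit_var(Z, p=1):
    """Least-squares VAR(p): Z[t] = c + A_1 Z[t-1] + ... + A_p Z[t-p] + e[t].

    Z has shape (T, m), e.g. columns [poll, smoothed text sentiment], with no
    missing values. Returns the intercept c (shape (m,)) and [A_1, ..., A_p],
    where A_i[j, l] is the effect of series l at lag i on series j.
    """
    Z = np.asarray(Z, dtype=float)
    T, m = Z.shape
    # Each regression row is [1, Z[t-1], ..., Z[t-p]]; the target row is Z[t].
    X = np.array([np.concatenate([[1.0]] + [Z[t - i] for i in range(1, p + 1)])
                  for t in range(p, T)])
    B, *_ = np.linalg.lstsq(X, Z[p:], rcond=None)  # shape (1 + m*p, m)
    c = B[0]
    A = [B[1 + m * i: 1 + m * (i + 1)].T for i in range(p)]
    return c, A


def var_forecast(c, A, history):
    """One-step-ahead forecast from the last p rows of `history` (most recent last)."""
    history = np.asarray(history, dtype=float)
    return c + sum(A_i @ history[-(i + 1)] for i, A_i in enumerate(A))
```

In this framing, an off-diagonal entry of A_1 near zero would say that lagged text adds little to the poll equation, which is one way to make "deciding the importance of different historical signals" concrete.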
It is possible that the effectiveness of text changes over this time period for the reasons described earlier: Twitter itself changed substantially over this time period. In 2008, the site had far fewer users, who were probably less representative of the general population and were using the site differently than users would later.

[Figure 9 plot: sentiment ratio for obama; fraction of messages with obama (Frac. Messages); % Support Obama (Election, 2008); % Pres. Job Approval (2009); monthly axis from 2008-01 to 2009-12.]
Figure 9: The sentiment ratio for obama (15-day window), and fraction of all Twitter messages containing obama (day-by-day, no smoothing), compared to election polls (2008) and job approval polls (2009).
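The volume signal in Figure 9, the day-by-day fraction of messages containing obama, needs no sentiment lexicon at all. A minimal sketch, assuming messages are already grouped into one list of strings per day and that a mention means a case-insensitive whole-word match; the text does not spell out its matching rule, so the regex is an assumption, and the function names are hypothetical. Smoothing with a trailing window and correlating against a daily poll series mirrors the 15-day comparison described in the section that follows.

```python
import re

import numpy as np


def daily_fraction(messages_by_day, term):
    """Fraction of each day's messages mentioning `term` as a whole word (case-insensitive).

    messages_by_day: one list of message strings per day. Days with no messages get NaN.
    """
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    fracs = []
    for msgs in messages_by_day:
        hits = sum(1 for m in msgs if pattern.search(m.lower()))
        fracs.append(hits / len(msgs) if msgs else np.nan)
    return np.array(fracs)


def trailing_mean(x, k):
    """k-day trailing average; NaN until k days exist or if any day in the window is NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for t in range(k - 1, len(x)):
        out[t] = x[t - k + 1: t + 1].mean()
    return out


def corr_with_poll(signal, poll):
    """Pearson r over days where both the text signal and the poll are defined."""
    signal, poll = np.asarray(signal, dtype=float), np.asarray(poll, dtype=float)
    ok = ~np.isnan(signal) & ~np.isnan(poll)
    return float(np.corrcoef(signal[ok], poll[ok])[0, 1])
```

The word-boundary match keeps "obamacare" from counting as a mention of obama; a different tokenizer would shift the counts somewhat.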

Obama 2009 Job Approval and 2008 Elections

We analyze the sentiment ratio for obama and compare it to two series of polls, presidential job approval in 2009 and presidential election polls in 2008, as seen in Figure 9. The job approval poll is the most straightforward, showing a steady decline since the start of the Obama presidency, perhaps with some stabilization around September. The sentiment ratio also generally declines during this period, with r = 72.5% for k = 15.

However, in 2008 the sentiment ratio does not substantially correlate to the election polls (r = −8%); we compare to the percent of support for Obama, averaged over a 7-day window of tracking polls (the same information displayed in Figure 3). Lindsay (2008) found that his daily sentiment score was a leading indicator of one particular tracking poll (Rasmussen) over a 100-day period from June to October 2008. Our measure also roughly correlates to the same data, though less strongly (r = 44% vs. r = 57%) and only at different lag parameters.

The elections setting may be structurally more complex than presidential job approval. In many of the tracking polls, respondents can answer Obama, McCain, undecided, not planning to vote, or a third-party candidate. Furthermore, the name of every candidate has its own sentiment ratio scores in the data. We might expect the sentiment for mccain to vary inversely with obama, but they in fact slightly correlate. It is also unclear how they should interact as part of a model of voter preferences.

We also found that the topic frequencies correlate with polls much more than the sentiment scores. First note that the message volume for obama, shown in Figure 9, has the usual daily spikes like other words on Twitter shown in Figure 4. Some of these spikes are very dramatic; for example, on November 5th, nearly 15% of all Twitter messages (in our sample) mentioned the word obama.

Furthermore, the obama message volume substantially correlates to the poll numbers. Even the raw volume has a 52% correlation to the polls, and the 15-day window version is up to r = 79%. Simple attention seems to be associated with popularity, at least for Obama. But the analogous relationship does not hold for mccain: this word’s 15-day message volume also correlates to higher Obama ratings in the polls (r = 74%). A simple explanation may be that frequencies of either term, mccain or obama, are general indicators of elections news and events, and most 2008 elections news and events were favorable toward or good for Obama. Certainly, topic frequency may not have a straightforward relationship to public opinion in a more general text-driven methodology for public opinion measurement, but given the marked effects it has in these data, it is worthy of further exploration.

Conclusion

In this paper we find that a relatively simple sentiment detector based on Twitter data replicates consumer confidence and presidential job approval polls. While the results do not come without caution, it is encouraging that expensive and

time-intensive polling can be supplemented or supplanted with the simple-to-gather text data that is generated from online social networking. The results suggest that more advanced NLP techniques to improve opinion estimation may be very useful.

The textual analysis could be substantially improved. Besides the clear need for a better-suited lexicon, the modes of communication should be considered. When messages are retweets (forwarded messages), should they be counted? What about news headlines? Note that Twitter is rapidly changing, and the experiments on recent (2009) data performed best, which suggests that it is evolving in a direction compatible with our approach, which uses no Twitter-specific features at all.

In this work, we treat polls as a gold standard. Of course, they are noisy indicators of the truth (as is evident in Figure 3), just like extracted textual signals. Future work should seek to understand how these different signals reflect public opinion, either as a hidden variable or as measured from more reliable sources like face-to-face interviews.

Many techniques from traditional survey methodology can also be reused for automatic opinion measurement. For example, polls routinely use stratified sampling and weighted designs to ask questions of a representative sample of the population. Given that many social media sites include user demographic information, such a design is a sensible next step.

Eventually, we see this research progressing to align with the more general goal of query-driven sentiment analysis, where one can ask more varied questions of what people are thinking based on text they are already writing. Modeling traditional survey data is a useful application of sentiment analysis. But it is also a stepping stone toward larger and more sophisticated applications.

Acknowledgments

This work is supported by the Center for Applied Research in Technology at the Tepper School of Business, and the Berkman Faculty Development Fund at Carnegie Mellon University. We would like to thank the reviewers for helpful suggestions, Charles Franklin for advice in interpreting election polling data, and Brendan Meeder for contribution of the Twitter scrape.

References

Antweiler, W., and Frank, M. Z. 2004. Is all that talk just noise? The information content of internet stock message boards. Journal of Finance 59(3):1259–1294.
Chang, L. C., and Krosnick, J. A. 2003. National surveys via RDD telephone interviewing vs. the internet: Comparing sample representativeness and response quality. Unpublished manuscript.
Das, S. R., and Chen, M. Y. 2007. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science 53(9):1375–1388.
Dodds, P. S., and Danforth, C. M. 2009. Measuring the
Gilbert, E., and Karahalios, K. 2010. Widespread worry and the stock market. In Proceedings of the International Conference on Weblogs and Social Media.
Greenspan, A. 2002. Remarks at the Bay Area council conference, San Francisco, California. http://www.federalreserve.gov/boarddocs/speeches/2002/20020111/default.htm.
Hamilton, J. D. 1994. Time Series Analysis. Princeton University Press.
Hopkins, D., and King, G. 2010. A method of automated nonparametric content analysis for social science. American Journal of Political Science 54(1):229–247.
Koppel, M., and Shtrimberg, I. 2004. Good news or bad news? Let the market decide. In AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications.
Krosnick, J. A.; Judd, C. M.; and Wittenbrink, B. 2005. The measurement of attitudes. The Handbook of Attitudes 21–76.
Lavrenko, V.; Schmill, M.; Lawrie, D.; Ogilvie, P.; Jensen, D.; and Allan, J. 2000. Mining of concurrent text and time series. In Proceedings of the 6th ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining Workshop on Text Mining.
Lindsay, R. 2008. Predicting polls with Lexicon. http://languagewrong.tumblr.com/post/55722687/predicting-polls-with-lexicon.
Ludvigson, S. C. 2004. Consumer confidence and consumer spending. The Journal of Economic Perspectives 18(2):29–50.
Manning, C. D.; Raghavan, P.; and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge University Press, 1st edition.
Mei, Q.; Ling, X.; Wondra, M.; Su, H.; and Zhai, C. X. 2007. Topic sentiment mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th International conference on World Wide Web.
Ounis, I.; MacDonald, C.; and Soboroff, I. 2008. On the TREC blog track. In Proceedings of the International Conference on Weblogs and Social Media.
Pang, B., and Lee, L. 2008. Opinion Mining and Sentiment Analysis. Now Publishers Inc.
Velikovich, L.; Blair-Goldensohn, S.; Hannan, K.; and McDonald, R. 2010. The viability of web-derived polarity lexicons. In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Wilcox, J. 2007. Forecasting components of consumption with components of consumer sentiment. Business Economics 42(4):22–32.
Wilson, T.; Wiebe, J.; and Hoffmann, P. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language
 happiness of Large-Scale written expression: Songs, blogs,      Processing.
 and presidents. Journal of Happiness Studies 116.

From Tweets To Polls Linking Text Sentiment To Public Opinion Time Series

  • 1. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series Brendan O’Connor† Ramnath Balasubramanyan† Bryan R. Routledge§ Noah A. Smith† brenocon@cs.cmu.edu rbalasub@cs.cmu.edu routledge@cmu.edu nasmith@cs.cmu.edu † § School of Computer Science Tepper School of Business Carnegie Mellon University Carnegie Mellon University Abstract statistics derived from extremely simple text analysis tech- niques are demonstrated to correlate with polling data on We connect measures of public opinion measured from consumer confidence and political opinion, and can also pre- polls with sentiment measured from text. We analyze several surveys on consumer confidence and political dict future movements in the polls. We find that temporal opinion over the 2008 to 2009 period, and find they smoothing is a critically important issue to support a suc- correlate to sentiment word frequencies in contempora- cessful model. neous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80%, and capture important large-scale trends. The re- Data sults highlight the potential of text streams as a substi- tute and supplement for traditional polling. We begin by discussing the data used in this study: Twitter for the text data, and public opinion surveys from multiple polling organizations. Introduction If we want to know, say, the extent to which the U.S. pop- Twitter Corpus ulation likes or dislikes Barack Obama, an obvious thing to do is to ask a random sample of people (i.e., poll). Survey Twitter is a popular microblogging service in which users and polling methodology, extensively developed through the post messages that are very short: less than 140 characters, 20th century (Krosnick, Judd, and Wittenbrink 2005), gives averaging 11 words per message. It is convenient for re- numerous tools and techniques to accomplish representative search because there are a very large number of messages, public opinion measurement. 
many of which are publicly available, and obtaining them With the dramatic rise of text-based social media, mil- is technically simple compared to scraping blogs from the lions of people broadcast their thoughts and opinions on a web. great variety of topics. Can we analyze publicly available We use 1 billion Twitter messages posted over the years data to infer population attitudes in the same manner that 2008 and 2009, collected by querying the Twitter API,1 as public opinion pollsters query a population? If so, then min- well as archiving the “Gardenhose” real-time stream. This ing public opinion from freely available text content could comprises a roughly uniform sample of public messages, in be a faster and less expensive alternative to traditional polls. the range of 100,000 to 7 million messages per day. (The (A standard telephone poll of one thousand respondents eas- primary source of variation is growth of Twitter itself; its ily costs tens of thousands of dollars to run.) Such analysis message volume increased by a factor of 50 over this two- would also permit us to consider a greater variety of polling year time period.) questions, limited only by the scope of topics and opinions Most Twitter users appear to live in the U.S., but we made people broadcast. Extracting the public opinion from social no systematic attempt to identify user locations or even mes- media text provides a challenging and rich context to explore sage language, though our analysis technique should largely computational models of natural language, motivating new ignore non-English messages. research in computational linguistics. There probably exist many further issues with this text In this paper, we connect measures of public opinion sample; for example, the demographics and communication derived from polls with sentiment measured from analy- habits of the Twitter user population probably changed over sis of text from the popular microblogging site Twitter. 
this time period, which should be adjusted for given our de- We explicitly link measurement of textual sentiment in mi- sire to measure attitudes in the general population. There croblog messages through time, comparing to contempo- are clear opportunities for better preprocessing and stratified raneous polling data. In this preliminary work, summary sampling to exploit these data. Copyright c 2010, Association for the Advancement of Artificial 1 Intelligence (www.aaai.org). All rights reserved. This scraping effort was conducted by Brendan Meeder. Page 1 of 8 To appear in: Proceedings of the International AAAI Conference on Weblogs and Social Media, Washington, DC, May 2010.
  • 2. Public Opinion Polls −20 Gallup Econ. Conf. We consider several measures of consumer confidence and political opinion, all obtained from telephone surveys to par- −40 ticipants selected through random-digit dialing, a standard technique in traditional polling (Chang and Krosnick 2003). Consumer confidence refers to how optimistic the pub- −60 lic feels, collectively, about the health of the economy and their personal finances. It is thought that high con- sumer confidence leads to more consumer spending; this q line of argument is often cited in the popular media and by 75 Michigan ICS policymakers (Greenspan 2002), and further relationships Index q q q q q with economic activity have been studied (Ludvigson 2004; 65 q Wilcox 2007). Knowing the public’s consumer confidence is q q q q q q of great utility for economic policy making as well as busi- q q q q 55 ness planning. q Two well-known surveys that measure U.S. consumer 2008−01 2008−02 2008−03 2008−04 2008−05 2008−06 2008−07 2008−08 2008−09 2008−10 2008−11 2008−12 2009−01 2009−02 2009−03 2009−04 2009−05 2009−06 2009−07 2009−08 2009−09 2009−10 2009−11 confidence are the Consumer Confidence Index from the Consumer Board, and the Index of Consumer Sentiment (ICS) from the Reuters/University of Michigan Surveys of Consumers.2 We use the latter, as it is more extensively stud- ied in economics, having been conducted since the 1950s. Figure 1: Monthly Michigan ICS and daily Gallup consumer The ICS is derived from answers to five questions adminis- confidence poll. tered monthly in telephone interviews with a nationally rep- resentative sample of several hundred people; responses are combined into the index score. Two of the questions, for whether they would vote for Barack Obama or John McCain. 
example, are: Many different organizations administered them throughout 2008; we use a compilation provided by Pollster.com, con- “We are interested in how people are getting along fi- sisting of 491 data points from 46 different polls.5 The data nancially these days. Would you say that you (and your are shown in Figure 3. family living there) are better off or worse off finan- cially than you were a year ago?” “Now turning to business conditions in the country Text Analysis as a whole—do you think that during the next twelve From text, we are interested in assessing the population’s months we’ll have good times financially, or bad times, aggregate opinion on a topic. Immediately, the task can be or what?” broken down into two subproblems: We also use another poll, the Gallup Organization’s “Eco- nomic Confidence” index,3 which is derived from answers 1. Message retrieval: identify messages relating to the topic. to two questions that ask interviewees to to rate the overall 2. Opinion estimation: determine whether these messages economic health of the country. This only addresses a subset express positive or negative opinions or news about the of the issues that are incorporated into the ICS. We are inter- topic. ested in it because, unlike the ICS, it is administered daily (reported as three-day rolling averages). Frequent polling If there is enough training data, this could be formulated as data are more convenient for our comparison purpose, since a topic-sentiment model (Mei et al. 2007), in which the top- we have fine-grained, daily Twitter data, but only over a two- ics and sentiment of documents are jointly inferred. Our year period. Both datasets are shown in Figure 1. dataset, however, is asymmetric, with millions of text mes- For political opinion, we use two sets of polls. 
The first is sages per day (and millions of distinct vocabulary items) but Gallup’s daily tracking poll for the presidential job approval only a few hundred polling data points in each problem. It is rating for Barack Obama over the course of 2009, which is a challenging setting to estimate a useful model over the vo- reported as 3-day rolling averages.4 These data are shown in cabulary and messages. The signal-to-noise ratio is typical Figure 2. of information retrieval problems: we are only interested in The second is a set of tracking polls during the 2008 information contained in a small fraction of all messages. U.S. presidential election cycle, asking potential voters We therefore opt to use a transparent, deterministic ap- 2 proach based on prior linguistic knowledge, counting in- Downloaded from http://www.sca.isr.umich. stances of positive-sentiment and negative-sentiment words edu/. 3 in the context of a topic keyword. Downloaded from http://www.gallup.com/poll/ 122840/gallup-daily-economic-indexes.aspx. 4 5 Downloaded from http://www.gallup.com/poll/ Downloaded from http://www.pollster.com/ 113980/Gallup-Daily-Obama-Job-Approval.aspx. polls/us/08-us-pres-ge-mvo.php Page 2 of 8
  • 3. 0.012 Approve−Disapprove perc. diff. 0.10 obama 0.006 jobs 60 40 0.000 0.00 2008 2009 2010 2008 2009 2010 20 0.08 0 0.006 mccain 2009−01 2009−02 2009−03 2009−04 2009−05 2009−06 2009−07 2009−08 2009−09 2009−10 2009−11 2009−12 0.04 job 0.002 0.00 Figure 2: 2009 presidential job approval (Barack Obama). 2008 2009 2010 2008 2009 2010 0.000 0.002 0.004 economy Percent Support for Candidate q q q q q q q q qq q q q q q q q qq q qq qq qq q q q q q 50 q q q q qq q q qqqq qqq qq qq qq q qq q q q qq q q qq q q q q q q qq qq q qq qqqq qq q q q q q q q q q q q q q q q q qq qqqq q q q qq q q qq q q q q q qq q q q q qq q q q q q q q q qq q q q q q qq q q qq q qq q q q q q qqq q q qq q q qqq q qqq qqqqq q q qq q qq q q q qq q q qqq q q q q q q q qq qq q qqq q qq q q q q qq q q q qq q qq qq q qqq qqq qqqq qqqqqq qq q q qq q qq q q q qq qq q qq qq q qq q q q q q q q q qq q q q q q q qq q q qq q q q qq q q q q q q qq qq q q q q qq q q qqqqqqq qq q q qqqq qq q qq qq q qq q q qq q q qq q qqq q q q q q qq q q q qqq q q qq q q q q q q q qq q qq q q qq qqqq qq qq q q q qqqqqq qq q qq qqq qq q q q qqqq q q q qq q q q qq qq q q q q q q q q q q q qqq q qq q qqq q q q q q q q q q q q q q q q qq q q q qq q q q q q q q qq q q q q q qq q q q q q q qq q qq qqq q q qq qq qq qqqq qq qq q q q qq qqqqq q qq qq q q qq qq q qq q qq q q qqqqq qqq q qqq q q q q q q q q q q q qq q q qq qq q q q qq q q qq qq q q q q q q q q q q q q q q qq q q q q q qq q qq q q q q qqqq q q q q q q q qqqq q qq qq q q q q q q q 40 q q q q q q q qq q q q q q q qq q q qq q q q q q q q qq q qq qq qq qq q q qq q q q q q qq q q qq q qq qq q q q qq q qq q q q qq q q qq q q q qq q q qq q 2008 2009 2010 q q q q q q q q qq q qq q q q q q q q q qqq q q q qq q q q q q q q q q q 30 q Figure 4: Fraction of Twitter messages containing various topic keywords, per day. 
2008−02 2008−03 2008−04 2008−05 2008−06 2008−07 2008−08 2008−09 2008−10 2008−11 Opinion Estimation We derive day-to-day sentiment scores by counting positive and negative messages. Positive and negative words are de- Figure 3: 2008 presidential elections, Obama vs. McCain fined by the subjectivity lexicon from OpinionFinder, a word (blue and red). Each poll provides separate Obama and Mc- list containing about 1,600 and 1,200 words marked as pos- Cain percentages (one blue and one red point); lines are 7- itive and negative, respectively (Wilson, Wiebe, and Hoff- day rolling averages. mann 2005).6 We do not use the lexicon’s distinctions be- tween weak and strong words. Message Retrieval A message is defined as positive if it contains any positive word, and negative if it contains any negative word. (This We only use messages containing a topic keyword, manually allows for messages to be both positive and negative.) This specified for each poll: gives similar results as simply counting positive and negative words on a given day, since Twitter messages are so short • For consumer confidence, we use economy, job, and jobs. (about 11 words). We define the sentiment score xt on day t as the ratio • For presidential approval, we use obama. of positive versus negative messages on the topic, counting from that day’s messages: • For elections, we use obama and mccain. countt (pos. word ∧ topic word) Each topic subset contained around 0.1–0.5% of all mes- xt = (1) countt (neg. word ∧ topic word) sages on a given day, though with occasional spikes, as seen p(pos. word | topic word, t) in Figure 4. These appear to be driven by news events. All = terms have a weekly cyclical structure, occurring more fre- p(neg. word | topic word, t) quently on weekdays, especially in the middle of the week, where the likelihoods are estimated as relative frequencies. compared to weekends. 
(In the figure, this is most appar- We performed casual inspection of the detected messages ent for the term job since it has fewer spikes.) Nonetheless, and found many examples of falsely detected sentiment. For these fractions are small. In the earliest and smallest part of example, the lexicon has the noun will as a weak positive our dataset, the topic samples sometimes come out just sev- word, but since we do not use a part-of-speech tagger, this eral hundred messages per day; but by late 2008, there are 6 thousands of messages per day for most datasets. Available at http://www.cs.pitt.edu/mpqa. Page 3 of 8
  • 4. 5 causes thousands of false positives when it matches the verb sense of will.7 Furthermore, recall is certainly very low, 4 since the lexicon is designed for well-written standard En- Sentiment Ratio glish, but many messages on Twitter are written in an infor- 3 mal social media dialect of English, with different and al- ternately spelled words, and emoticons as potentially useful 2 signals. Creating a more comprehensive lexicon with dis- tributional similarity techniques could improve the system; Velikovich et al. (2010) find that such a web-derived lexicon 1 substantially improves a lexicon-based sentiment classifier. 0 Comparison to Related Work The sentiment analysis literature often focuses on analyzing 2008−01 2008−02 2008−03 2008−04 2008−05 2008−06 2008−07 2008−08 2008−09 2008−10 2008−11 2008−12 2009−01 2009−02 2009−03 2009−04 2009−05 2009−06 2009−07 2009−08 2009−09 2009−10 2009−11 individual documents, or portions thereof (for a review, see Pang and Lee, 2008). Our problem is related to work on sen- timent information retrieval, such as the TREC Blog Track competitions that have challenged systems to find and clas- sify blog posts containing opinions on a given topic (Ounis, Figure 5: Moving average MAt of sentiment ratio for jobs, MacDonald, and Soboroff 2008). under different windows k ∈ {1, 7, 30}: no smoothing The sentiment feature we consider, presence or absence (gray), past week (magenta), and past month (blue). The of sentiment words in a message, is one of the most basic unsmoothed version spikes as high as 10, omitted for space. ones used in the literature. If we view this system in the traditional light—as subjectivity and polarity detection for Moving Average Aggregate Sentiment individual messages—it makes many errors, like all natural Day-to-day, the sentiment ratio is volatile, much more than language processing systems. 
However, we are only inter- most polls.9 Just like in the topic volume plots (Figure 4), ested in aggregate sentiment. A high error rate merely im- the sentiment ratio rapidly rises and falls each day. In order plies the sentiment detector is a noisy measurement instru- to derive a more consistent signal, and following the same ment. With a fairly large number of measurements, these methodology used in public opinion polling, we smooth the errors will cancel out relative to the quantity we are inter- sentiment ratio with one of the simplest possible temporal ested in estimating, aggregate public opinion.8 Furthermore, smoothing techniques, a moving average over a window of as Hopkins and King (2010) demonstrate, it can actually be the past k days: inaccurate to na¨vely use standard text analysis techniques, ı which are usually designed to optimize per-document classi- 1 fication accuracy, when the goal is to assess aggregate pop- MAt = (xt−k+1 + xt−k+2 + ... + xt ) k ulation proportions. Several prior studies have estimated and made use of ag- Smoothing is a critical issue. It causes the sentiment ratio gregated text sentiment. The informal study by Lindsay to respond more slowly to recent changes, thus forcing con- (2008) focuses on lexical induction in building a sentiment sistent behavior to appear over longer periods of time. Too classifier for a proprietary dataset of Facebook wall posts much smoothing, of course, makes it impossible to see fine- (a web conversation/microblog medium broadly similar to grained changes to aggregate sentiment. See Figure 5 for Twitter), and demonstrates correlations to several polls con- an illustration of different smoothing windows for the jobs ducted during part of the 2008 presidential election. We are topic. 
unaware of other research validating text analysis against traditional opinion polls, though a number of companies of- Correlation Analysis: Is text sentiment a fer text sentiment analysis basically for this purpose (e.g., leading indicator of polls? Nielsen Buzzmetrics). There are at least several other stud- ies that use time series of either aggregate text sentiment or Figure 6 shows the jobs sentiment ratio compared to the two good vs. bad news, including analyzing stock behavior based different measures of consumer confidence, Gallup Daily on text from blogs (Gilbert and Karahalios 2010), news arti- and Michigan ICS. It is apparent that the sentiment ratio cles (Lavrenko et al. 2000; Koppel and Shtrimberg 2004) captures the broad trends in the survey data. With 15- and investor message boards (Antweiler and Frank 2004; day smoothing, it is reasonably correlated with Gallup at Das and Chen 2007). Dodds and Danforth (2009) use an r = 73.1%. The most glaring difference is a region of emotion word counting technique for purely exploratory high positive sentiment in May-June 2008. But otherwise, analysis of several corpora. the sentiment ratio seems to pick up on the downward slide of consumer confidence through 2008, and the rebound in 7 We tried manually removing this and several other frequently February/March of 2009. mismatching words, but it had little effect. 8 9 There is an issue if errors correlate with variables relevant to That the reported poll results are less volatile does not imply public opinion; for example, if certain demographics speak in di- that they are more accurate reflections of true population opinion alects that are harder to analyze. than the text. Page 4 of 8
  • 5. 4.0 0.9 k=15, lead=0 Text leads poll k=30, lead=50 Poll leads text 3.5 0.8 Sentiment Ratio qqqqq qqq qq qqqqqqq qqqq qqq q qq qqqq q qqqqqqq qqq qqq Corr. against Gallup qq qq q q q 3.0 qq q q q q q q q q qq q q 0.7 q q qqq qqq q q q qq qq q q q q qq qq q q qq q qq q qq q q qq 2.5 q qq qq qq q qq q q qq qq q qq q qq q qq q q q qq qq q 0.6 q q q qq q q q q qq qq qq q q q 2.0 qq q q qq q q qq q q q q q 0.5 q q qq qq qq q q q q 1.5 k=30 0.4 q k=15 k=7 −20 Gallup Economic Confidence Index −90 −50 −10 30 50 70 90 −30 Text lead / poll lag −40 −50 0.8 −60 0.6 Corr. against ICS 0.4 Index 75 0.2 Michigan ICS 70 0.0 65 k=30 60 −0.2 k=60 55 −90 −50 −10 30 50 70 90 2008−01 2008−02 2008−03 2008−04 2008−05 2008−06 2008−07 2008−08 2008−09 2008−10 2008−11 2008−12 2009−01 2009−02 2009−03 2009−04 2009−05 2009−06 2009−07 2009−08 2009−09 2009−10 2009−11 Text lead / poll lag Figure 7: Cross-correlation plots: sensitivity to lead and lag Figure 6: Sentiment ratio and consumer confidence surveys. for different smoothing windows. L > 0 means the text Sentiment information captures broad trends in the survey window completely precedes the poll, and L < −k means data. the poll precedes the text. (The window straddles the poll for L < −k < 0.) The L = −k positions are marked on each curve. The two parameter settings shown in Figure 6 are highlighted with boxes. Page 5 of 8
When consumer confidence changes, can this first be seen in the text sentiment measure, or in polls? If text sentiment responds faster to news events, a sentiment measure may be useful for economic researchers and policymakers. We can test this by looking at leading versions of text sentiment.

First note that the text-poll correlation reported above is the goodness-of-fit metric for fitting slope and bias parameters a, b in a one-variable linear least-squares model:

$$y_t = b + a \sum_{j=0}^{k-1} x_{t-j} + \epsilon_t$$

for poll outcomes $y_t$, daily sentiment ratios $x_j$, Gaussian noise $\epsilon_t$, and a fixed hyperparameter k. A poll outcome is compared to the k-day text sentiment window that ends on the same day as the poll.

We introduce a lag hyperparameter L into the model, so the poll is compared against the text window ending L days before the poll outcome:

$$y_{t+L} = b + a \sum_{j=0}^{k-1} x_{t-j} + \epsilon_t$$

Graphically, this is equivalent to taking one of the text sentiment lines in Figure 6, shifting it to the right by L days, and then examining its correlation against the consumer confidence polls below.

Polls are typically administered over an interval. The ICS is reported once per month (at the end of the month), and Gallup is reported for 3-day windows. We always consider the last day of the poll's window to be the poll date, which is the earliest possible day that the information could actually be used. Therefore, we would expect both daily measures, Gallup and text sentiment, to always lead ICS, since it measures phenomena occurring over the previous month.

The sensitivity of text-poll correlation to the smoothing window and lag parameters (k, L) is shown in Figure 7. The regions corresponding to text preceding or following the poll are marked. Correlation is higher when text leads the poll than when the poll leads the text, so text seems to be a leading indicator. Gallup correlations fall off faster for poll-leads-text than for text-leads-poll, and the ICS has similar properties.

If text and polls moved at random relative to each other, these cross-correlation curves would stay close to 0. The fact that they have peaks at all strongly suggests that the text sentiment measure captures information related to the polls.

Also note that more smoothing increases the correlation: for Gallup, 7-, 15-, and 30-day windows peak at r = 71.6%, 76.3%, and 79.4% respectively. The 7-day and 15-day windows have two local correlation peaks, corresponding to shifts that give alternate alignments of two different humps against the Gallup data, but the better-correlating 30-day window smooths over these entirely. Furthermore, for the ICS, a 60-day window often achieves higher correlation than the 30-day window. These facts imply that the text sentiment information is volatile and, if polls are believed to be a gold standard, is best used to detect long-term trends.

It is also interesting to consider ICS a gold standard and compare its correlations with Gallup and with text sentiment. ICS and Gallup are correlated (the best correlation is r = 86.4% if Gallup is given its own smoothing and alignment at k = 30, L = 20), which supports the hypothesis that they are measuring similar things, and that Gallup is a leading indicator for ICS. Fixed to 30-day smoothing, the sentiment ratio only achieves r = 63.5% under the optimal lead L = 50, so it is a weaker indicator than Gallup.

Finally, we also experimented with sentiment ratios for the terms job and economy, which both correlate very poorly with the Gallup poll: 10% and 7% respectively (with the default k = 15, L = 0).¹⁰

This is a cautionary note on the common practice of stemming words, which in information retrieval can have mixed effects on performance (Manning, Raghavan, and Schütze 2008, ch. 2). Here, stemming would have conflated job and jobs, severely degrading results.

¹⁰ We inspected some of the matching messages to try to understand this result, but since the sentiment detector is very noisy at the message level, it was difficult to understand what was happening.

Forecasting Analysis

As a further validation, we can evaluate the model in a rolling forecast setting by testing how well the text-based model can predict future values of the poll. For a lag L and a target forecast date t + L, we train the model only on historical data through day t − 1, then predict using the window ending on day t. The lag parameter L is how many days in the future the forecasts are for. We repeat this model fit and prediction procedure for most days. (We cannot forecast early days in the data, since L + k initial days are necessary to cover the start of the text sentiment window, plus at least several days for training.)

[Figure 8. Top panel, "Gallup Economic Confidence": Gallup poll, text forecasts (lead = 30), and poll self-forecasts (lead = 30). Bottom panel: text coefficient. Both panels span January 2008 to November 2009.]

Figure 8: Rolling text-based forecasts (above), and the text sentiment ($\mathrm{MA}_t$) coefficients a for each of the text forecasting models over time (below).

Results are shown in Figure 8. Forecasts for one month in
the future (that is, using past text from 44 through 30 days before the target date) achieve 77.5% correlation. This is slightly worse than a baseline that predicts the poll from its lagged self ($y_{t+L} \approx b_0 + b_1 y_t$), which has r = 80.4%. Adding the sentiment score to historical poll information as a bivariate model ($y_{t+L} \approx b_0 + b_1 y_t + a\,\mathrm{MA}_{t..t-k+1}$) yields a very small improvement (r = 81.0%).

Inspecting the rolling forecasts and the text model coefficient a is revealing. In 2008 and early 2009, text sentiment is a poor predictor of consumer confidence; for example, it fails to reflect a hump in the polls in August and September 2008. The model learns a coefficient near zero (even negative) and makes predictions similar to the poll's self-predictions, which is possible because the poll's most recent values are absorbed into the bias term of the text-only model. However, starting in mid-2009, text sentiment becomes a much better predictor, as it captures the general rise in consumer confidence starting then (see Figure 6). This suggests that qualitatively different phenomena are being captured by the text sentiment measure at different times. From the perspective of time series modeling, future work should investigate techniques for deciding the importance of different historical signals and time periods, such as vector autoregressions (e.g. Hamilton 1994).

It is possible that the effectiveness of text changes over this time period for reasons described earlier: Twitter itself changed substantially over this time period. In 2008, the site had far fewer users, who were probably less representative of the general population and were using the site differently than users would later.

Obama 2009 Job Approval and 2008 Elections

We analyze the sentiment ratio for obama and compare it to two series of polls, presidential job approval in 2009 and presidential election polls in 2008, as seen in Figure 9. The job approval poll is the most straightforward, showing a steady decline since the start of the Obama presidency, perhaps with some stabilization around September. The sentiment ratio also generally declines during this period, with r = 72.5% for k = 15.

However, in 2008 the sentiment ratio does not substantially correlate with the election polls (r = −8%); we compare against the percent of support for Obama, averaged over a 7-day window of tracking polls (the same information displayed in Figure 3). Lindsay (2008) found that his daily sentiment score was a leading indicator of one particular tracking poll (Rasmussen) over a 100-day period from June to October 2008. Our measure also roughly correlates with the same data, though less strongly (r = 44% vs. r = 57%) and only at different lag parameters.

The elections setting may be structurally more complex than presidential job approval. In many of the tracking polls, people can choose to answer Obama, McCain, undecided, not planning to vote, or a third-party candidate. Furthermore, every candidate's name has its own sentiment ratio scores in the data. We might expect the sentiment for mccain to vary inversely with obama, but they in fact slightly correlate. It is also unclear how they should interact as part of a model of voter preferences.

[Figure 9. Panels: sentiment ratio for "obama"; fraction of messages with "obama"; % support for Obama (election) and % presidential job approval. Horizontal axis: January 2008 to December 2009.]

Figure 9: The sentiment ratio for obama (15-day window), and fraction of all Twitter messages containing obama (day-by-day, no smoothing), compared to election polls (2008) and job approval polls (2009).

We also found that topic frequencies correlate with polls much more than the sentiment scores do. First note that the message volume for obama, shown in Figure 9, has the usual daily spikes seen for other words on Twitter in Figure 4. Some of these spikes are very dramatic; for example, on November 5th, nearly 15% of all Twitter messages (in our sample) mentioned the word obama.

Furthermore, the obama message volume substantially correlates with the poll numbers. Even the raw volume has a 52% correlation with the polls, and the 15-day window version reaches r = 79%. Simple attention seems to be associated with popularity, at least for Obama. But the converse is not true for mccain; this word's 15-day message volume also correlates with higher Obama ratings in the polls (r = 74%).

A simple explanation may be that frequencies of either term, mccain or obama, are general indicators of elections news and events, and most 2008 elections news and events were favorable toward or good for Obama. Certainly, topic frequency may not have a straightforward relationship to public opinion in a more general text-driven methodology for public opinion measurement, but given the marked effects it has in these data, it is worthy of further exploration.

Conclusion

In this paper we find that a relatively simple sentiment detector based on Twitter data replicates consumer confidence and presidential job approval polls. While the results do not come without caution, it is encouraging that expensive and
time-intensive polling can be supplemented or supplanted with the simple-to-gather text data that is generated from online social networking. The results suggest that more advanced NLP techniques to improve opinion estimation may be very useful.

The textual analysis could be substantially improved. Besides the clear need for a better-suited lexicon, the modes of communication should be considered. When messages are retweets (forwarded messages), should they be counted? What about news headlines? Note that Twitter is rapidly changing, and the experiments on recent (2009) data performed best, which suggests that it is evolving in a direction compatible with our approach, which uses no Twitter-specific features at all.

In this work, we treat polls as a gold standard. Of course, they are noisy indicators of the truth (as is evident in Figure 3), just like extracted textual signals. Future work should seek to understand how these different signals reflect public opinion, either as a hidden variable or as measured from more reliable sources like face-to-face interviews.

Many techniques from traditional survey methodology can also be used for automatic opinion measurement. For example, polls routinely use stratified sampling and weighted designs to ask questions of a representative sample of the population. Given that many social media sites include user demographic information, such a design is a sensible next step.

Eventually, we see this research progressing to align with the more general goal of query-driven sentiment analysis, where one can ask more varied questions of what people are thinking based on text they are already writing. Modeling traditional survey data is a useful application of sentiment analysis, but it is also a stepping stone toward larger and more sophisticated applications.

Acknowledgments

This work is supported by the Center for Applied Research in Technology at the Tepper School of Business, and the Berkman Faculty Development Fund at Carnegie Mellon University. We would like to thank the reviewers for helpful suggestions, Charles Franklin for advice in interpreting election polling data, and Brendan Meeder for contributing the Twitter scrape.

References

Antweiler, W., and Frank, M. Z. 2004. Is all that talk just noise? The information content of internet stock message boards. Journal of Finance 59(3):1259–1294.
Chang, L. C., and Krosnick, J. A. 2003. National surveys via RDD telephone interviewing vs. the internet: Comparing sample representativeness and response quality. Unpublished manuscript.
Das, S. R., and Chen, M. Y. 2007. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science 53(9):1375–1388.
Dodds, P. S., and Danforth, C. M. 2009. Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. Journal of Happiness Studies 1–16.
Gilbert, E., and Karahalios, K. 2010. Widespread worry and the stock market. In Proceedings of the International Conference on Weblogs and Social Media.
Greenspan, A. 2002. Remarks at the Bay Area Council conference, San Francisco, California. http://www.federalreserve.gov/boarddocs/speeches/2002/20020111/default.htm.
Hamilton, J. D. 1994. Time Series Analysis. Princeton University Press.
Hopkins, D., and King, G. 2010. A method of automated nonparametric content analysis for social science. American Journal of Political Science 54(1):229–247.
Koppel, M., and Shtrimberg, I. 2004. Good news or bad news? Let the market decide. In AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications.
Krosnick, J. A.; Judd, C. M.; and Wittenbrink, B. 2005. The measurement of attitudes. The Handbook of Attitudes 21–76.
Lavrenko, V.; Schmill, M.; Lawrie, D.; Ogilvie, P.; Jensen, D.; and Allan, J. 2000. Mining of concurrent text and time series. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Workshop on Text Mining.
Lindsay, R. 2008. Predicting polls with Lexicon. http://languagewrong.tumblr.com/post/55722687/predicting-polls-with-lexicon.
Ludvigson, S. C. 2004. Consumer confidence and consumer spending. The Journal of Economic Perspectives 18(2):29–50.
Manning, C. D.; Raghavan, P.; and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge University Press, 1st edition.
Mei, Q.; Ling, X.; Wondra, M.; Su, H.; and Zhai, C. X. 2007. Topic sentiment mixture: Modeling facets and opinions in weblogs. In Proceedings of the 16th International Conference on World Wide Web.
Ounis, I.; MacDonald, C.; and Soboroff, I. 2008. On the TREC blog track. In Proceedings of the International Conference on Weblogs and Social Media.
Pang, B., and Lee, L. 2008. Opinion Mining and Sentiment Analysis. Now Publishers Inc.
Velikovich, L.; Blair-Goldensohn, S.; Hannan, K.; and McDonald, R. 2010. The viability of web-derived polarity lexicons. In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Wilcox, J. 2007. Forecasting components of consumption with components of consumer sentiment. Business Economics 42(4):22–32.
Wilson, T.; Wiebe, J.; and Hoffmann, P. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing.