A FRAMEWORK FOR REAL-TIME
SEMANTIC SOCIAL MEDIA
ANALYSIS
Diana Maynard, Ian Roberts,
Mark A. Greenwood, Kalina
Bontcheva
ZELIA BLAGA
DOMAIN
• collect and analyse large volumes of social media content
• provides:
behavioral evidence
opinion mining
sentiment analysis
• case study: 2015 UK elections, Nesta-funded Political Futures Tracker
project [1]
CHALLENGES
• dynamic content
• reflection of author’s social and sentimental fluctuations
• nuances in communication such as sarcasm
• activity on social media sites is triggered by specific events (e.g.
sports events, celebrations, crises, news) and topics (e.g. global
warming, terrorism or immigration)
• Twitter: reactive medium, opinions tend to be event-driven rather
than topic-driven
RELATED WORK
• Twitris [2]
• sentiment analysis tool
• able to take a sample of social media chatter about a specific topic
and deduce real-time, large-scale, automated sentiment about the
specific topic
• cloud-based visualization
• static
• example: analyzed Twitter chatter leading up to the United Kingdom
European Union membership referendum (Brexit) on 23 June 2016; was
able to predict, six hours before the news broke, that the polls
leaning toward the “remain” camp were incorrect
RELATED WORK
• TwitInfo [3]
• sentiment analysis tool
• used timeline to display tweet activity during a real-world event
• colour-coded for sentiment
• dynamic
• example: a football game
FRAMEWORK
• live processing system
1. Collector
2. Processor
3. Indexing and querying
FRAMEWORK
1. Collector
• receives tweets from Twitter via their streaming API and forwards
them to a reliable messaging queue; saves the raw JSON of the
tweets in backup files for later re-processing if necessary
• two filter types: track (a textual filter that delivers all tweets that
mention specified keywords/hashtags) and follow (a user ID filter that
delivers all tweets by specified Twitter users, as well as any tweet that
is a retweet of, or a reply to, a tweet by one of the specified users)
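The two filters can be illustrated locally with a minimal sketch. In the framework itself the filtering is done by Twitter's streaming API, so the functions below are hypothetical helpers, not part of the collector; the field names (user.id_str, in_reply_to_user_id_str, retweeted_status) are assumed to follow the classic tweet JSON layout.

```python
def matches_track(tweet, keywords):
    """True if the tweet text mentions any tracked keyword or hashtag (case-insensitive)."""
    text = tweet.get("text", "").lower()
    return any(k.lower() in text for k in keywords)


def matches_follow(tweet, user_ids):
    """True if the tweet is by, a retweet of, or a reply to one of the followed users."""
    author = tweet.get("user", {}).get("id_str")
    reply_to = tweet.get("in_reply_to_user_id_str")
    retweeted = tweet.get("retweeted_status", {}).get("user", {}).get("id_str")
    # Missing fields come back as None and are skipped.
    return any(uid in user_ids for uid in (author, reply_to, retweeted) if uid)


# Made-up tweet used only for illustration.
tweet = {
    "text": "Watching the #leadersdebate tonight",
    "user": {"id_str": "111"},
    "in_reply_to_user_id_str": "42",
}
print(matches_track(tweet, ["#leadersdebate", "#GE2015"]))  # True
print(matches_follow(tweet, {"42"}))  # True
```

The substring match in matches_track is a deliberate simplification; the real track filter has its own tokenisation rules.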
FRAMEWORK
2. Processor
• consumes tweets from the message queue, processes them with the
GATE analysis pipeline (Natural Language Processing) [4] and sends
the annotated documents to GATE Mimir [5] for indexing
• standalone Java application built using the Spring Boot framework
(message consumer application)
GATE
• GATE = Language Resources (ontologies, lexicons) + Processing
Resources (parsers, generators) + Visual Resources (GUI)
FRAMEWORK
3. GATE Mimir
• receives the annotated tweets and indexes their text and annotation
data, making it available for searching [5]
• not purely keyword based
• semantic-based search that can be performed over categories of
things
• can include synonyms, unknowns (e.g. amount of money), ranges
(e.g. values between 2 given numbers), restrictions to date periods,
domains etc.
MIMIR
• semantic search over text, document
structure, linguistic annotations, and formal
semantic knowledge
• indexes plain tweet text, structural metadata,
hashtags, mentions, semantic annotations,
and timestamps
• use Prospector to filter and visualize
correlations in large data sets (e.g. the most
frequent topics associated with positive
sentiment)
• temporal analytics – which topics are popular
over a time period
FRAMEWORK CHARACTERISTICS
• loosely coupled
• no direct dependency between the collector and processor
components
• communication is mediated through the message queue
• components can be distributed across different machines for higher
performance and/or robustness
• a single Mimir instance can easily sustain 10,000-15,000 tweets per minute
• multiple collectors can be run at the same time
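The loose coupling described above can be sketched with an in-process queue standing in for the reliable message queue. This is only an illustration under stated assumptions: the real processor is a Spring Boot Java application, and the collector and processor functions, their names, and the whitespace "annotation" step are all hypothetical stand-ins.

```python
import queue
import threading


def collector(name, tweets, q):
    """Hypothetical collector: forwards raw tweets to the shared queue."""
    for t in tweets:
        q.put((name, t))


def processor(q, out):
    """Hypothetical processor: consumes items until it sees the None sentinel."""
    while True:
        item = q.get()
        if item is None:
            break
        name, text = item
        # Stand-in for the NLP pipeline: split the text into tokens.
        out.append({"source": name, "text": text, "tokens": text.split()})


q = queue.Queue()
annotated = []
worker = threading.Thread(target=processor, args=(q, annotated))
worker.start()

# Two collectors run at the same time and know nothing about the processor.
collectors = [
    threading.Thread(target=collector, args=("track", ["tweet a", "tweet b"], q)),
    threading.Thread(target=collector, args=("follow", ["tweet c"], q)),
]
for c in collectors:
    c.start()
for c in collectors:
    c.join()

q.put(None)  # tell the processor that no more tweets are coming
worker.join()
print(len(annotated))  # 3
```

Because both sides only touch the queue, either one could be moved to another machine or run in several copies without changing the other.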
ANALYSIS OF POLITICAL TWEETS
• long-term monitoring: list of all MPs, candidates and official party
accounts (560 MPs, 1811 candidates)
• collected 1.8 million tweets from 24 October 2014 until 13 February
2015, out of which 100k original, 700k replies, 1 million retweets
• for debates: track relevant hashtags (#leadersdebate, #BBCdebate),
and more general hashtags (#GE2015, #UKElection)
ANALYSIS OF POLITICAL TWEETS
• Semantic annotation done using TwitIE [6] - Named Entity
Recognition (identifying Persons, Places, Organisations etc.) and
Linking (mapping these to their respective URIs in Wikipedia or other
web-based knowledge sources)
• Sentiment analysis - detecting if a post conveys sentiment and if so,
whether it is positive or negative, the strength of this sentiment, and
whether the statement is sarcastic or not
• Topic detection - classifying terms according to the set of key
themes; performed by means of manually created gazetteer lists for
each topic
• MP and Candidate recognition
• Author recognition
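Gazetteer-based topic detection can be sketched as a lookup of known terms in the tweet text. The lists below are invented for illustration; the project's real gazetteers are not shown in the slides, and the actual matching is done inside GATE rather than with regular expressions.

```python
import re

# Hypothetical, hand-made gazetteer lists (one list of terms per topic).
GAZETTEERS = {
    "climate change": ["climate change", "global warming", "emissions"],
    "UK economy": ["economy", "deficit", "austerity"],
}


def detect_topics(text):
    """Return the sorted topics whose gazetteer terms occur in the text."""
    lowered = text.lower()
    found = set()
    for topic, terms in GAZETTEERS.items():
        for term in terms:
            # Word boundaries avoid matching a term inside a longer word.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(topic)
                break
    return sorted(found)


print(detect_topics("Austerity will not fix global warming"))
# ['UK economy', 'climate change']
```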
TWITIE
• Information Extraction
• GATE’s annotation + ANNIE
• ANNIE = tokeniser, sentence splitter, POS tagger, gazetteer lists,
finite state transducer
• Gazetteer = lists such as cities, organisations, days of the week;
names of useful indicators, such as typical company designators
(e.g. ‘Ltd.’), titles
TOPIC DETECTION EXAMPLE
(figure: topic detection example)
SENTIMENT DETECTION EXAMPLE
(figures: two sentiment detection examples)
MIMIR QUERY
• DBpedia [7] contains a wealth of useful information on the domain of
UK politics
• every UK MP is represented, along with their constituency and the
political party to which they belong
FUTURE THINKING
• identify phrases that may have a temporal orientation, and try to
guess what that orientation is
• temporal_thinking annotation [1]
temporal signals (e.g. simultaneously, as soon as, before, after)
soft signals (e.g. our children, planning, the long term, then-
current, the great war)
verb groups
• degree of future thinking = combined weights of relevant indicators
FUTURE THINKING
• Future tense: +0.65
• Past tense: -0.45
• Present perfect: -0.5
• Future timex: +1.0
• Past timex: -1.0
• Pre-2015 date: -1.0
• Pre-election date: -0.15
• Near future date: +0.5
• Far future date: +1.0
• Very far future date: +1.5
• Future temporal signal: +0.4
• Past temporal signal: -1.0
• Future soft indicator: +0.3
• Past soft indicator: +0.3
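A minimal sketch of how such weights might be combined follows. The slides only say that the degree of future thinking is the "combined weights of relevant indicators", so the plain sum used here is an assumption, and the indicator key names are hypothetical labels for the entries above. The weights are copied as printed on the slide.

```python
# Weights as listed on the slide; key names are illustrative labels.
WEIGHTS = {
    "future_tense": 0.65,
    "past_tense": -0.45,
    "present_perfect": -0.5,
    "future_timex": 1.0,
    "past_timex": -1.0,
    "pre_2015_date": -1.0,
    "pre_election_date": -0.15,
    "near_future_date": 0.5,
    "far_future_date": 1.0,
    "very_far_future_date": 1.5,
    "future_temporal_signal": 0.4,
    "past_temporal_signal": -1.0,
    "future_soft_indicator": 0.3,
    "past_soft_indicator": 0.3,  # value as printed on the slide
}


def future_thinking_score(indicators):
    """Sum the weights of the indicators found in a text (assumed combination rule)."""
    return sum(WEIGHTS[name] for name in indicators)


# A text with a future-tense verb group, a near-future date and a future soft signal.
score = future_thinking_score(
    ["future_tense", "near_future_date", "future_soft_indicator"]
)
print(round(score, 2))  # 1.45
```

A positive total suggests a future orientation and a negative total a past orientation.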
EXPLODING QUERIES
• Query template: count all the [sentiment] sentiment expressions about
the [theme] theme in tweets written by [party] candidates for
constituencies in [region]
• Each slot is expanded over its possible values:
sentiment: positive or negative
theme: 45 different political themes
party: 7 main UK political parties
region: 12 main regions
• Examples:
Count all the positive sentiment expressions about the “UK
economy” theme in tweets written by Labour candidates for
constituencies in Greater London.
Count all the negative sentiment expressions about the “UK
economy” theme in tweets written by Labour candidates for
constituencies in Greater London.
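The expansion of one query template into many concrete queries can be sketched as a Cartesian product over the slot values. The explode function and TEMPLATE string are hypothetical names, and the output here is plain English text rather than actual Mimir query syntax, which the slides do not show.

```python
from itertools import product

# Template with one slot per dimension named on the slide.
TEMPLATE = (
    'Count all the {sentiment} sentiment expressions about the "{theme}" '
    "theme in tweets written by {party} candidates for constituencies in {region}."
)


def explode(sentiments, themes, parties, regions):
    """Generate one concrete query per combination of slot values."""
    return [
        TEMPLATE.format(sentiment=s, theme=t, party=p, region=r)
        for s, t, p, r in product(sentiments, themes, parties, regions)
    ]


queries = explode(
    ["positive", "negative"], ["UK economy"], ["Labour"], ["Greater London"]
)
print(len(queries))  # 2
for query in queries:
    print(query)

# With the full value sets from the slide, one template explodes into:
print(2 * 45 * 7 * 12)  # 7560
```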
VISUALISATIONS
• Top 10 topics mentioned by Green Party members
VISUALISATIONS
• Treemap showing the most frequent terms about climate change
mentioned by the Labour Party
VISUALISATIONS
• Dynamic Co-occurrence Matrix
VISUALISATIONS
• Choropleth depicting distribution of tweets about the economy
CLIMATE CHANGE ENGAGEMENT
• retweets, for self-gain and philanthropy, as a means of disseminating
information to a wider audience and propagating a message faster
• 64.48% of climate change tweets were actually retweets
• high level of engagement (second in percentage of opinionated
tweets, after Europe)
• original tweets containing a URL are more likely to be retweeted
EVALUATION
• critical component: NLP processing
• must extract entities, topics and sentiments correctly
• used tools that had a high performance in previous analytics (like
TwitIE)
• Precision: 85.87%
• Recall: 53.05%
• can be adapted to different tasks (topics or groups of people)
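The two figures above can be combined into a single F1 score, the harmonic mean of precision and recall. The slides do not report an F1 value, so this is a derived illustration rather than a result from the evaluation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the slide, expressed as fractions.
f1 = f1_score(0.8587, 0.5305)
print(f"{f1:.4f}")
```

The result is roughly 0.656, which reflects that recall is noticeably lower than precision.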
CONCLUSIONS
• analysis is not limited to searching for relevant documents that
match a query
• questions can be expressed in a natural, broad way (e.g. "Which
political party talks the most about environmental topics?", "Which
politician gets the most retweets when she/he talks about climate
change?")
• can lead to studies by social scientists, environmentalists and
politicians
• can be used for predictive analytics, "intent mining" - looking at
things like tendency to purchase, to prefer a brand, to switch etc.
• measure how the audience perceives the content and what kind of
emotional response it elicits
• in future, it could be combined with language generation for
creating content, e.g. writing news
BIBLIOGRAPHY
1. D. Maynard, I. Roberts, M. A. Greenwood, L. Derczynski, and K. Bontcheva. Political Futures
Tracker: Technical report. Nesta working paper 15/12, 2015.
2. M. Nagarajan, K. Gomadam, A. Sheth, A. Ranabahu, R. Mutharaju, and A. Jadhav. Spatio-
temporal-thematic analysis of citizen sensor data: Challenges and experiences. In Web
Information Systems Engineering, pages 539–553, 2009.
3. A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller. TwitInfo:
Aggregating and visualizing microblogs for event exploration. In Proceedings of the 2011
Conference on Human Factors in Computing Systems (CHI), pages 227–236, 2011.
4. H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: an Architecture for
Development of Robust HLT Applications. In Proceedings of the 40th Annual Meeting on
Association for Computational Linguistics, 7–12 July 2002, ACL ’02, pages 168–175, Stroudsburg,
PA, USA, 2002. Association for Computational Linguistics.
5. V. Tablan, K. Bontcheva, I. Roberts, and H. Cunningham. Mímir: an open-source semantic search
framework for interactive information seeking and discovery. Journal of Web Semantics, 2014.
6. K. Bontcheva, L. Derczynski, A. Funk, M. A. Greenwood, D. Maynard, and N. Aswani. TwitIE: An
Open-Source Information Extraction Pipeline for Microblog Text. In Proceedings of the
International Conference on Recent Advances in Natural Language Processing. Association for
Computational Linguistics, 2013.
7. C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann. DBpedia – a
crystallization point for the web of data. Journal of Web Semantics: Science, Services and Agents
on the World Wide Web, 7:154–165, 2009.
THANK YOU!
QUESTIONS?
