New Methodologies for
 Capturing and Working
 with Publicly Available
 Twitter Data


Associate Professor Axel Bruns
@snurb_dot_info
http://mappingonlinepublics.net/
Queensland University of Technology
WHY TWITTER?

• Researching Twitter:
   – Significant world-wide social network
   – ~500 million accounts (but how many active?)
   – Varied range of uses: from phatic communication to emergency coordination
   – Healthy third-party ecosystem (for now)
   – Strong history of user innovation:
     @replies, #hashtags
   – Flat and open network structure:
     non-reciprocal following, public profiles by default
   – Good API for gathering (big) data for research
NEW MEDIA AND PUBLIC COMMUNICATION:
      MAPPING AUSTRALIAN USER-CREATED CONTENT
             IN ONLINE SOCIAL NETWORKS

•   Australian Research Council (ARC) Discovery Project (2010-13) – $410,000
     –   QUT (Brisbane), Sociomantic Labs (Berlin)
     –   First comprehensive study of Australian social media use
     –   Computer-assisted cultural analysis: tracking, mapping, analysing blogs, Twitter, Flickr,
         YouTube as ‘networked publics’
     –   Addressing the problem of scale (‘Big Data’) and disciplinary change in media, cultural and
         communication studies – natively digital methods
     –   Studying society with the Internet (Richard Rogers)

      http://mappingonlinepublics.net/
A TWITTER RESEARCH TOOLKIT

• Data Gathering
   – yourTwapperkeeper + in-house crawler

• Data Processing
   – Gawk – open source, multiplatform, programmable command-line tool for
     processing CSV documents

• Textual Analysis
   – Leximancer – commercial, multiplatform: extracts key concepts from large
     corpora of text, examines and visualises concept co-occurrence
   – WordStat – commercial, PC-only text analysis tool; generates concept
     co-occurrence data that can be exported for visualisation

• Visualisation
   – Gephi – open source, multiplatform network visualisation tool
SO NOW WHAT?
APPROACHING TWITTER

• Possible research questions:
   – Hashtags as vehicles for ad hoc events and publics:
       • How do online publics form and dissolve? How do they interact, what
         structures do they form?
       • Where do they draw information from? What do they share?
       • Do they simply consist of the usual suspects? How insular and disconnected
         are online publics?
   – Hashtags in context:
       • How do different hashtag events compare? Are there common types of
         hashtags/publics?
       • How ‘big’ are they? What topics attract attention on Twitter?
       • What community (?) structures emerge?
DEVELOPING TWITTER METRICS

• Key data points available through the Twitter API:
    –   text:                contents of the tweet itself, in 140 characters or fewer
    –   to_user_id:          numerical ID of the tweet recipient (for @replies)
    –   from_user:           screen name of the tweet sender
    –   id:                  numerical ID of the tweet itself
    –   from_user_id:        numerical ID of the tweet sender
    –   iso_language_code:   code (e.g. en, de, fr, ...) of the sender’s default language
    –   source:              client software used to tweet (e.g. Web, Tweetdeck, ...)
    –   profile_image_url:   URL of the tweet sender’s profile picture
    –   geo_type:            format of the sender’s geographical coordinates
    –   geo_coordinates_0:   first element of the geographical coordinates
    –   geo_coordinates_1:   second element of the geographical coordinates
    –   created_at:          tweet timestamp in human-readable format
    –   time:                tweet timestamp as a numerical Unix timestamp
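
A minimal sketch of loading these data points, assuming the archive has been exported as a CSV file whose header row uses the field names listed above. The sample rows, the `load_tweets` helper and the added `datetime` key are hypothetical and for illustration only.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical two-row export; column names follow the field list above.
SAMPLE_CSV = """text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time
"Roads closed in Toowoomba #qldfloods",,alice,101,11,en,web,,,0,0,"Mon, 10 Jan 2011 04:00:00 +0000",1294632000
"@alice thanks, stay safe #qldfloods",11,bob,102,12,en,Tweetdeck,,,0,0,"Mon, 10 Jan 2011 04:05:00 +0000",1294632300
"""


def load_tweets(handle):
    """Read tweets from a CSV file object and add a parsed UTC datetime."""
    tweets = []
    for row in csv.DictReader(handle):
        # `time` is a Unix timestamp: seconds since 1970-01-01 UTC.
        row["datetime"] = datetime.fromtimestamp(int(row["time"]), tz=timezone.utc)
        tweets.append(row)
    return tweets


tweets = load_tweets(io.StringIO(SAMPLE_CSV))
for t in tweets:
    print(t["datetime"].isoformat(), t["from_user"], t["text"])
```

Working from the numerical `time` field rather than `created_at` avoids parsing a human-readable date string.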
DEVELOPING TWITTER METRICS

• Additional data points from tweets:
    – original tweets:          tweets which are neither @reply nor retweet
    – retweets:                 tweets which contain RT @user… (or similar)
         • unedited retweets:            retweets which start with RT @user…
         • edited retweets:              retweets which do not start with RT @user…
    – genuine @replies:         tweets which contain @user, but are not retweets
    – URL sharing:              tweets which contain URLs
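
The categories above could be approximated with pattern matching on the tweet text. This is a sketch under stated assumptions, not the authors' exact rules: the regular expressions, the choice of `MT` and `via` as the "or similar" retweet markers, and the `classify` helper are all hypothetical.

```python
import re

# Heuristic patterns; assumptions for illustration only.
RETWEET = re.compile(r"(?:^|\W)(?:RT|MT|via)\s*@\w+", re.IGNORECASE)
UNEDITED_RETWEET = re.compile(r"^RT\s*@\w+", re.IGNORECASE)
MENTION = re.compile(r"(?<!\w)@\w+")
URL = re.compile(r"https?://\S+", re.IGNORECASE)


def classify(text):
    """Return (tweet type, contains URL) for one tweet text."""
    text = text.strip()
    has_url = bool(URL.search(text))
    if RETWEET.search(text):
        # Retweets that begin with "RT @user" count as unedited.
        kind = "unedited retweet" if UNEDITED_RETWEET.match(text) else "edited retweet"
    elif MENTION.search(text):
        kind = "genuine @reply"
    else:
        kind = "original tweet"
    return kind, has_url


for example in [
    "RT @bob: flood map http://example.org/map",
    "So true! RT @bob: stay off the roads",
    "@alice thanks for the update",
    "Stay safe everyone #qldfloods",
]:
    print(classify(example), example)
```

URL sharing is tracked as a separate flag because it can co-occur with any of the other categories.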


• Potential uses:
    –   metrics per hashtag
    –   metrics per timeframe (day, hour, minute, second, …)
    –   metrics per user (or group of users)
    –   …
                                                          (Bruns & Stieglitz, forthcoming)
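
Per-user and per-timeframe metrics can be sketched as simple counts over the `from_user` and `time` fields. The records and helper names below are hypothetical; a real analysis would count each tweet type separately rather than all tweets together.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical records using the `from_user` and `time` fields.
tweets = [
    {"from_user": "alice", "time": 1294632000},  # 2011-01-10 04:00:00 UTC
    {"from_user": "bob", "time": 1294632300},    # 2011-01-10 04:05:00 UTC
    {"from_user": "alice", "time": 1294635900},  # 2011-01-10 05:05:00 UTC
]


def tweets_per_user(records):
    """Count tweets sent by each screen name."""
    return Counter(r["from_user"] for r in records)


def tweets_per_period(records, fmt="%Y-%m-%d %H:00"):
    """Count tweets per time bucket; `fmt` is a strftime pattern that sets the bucket size."""
    return Counter(
        datetime.fromtimestamp(int(r["time"]), tz=timezone.utc).strftime(fmt)
        for r in records
    )


print(tweets_per_user(tweets))
print(tweets_per_period(tweets))              # hourly buckets
print(tweets_per_period(tweets, "%Y-%m-%d"))  # daily buckets
```

Changing the `fmt` pattern switches between day, hour, minute or second resolution without touching the counting logic.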
#QLDFLOODS @REPLIES
[Figure: network graph; labelled clusters: authorities, mainstream media]
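
One way the edges of an @reply network could be derived from tweet text is sketched below. It assumes retweets are excluded and that the result is written as a weighted edge table with Source, Target and Weight columns, a layout that network tools such as Gephi can import. The account names, patterns and helper functions are hypothetical.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"(?<!\w)@(\w+)")
RETWEET = re.compile(r"(?:^|\W)RT\s*@\w+", re.IGNORECASE)


def reply_edges(records):
    """Count (sender, mentioned user) pairs across non-retweet tweets."""
    edges = Counter()
    for r in records:
        if RETWEET.search(r["text"]):
            continue  # retweets are not treated as @replies
        for target in MENTION.findall(r["text"]):
            # Screen names are case-insensitive, so normalise to lower case.
            edges[(r["from_user"].lower(), target.lower())] += 1
    return edges


def to_edge_csv(edges):
    """Serialise the edge counts as a Source,Target,Weight table."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()


# Hypothetical sample tweets.
tweets = [
    {"from_user": "alice", "text": "@police_hq is the highway open? #qldfloods"},
    {"from_user": "bob", "text": "@police_hq @newsdesk thanks for the updates"},
    {"from_user": "carol", "text": "RT @police_hq: avoid floodwater"},
    {"from_user": "alice", "text": "@Police_HQ any news on the bridge?"},
]
edges = reply_edges(tweets)
print(to_edge_csv(edges))
```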
#ROYALWEDDING
#AUSPOL (FEB.-DEC. 2011)
HASHTAG METRICS
BEYOND HASHTAGS

• Publics on Twitter:
     – Micro:    @reply and retweet conversations
     – Meso:     follower/followee networks
     – Macro:    hashtag ‘communities’              (Bruns & Moe, forthcoming)


 Multiple overlapping publics / networks

•   What drives their formation and dissipation?
•   How do they interact and interweave?
•   How are they interleaved with the wider media ecology?
•   Twitter doesn’t contain publics: publics transcend Twitter
‘BIG DATA’ AND THE DIGITAL HUMANITIES

•    Emerging needs in Twitter research:
      – Unified, compatible methods and metrics for Twitter analysis
            Tools and approaches shared at http://mappingonlinepublics.net/
      – Powerful infrastructure for long-term, high-volume tracking of public
        communication on Twitter
            Data access requires substantial funding stream
      – Facilities for long-term data storage and preservation
            Key roles for National Libraries, National Archives
      – Integration with related datasets (e.g. MSM content)
            Need to address data interoperability questions
      – Robust frameworks for Internet research ethics
            Clear guidelines which take into account complex new public/private structures


•    Twitter as a test case for digital humanities research
      – Widespread, open, public platform for everyday communication
      – Tool for observing society at scale through Internet research
http://mappingonlinepublics.net/
@snurb_dot_info
@jeanburgess
@_StephenH
@DrTNitins
@timhighfield
@cdtavijit
