Oxford Internet Institute Summer Doctoral Programme
iSchool, University of Toronto
July 2013
Big Data in the Social Sciences
Slides produced with:
Ralph Schroeder, Professor & MSc Programme Director
Dr Eric T. Meyer
Research Fellow & DPhil Programme Director
eric.meyer@oii.ox.ac.uk
http://www.oii.ox.ac.uk/people/?id=120
@etmeyer
Source: Leonard John Matthews, CC-BY-SA (http://www.flickr.com/photos/mythoto/3033590171)
Big data are data that are unprecedented in scale and scope
in relation to a given phenomenon. They are often streams
of data (rather than fixed datasets), accumulating large volumes,
often at high velocity.
Is the tail of the availability of big data and computational
methods wagging the dog of good research questions and
advancing social science?
If not, how do big data advance research?
What are the opportunities and challenges?
Source: http://www.flickr.com/photos/nakedcharlton/597075830/
Source: http://www.flickr.com/photos/jamescridland/613445810/
Business Value versus Academic Value
Strategic Knowledge
• Generally time-limited (with exceptions)
• Value comes from knowing what your competitors don’t
• Often has high monetary value if it can be exploited
Durable Knowledge
• Less time-limited (with exceptions)
• Value comes from adding to the world’s knowledge (the global
brain is cumulative/scientific)
• Rarely has direct monetary value, but has value in terms of
creating the possibility both of future knowledge and of future
exploitation and commercial uses
Business Value versus Academic Value
Marketing → Tailoring
Forecasting → Prediction
Complex Trends → Linking datasets plus modelling
From Big Data to Big (Hi-res) Picture
Is there a risk that the (big data) tail will
wag the (research) dog?
Image source: Leonard John Matthews, CC-BY-SA
(http://www.flickr.com/photos/mythoto/3033590171)
“Surprisingly, the distribution of
types of search query did not vary
significantly across the different
Lifestyle Groups (p>0.01).”
Source: Waller, V. (2011). “Not Just Information: Who Searches for What on the Search Engine Google?” Journal of the American Society for Information Science &
Technology 62(4): 761-775.
Case 1: Search engine behaviour
Waller’s analysis of Australian Google Users
Key findings:
Mainly leisure
< 2% contemporary issues
No perceptible ‘class’ differences
Novel advance:
Unprecedented insight into what people search for
Challenge:
Replicability
Securing access to commercial data
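The "no perceptible ‘class’ differences" finding is a claim that the mix of query types does not differ significantly between groups. One common way to check a claim of this kind is a chi-square test of independence. The sketch below is illustrative only: the counts are invented, the function names are hypothetical, and nothing here is taken from Waller's data or confirmed as Waller's method. The p-value shortcut used is valid only for exactly 2 degrees of freedom (for example, 2 groups by 3 query types).

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    Rows are groups, columns are categories. Every row and column total
    must be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if group and category were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper-tail p-value for a chi-square statistic with exactly 2 dof.

    With 2 degrees of freedom the survival function is exp(-x / 2).
    """
    return math.exp(-statistic / 2.0)


# HYPOTHETICAL counts, invented for illustration (not Waller's data):
# rows = two lifestyle groups, columns = three query types.
hypothetical_counts = [
    [500, 300, 200],
    [510, 295, 195],
]

statistic, dof = chi_square_independence(hypothetical_counts)
p = p_value_two_dof(statistic)
print(f"chi-square = {statistic:.4f}, dof = {dof}, p = {p:.4f}")
```

With these made-up counts the two groups have nearly identical query mixes, so the p-value is far above 0.01 and the test gives no evidence of a difference. For tables of other sizes, a statistics library would be needed to obtain the p-value.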
J Michel et al. Science 2011;331:176-182
Case 2: Large-scale text analysis
Michel et al.’s ‘culturomic’ analysis of 5 million digitized Google
Books, and Heuser & Le-Khac’s analysis of 2779 19th-century British
novels
Key findings:
Patterns of key terms
Industrialization tied to shift from abstract to concrete
words
Novel advance:
Replicability, extension to other areas, systematic
analysis of cultural materials
Challenge:
Data quality
Case 3: Twitter-bots
OII master’s students Alexander Furnas and Devin Gaffney saw a large spike in then-US
presidential candidate Mitt Romney’s Twitter followers, and decided to look at the new
followers:
Furnas, A. and Gaffney, D. (2012). ‘Statistical Probability That Mitt Romney's New Twitter Followers Are Just Normal Users: 0%’. The Atlantic, July
31, http://www.theatlantic.com/technology/archive/2012/07/statistical-probability-that-mitt-romneys-new-twitter-followers-are-just-normal-users-0/260539/ (accessed August 31, 2012).
Conclusions
Savage and Burrows, who ask whether commercial data are outpacing
social science?
Boyd and Crawford, who ask whether big data raise ethical and
epistemological conundrums?
... Not entirely ...
The connection between research technologies and the
advance of knowledge
The threats and opportunities represented by unprecedented
windows into people’s minds and thoughts
Does this lead to more ‘scientific’ (i.e. cumulative) social
sciences and humanities?
See http://www.oii.ox.ac.uk/research/projects/?id=98
Additional readings and references
Bond, Robert et al. (2012). ‘A 61-million-person experiment in social influence and political
mobilization’, Nature 489: 295–298.
Bruns, A. and Liang, Y.E. (2012). ‘Tools and methods for capturing Twitter data during natural disasters’, First
Monday, 17 (4 – 2), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/3937/3193
Furnas, A. and Gaffney, D. (2012). ‘Statistical Probability That Mitt Romney's New Twitter Followers Are Just
Normal Users: 0%’. The Atlantic, July 31, http://www.theatlantic.com/technology/archive/2012/07/statistical-probability-that-mitt-romneys-new-twitter-followers-are-just-normal-users-0/260539/ (accessed August
31, 2012).
Giles, J. (2012). ‘Making the Links: From E-mails to Social Networks, the Digital Traces left by Life in the
Modern World are Transforming Social Science’, Nature, 488: 448-50.
Kwak, H. et al. (2010). ‘What is Twitter, a Social Network or a News Media?’ Proceedings of the 19th
International World Wide Web (WWW) Conference, April 26-30, 2010, Raleigh NC.
Manyika, J. et al. (2011). ‘Big data: the next frontier for innovation, competition and productivity’, McKinsey
Global Institute, available at: http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation
(last accessed August 29, 2012).
Silver, Nate. (2012). The Signal and the Noise: The Art and Science of Prediction. London: Allen Lane.
Tancer, B. (2009). Click: What Millions of People are Doing Online and Why It Matters. New York: Harper
Collins.
Wu, S., J.M. Hofman, W.A. Mason, and D.J. Watts (2011). ‘Who says what to whom on twitter’, Proceedings
of the 20th international conference on World Wide Web. (on Duncan Watts’
webpage, http://research.microsoft.com/en-us/people/duncan/, last accessed August 29, 2012).
With support from:
Dr Eric T. Meyer
Research Fellow & DPhil Programme Director
eric.meyer@oii.ox.ac.uk
http://www.oii.ox.ac.uk/people/?id=120
@etmeyer

Editor's Notes

  • #6: Show increasing value graph and something showing store of knowledge
  • #7: Show increasing value graph and something showing store of knowledge
  • #8: Show increasing value graph and something showing store of knowledge
  • #9: Implications for STS. Policy implications. Practice implications. Intermediation (algorithms between researchers and their artifacts) and disintermediation (more direct access to traces of people’s behaviour). Researchers getting both more distant from their objects of research and closer to their objects of research, and both have important implications. Engagement with business. Strategic value versus academic value. Short term business knowledge versus durable knowledge.