CRAWLING AND SCRAPING




            Noortje Marres (Goldsmiths, University of London)

            Michael Stevenson (Digital Methods Initiative, University of Amsterdam)

            Esther Weltevrede (Digital Methods Initiative, University of Amsterdam)


            Digital Methods Summer School, 28 June 2011




Wednesday, June 29, 2011
CRAWLING AND SCRAPING




            Techniques for online data capture and analysis:

                  • Issue Crawler
                  • Lippmannian Device

            Implications for research methods:

                  • dynamic data sets
                  • formatted data

            Real-time research?


MAPPING NETWORKS WITH ISSUE CRAWLER




                       A Web-based tool for the location and visualization of
                                 hyperlink networks on the Web




Locating issue networks on the Web

            How to demarcate networks that have configured around
            specific affairs on the Web?

            To do this, Issue Crawler relies on:

                  • well-chosen starting points or Web pages that disclose activity
                  around a particular issue on the Web by way of hyperlinks

                  • the ‘intelligence’ of aggregated, live hyperlinking



Extractive Industries Review network, 2004


More about hyperlink analysis


            Issue Crawler performs iterations of co-link analysis

                  • the critique of ‘absolute’ citation measures (as in: PageRank)

            Compare this with co-citation analysis in the social studies of
            science (Callon et al., 1983)

                  • topical relevance vs overall popularity
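One co-link step can be sketched as follows. This is a minimal illustration, assuming the common reading that a page is retained when at least two of the starting points link to it. The link data, the `co_linked` helper and the `min_sources` threshold are hypothetical and are not Issue Crawler's actual implementation.

```python
from collections import defaultdict

def co_linked(outlinks, min_sources=2):
    """Return targets that receive links from at least `min_sources` distinct starting points."""
    linkers = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            if target != source:  # ignore self-links
                linkers[target].add(source)
    return {t: sorted(s) for t, s in linkers.items() if len(s) >= min_sources}

# Hypothetical starting points and their outlinks (not real crawl data).
outlinks = {
    "a.example": ["issue.example", "portal.example"],
    "b.example": ["issue.example", "portal.example"],
    "c.example": ["issue.example", "blog.example"],
}
network = co_linked(outlinks)
print(network)
```

Only links coming from the chosen starting points are counted here, which is what separates this kind of topical measure from a count of all inlinks across the Web.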




Issue Crawler as a tool of online social research (1/2)

            To perform immanent critique of the supposed ‘egalitarianism’
            of the Internet:

                  to highlight specific asymmetries of relevance and/or authority among
                  organizations’ Web pages


                  To deploy hyperlink analysis for purposes of issue analysis

                  in the politics of issues, “experts and activists define issues by sharing
                  information about them” (Heclo, 1974)




Issues in the Fergana Valley, Uzbekistan, according to the Web.
Fall 2001

            Issues are on the Web, but which issues?
            The characterisation of the Fergana Valley issues
            depends on the sites accessed.

            NGO addresses, phone numbers, and occasionally e-mail addresses
            available on the Web:

                  • Mumtozbegim
                  • Business Women's Association, Kokand
                  • Girl's Studio, Kokand
                  • Nozigim, Makhalla Women's Club, Kokand
                  • Manoviy Barkamollik Centre
                  • Malika Family Social and Legal Support Centre
                  • Iftizor Open Youth Club
                  • Soglom Avlod Uchun Charity Foundation*
                  • Abdulla Kadyri Foundation, Kokand
                  • Consumer Rights Protection Society, Fergana
                  • Jamila Charitable Foundation
                  • Iftizor Centre, Kokand
                  • MADAD Center for the lonely, aged and disabled, Andijan
                  • Red Crescent Society, Kokand*
                  • Mehr-Sahovat Charitable Centre
                  • ECOSAN International Foundation, Kokand*
                  • Mussafo Ecological Centre, Kokand

            * GONGO (government-organised NGO)

            No Internet, no issues from the ground?
            NGOs in the Fergana Valley may not have Web sites,
            but their issues are on the Web.








Issue Crawler




            How to use it




Issue Crawler




            http://issuecrawler.net

                  Request account and log in




Issue Crawler lobby



            News
              workshops, software

            Queue
              time sharing

            Current
              three simultaneous crawlers



Issue Crawler harvester




            Enter text, URLs will be stripped out
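The harvesting step can be approximated as below. This is a rough sketch, assuming that pulling `http`/`https` addresses out of pasted text with a regular expression is enough. The pattern and the `harvest_urls` helper are illustrative assumptions, not the harvester's own code.

```python
import re

# Simplified pattern: a scheme followed by anything up to whitespace or a bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")

def harvest_urls(text):
    """Extract http(s) URLs from free text, de-duplicated in order of appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen

sample = (
    "See http://issuecrawler.net and http://www.govcom.org/scenarios_use.htm, "
    "then http://issuecrawler.net."
)
urls = harvest_urls(sample)
print(urls)
```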




Crawling and analysis

            Crawling
              to a certain depth

            Analysis
              snowball
              inter-actor
              co-link

            Iterate (optional)
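Crawling "to a certain depth" can be pictured as a breadth-first walk that stops following links after a fixed number of steps. The sketch below assumes an in-memory link graph in place of live page fetching. The `crawl` function, the `get_links` callback and the example hosts are hypothetical.

```python
from collections import deque

def crawl(seeds, get_links, max_depth=2):
    """Breadth-first crawl from `seeds`, following links up to `max_depth` steps away."""
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in get_links(page):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical link graph standing in for live fetching.
graph = {
    "seed.example": ["one.example"],
    "one.example": ["two.example"],
    "two.example": ["three.example"],
}
found = crawl(["seed.example"], lambda page: graph.get(page, []), max_depth=2)
print(found)
```

With `max_depth=2` the page three links away from the seed is never reached, which is the practical effect of the depth setting.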




Issue Crawler as a tool of online social research (2/2)


            More generally, to adopt an empirical approach to the study of
            public controversies:

                  • is there a network? (is there an issue?)
                  • who are the actors?
                  • how are they related?
                  • what are the issues?
                  • where are they happening?




Co-link settings


            1 iteration ~ social or event network

            2 iterations ~ issue network

            3 iterations ~ establishment network


            See http://www.govcom.org/scenarios_use.htm




THE LIPPMANNIAN DEVICE*

            Scraping and other digital methods skills




            * a.k.a. The Google Scraper




WHEN SEARCH BECOMES RESEARCH




                  Turning Google into a research tool




WALTER LIPPMANN (1889-1974)

            The Phantom Public, 1927




            "The problem is to locate by clear
            and coarse objective tests the
            actor in a controversy who is most
            worthy of public support" (p.120)




LIPPMANNIAN DEVICE - MODES OF ANALYSIS
            Showing the partisanship of an actor.
            Showing the issue agenda of an organization.



            Issue cloud: Issue agenda. Which issues are on the agenda of an
            organization or movement?

            Source cloud: Partisanship or commitment. Which sources
            mention the issue?




ISSUE CLOUD: GREENPEACE ISSUES

            An organization’s issue agenda (or commitment)




            Greenpeace has issues.
            Which are they most committed to?








ISSUE CLOUD: GREENPEACE ISSUES

            Greenpeace issues, http://www.greenpeace.org/international/campaigns.

            Stop climate change
            Protect ancient forests
            Defending our Oceans
            Say no to genetic engineering
            Eliminate toxic chemicals
            Demand Peace and Disarmament
            End the nuclear age
            Encourage sustainable trade

            Keep most significant issue language:
            "climate change"
            "ancient forests"
            “oceans”
            "genetic engineering"
            "toxic chemicals"
            “disarmament”
            "nuclear power"
            "sustainable trade"

            ---> Query Design workshop




ISSUE CLOUD: GREENPEACE ISSUES

            Greenpeace’s issue agenda (distribution of commitment)




            Greenpeace's issue commitment. Greenpeace's campaign issue
            list, ranked according to number of mentions of issues on
            greenpeace.org, 11 October 2009.
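The ranking behind an issue cloud can be sketched as a count of issue phrases followed by a sort. The example assumes locally available page texts rather than search engine returns. The sample texts are made up for illustration and are not Greenpeace data, and `rank_issues` is a hypothetical helper.

```python
import re

def rank_issues(pages, issues):
    """Count case-insensitive phrase mentions across pages; return (issue, count) pairs, highest first."""
    counts = {}
    for issue in issues:
        pattern = re.compile(re.escape(issue), re.IGNORECASE)
        counts[issue] = sum(len(pattern.findall(page)) for page in pages)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Made-up page texts for illustration only.
pages = [
    "Climate change threatens oceans. Climate change is urgent.",
    "Toxic chemicals pollute oceans; climate change adds stress.",
]
ranking = rank_issues(pages, ["climate change", "oceans", "toxic chemicals"])
print(ranking)
```

The counts would then set the relative size of each issue in the cloud.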
EXAMPLE: SOURCE CLOUD

            Method for showing the partisanship or commitment of sources to names




            Method
            1. Gather source list (e.g. through Issue Crawler or top Google results)
            2. Query source list for one or more experts




            Digital Methods Initiative, 2007




SOURCE CLOUD: CLIMATE CHANGE SKEPTICS

            Query design: What are the sources?




            Climate Change Skeptics: Who recognizes them?

            1. Top 100 results for the query “climate change”

            http://www.google.com/search?q=%22climate+change%22&num=100
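The query string in that URL is an exact-phrase search with the quotation marks percent-encoded. A small sketch using the standard library reproduces it. `google_query_url` is a hypothetical helper, and whether Google still honours the `num` parameter is not something this sketch checks.

```python
from urllib.parse import urlencode

def google_query_url(phrase, num=100):
    """Build a search URL for an exact-phrase query, requesting `num` results."""
    params = {"q": f'"{phrase}"', "num": num}  # quotation marks force an exact-phrase match
    return "http://www.google.com/search?" + urlencode(params)

url = google_query_url("climate change")
print(url)
```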




SOURCE CLOUD: CLIMATE CHANGE SKEPTICS

            Query design: What are the issues?




            Derive list of climate change skeptics
               Sources: motherjones.com, wikipedia.org, heartland.org

                     Compare the three lists and retain the skeptics that are
                     mentioned in at least two of the lists
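The "at least two of the lists" rule can be expressed as a simple tally. The names below are placeholders, not the lists actually compiled from the three sources, and `in_at_least` is a hypothetical helper.

```python
from collections import Counter

def in_at_least(lists, minimum=2):
    """Return names that occur in at least `minimum` of the given lists."""
    # set() ensures a name repeated within one list is counted once for that list
    counts = Counter(name for names in lists for name in set(names))
    return sorted(name for name, n in counts.items() if n >= minimum)

# Placeholder lists standing in for the three source lists.
lists = [
    ["Person A", "Person B", "Person C"],
    ["Person B", "Person C"],
    ["Person C", "Person D"],
]
retained = in_at_least(lists)
print(retained)
```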




SOURCE CLOUD: CLIMATE CHANGE SKEPTICS

            Skeptics




            S. Fred Singer
            Robert Balling
            Sallie Baliunas
            Patrick Michaels
            Richard Lindzen
            Steven Milloy
            Timothy Ball
            Paul Driessen
            Willie Soon
            Sherwood B. Idso
            Frederick Seitz







GOOGLE BLOCKING




            Check query design before launching a scrape

            Number of sources x number of issues = number of requests to Google
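The request count can be checked before launching by enumerating every source and keyword pair. The sketch assumes one request per pair, phrased as a `site:` query with the keyword in quotation marks. That phrasing and the `plan_scrape` helper are assumptions about how such a scrape might be laid out, not a description of the Google Scraper's internals.

```python
from itertools import product

def plan_scrape(sources, keywords):
    """List one query per (source, keyword) pair so the total can be checked in advance."""
    return [f'site:{source} "{keyword}"' for source, keyword in product(sources, keywords)]

sources = ["epa.gov", "bbc.co.uk", "ipcc.ch"]
keywords = ["Fred Singer", "Frederick Seitz"]
queries = plan_scrape(sources, keywords)
print(len(queries))  # 3 sources x 2 keywords
```

By the same arithmetic, a list of 100 sources queried for 11 names would come to 1,100 requests.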








----> data visualization: clouding workshop

Climate Change Sceptics on the Web (Frederick Seitz)
  Research question: To what extent are climate change 'skeptics' present
  in the climate change spaces on the Web?
  Findings: There is distance between the skeptics and the top of the
  search engine returns.

  Source cloud (mentions of the query per source):
  epa.gov (0), bbc.co.uk (0), defra.gov.uk (0), unep.org (0), bom.gov.au (0),
  ipcc.ch (0), pewclimate.org (0), davidsuzuki.org (0), panda.org (0),
  mfe.govt.nz (0), ec.gc.ca (0), exploratorium.edu (0), climatechange.com.au (0),
  greenpeace.org (0), climatechallenge.gov.uk (0), guardian.co.uk (0),
  iisd.org (0), g8.gov.uk (0), campaigncc.org (1), foe.co.uk (0), state.gov (0),
  scidev.net (0), eea.europa.eu (0), whoi.edu (0), cbc.ca (0), energy.gov (0),
  marshall.org (8), climateark.org (4), un.org (0), dar.csiro.au (0),
  theglobeandmail.com (0), acfonline.org.au (0), gcrio.org (0), nature.com (0),
  grida.no (0), nature.org (0), ecokids.ca (0), royalsoc.ac.uk (0),
  climatechangecentral.com (0), iea.org (0), ecn.ac.uk (0), ecy.wa.gov (0),
  worldwildlife.org (0), realclimate.org (35), metoffice.gov.uk (0),
  open2.net (0), scienceagogo.com (0), eldis.org (0), ft.com (0), who.int (0),
  climatecrisis.net (0), faqs.org (0), ltscotland.org.uk (0), abc.net.au (0),
  climatechange.ca.gov (0), envirolink.org (0), mofa.go.jp (0),
  sourcewatch.org (21), iucn.org (0), dfat.gov.au (0), ncdc.noaa.gov (0),
  climatescience.gov (0), climatechangecollege.org (0), ciel.org (0),
  ucar.edu (0)

  Source: google.com
  Query: “Frederick Seitz”
  Method: Search for query “Frederick Seitz” in top 100. Organized in order.
  Tools: Google Scraper and Tag Cloud Generator
  Date: 30 July 2007
  Product of the Digital Methods Initiative, dmi.mediastudies.nl.
  Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies.
  Design: Anne Helmond. Licence: CC BY-NC-SA.




Climate Change Sceptics on the Web (Steven Milloy)
  Research question: To what extent are climate change 'skeptics' present
  in the climate change spaces on the Web?
  Findings: There is distance between the skeptics and the top of the
  search engine returns.

  Source cloud (mentions of the query per source):
  epa.gov (1), bbc.co.uk (0), defra.gov.uk (1), unep.org (1), bom.gov.au (0),
  ipcc.ch (1), pewclimate.org (1), davidsuzuki.org (0), panda.org (0),
  mfe.govt.nz (0), ec.gc.ca (0), exploratorium.edu (0), climatechange.com.au (0),
  greenpeace.org (1), climatechallenge.gov.uk (1), guardian.co.uk (0),
  iisd.org (0), g8.gov.uk (0), campaigncc.org (0), foe.co.uk (0), state.gov (1),
  eea.europa.eu (1), whoi.edu (1), cbc.ca (0), energy.gov (1), marshall.org (0),
  climateark.org (2), un.org (0), dar.csiro.au (1), theglobeandmail.com (0),
  acfonline.org.au (0), gcrio.org (0), nature.com (0), grida.no (0),
  nature.org (1), ecokids.ca (0), climatechangecentral.com (0), iea.org (0),
  ecn.ac.uk (1), ecy.wa.gov (1), worldwildlife.org (0), realclimate.org (33),
  open2.net (0), eldis.org (0), ft.com (0), who.int (1), climatecrisis.net (1),
  faqs.org (0), metoffice.gov.uk (1), ltscotland.org.uk (1), abc.net.au (0),
  climatechange.ca.gov (1), envirolink.org (1), mofa.go.jp (1),
  sourcewatch.org (27), iucn.org (0), dfat.gov.au (0), ncdc.noaa.gov (1),
  climatescience.gov (0), climatechangecollege.org (1), ciel.org (0),
  ucar.edu (0)

  Source: google.com
  Query: “Stephen Milloy”
  Method: Search for query “Stephen Milloy” in top 100. Organized in order.
  Tools: Google Scraper and Tag Cloud Generator
  Date: 30 July 2007
  Product of the Digital Methods Initiative, dmi.mediastudies.nl.
  Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies.
  Design: Anne Helmond. Licence: CC BY-NC-SA.




Climate Change Sceptics on the Web (S. Fred Singer)
  Research question: To what extent are climate change 'skeptics' present
  in the climate change spaces on the Web?
  Findings: There is distance between the skeptics and the top of the
  search engine returns.

  Source cloud (mentions of the query per source):
  epa.gov (0), bbc.co.uk (0), defra.gov.uk (0), unep.org (0), bom.gov.au (0),
  ipcc.ch (0), pewclimate.org (0), davidsuzuki.org (0), panda.org (0),
  mfe.govt.nz (0), ec.gc.ca (0), exploratorium.edu (0), climatechange.com.au (0),
  greenpeace.org (1), climatechallenge.gov.uk (0), guardian.co.uk (0),
  iisd.org (0), g8.gov.uk (0), campaigncc.org (1), foe.co.uk (0), state.gov (0),
  scidev.net (0), eea.europa.eu (0), whoi.edu (0), cbc.ca (0), energy.gov (0),
  marshall.org (0), climateark.org (1), un.org (0), dar.csiro.au (0),
  theglobeandmail.com (0), acfonline.org.au (0), gcrio.org (0), nature.com (0),
  grida.no (0), nature.org (0), ecokids.ca (0), royalsoc.ac.uk (0),
  climatechangecentral.com (0), iea.org (0), ecn.ac.uk (0), ecy.wa.gov (0),
  worldwildlife.org (0), realclimate.org (14), faqs.org (0),
  metoffice.gov.uk (0), open2.net (0), scienceagogo.com (0), eldis.org (0),
  ft.com (0), who.int (0), climatecrisis.net (0), ltscotland.org.uk (0),
  abc.net.au (0), climatechange.ca.gov (0), sourcewatch.org (64),
  envirolink.org (0), mofa.go.jp (0), iucn.org (0), dfat.gov.au (0),
  ncdc.noaa.gov (0), climatescience.gov (11), climatechangecollege.org (0),
  ciel.org (0), ucar.edu (0)

  Source: google.com
  Query: “Fred Singer”
  Method: Search for query “Fred Singer” in top 100. Organized in order.
  Tools: Google Scraper and Tag Cloud Generator
  Date: 30 July 2007
  Product of the Digital Methods Initiative, dmi.mediastudies.nl.
  Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies.
  Design: Anne Helmond. Licence: CC BY-NC-SA.




LIPPMANNIAN DEVICE

            Modes of analysis




            Issue agenda check. What are the current commitments of an
            organization(s)?
            Use the issue cloud

            Partisanship check. Which side is an actor on?
            Use the source cloud




Tools and references




            http://tools.digitalmethods.net
            http://digitalmethods.net
            http://govcom.org




EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS



Research Question:
Which climate change issue actors mention the skeptics, and
what kinds of actors are more likely to mention them?

Method:
Comparative: query the skeptics in two source sets (‘top’ sources
and climate change blogs) and output a source cloud.
SOURCE SETS

(1) Top ten Google returns for “climate change” (mix of media
as well as governmental organizations)

(2) Climate change blogs network (Issue Crawler results - mix of
‘establishment’ blogs, media, and governmental and
non-governmental organizations)
EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS

Steps:
- Acquire source sets and skeptics list from Michael.
- Launch the Lippmannian device (aka Google Scraper - see
tools.digitalmethods.net).
- Enter source sets and skeptics' names. Query the source sets
separately, and remember to put each name in quotation marks (“”) to get exact matches.
- Wait. Use this moment to discuss hypotheses.
- Explore the output, and present findings.


DMI Workshop: Crawling and Scraping

  • 1. CRAWLING AND SCRAPING Noortje Marres (Goldsmiths, University of London) Michael Stevenson (Digital Methods Initiative, University of Amsterdam) Esther Weltevrede (Digital Methods Initiative, University of Amsterdam) Digital Methods Summer School, 28 June 2011 Wednesday, June 29, 2011
  • 2. CRAWLING AND SCRAPING Techniques for online data capture and analysis: Issue Crawler; Lippmannian Device. Implications for research methods: dynamic data sets; formatted data. Real-time research?
  • 3. MAPPING NETWORKS WITH ISSUE CRAWLER A Web-based tool for the location and visualization of hyperlink networks on the Web
  • 4. Locating issue networks on the Web. How to demarcate networks that have configured around specific affairs on the Web? To do this, Issue Crawler relies on: (1) well-chosen starting points, or Web pages that disclose activity around a particular issue on the Web by way of hyperlinks; (2) the ‘intelligence’ of aggregated, live hyperlinking.
  • 5. Extractive Industries Review network, 2004
  • 6. More about hyperlink analysis. Issue Crawler performs iterations of co-link analysis: the critique of ‘absolute’ citation measures (as in: PageRank). Compare this with co-citation analysis in the social studies of science (Callon et al., 1983): topical relevance vs overall popularity.
  • 7. Issue Crawler as a tool of online social research (1/2). To perform immanent critique of the supposed ‘egalitarianism’ of the Internet: to highlight specific asymmetries of relevance and/or authority among organizations’ Web pages. To deploy hyperlink analysis for purposes of issue analysis: in the politics of issues, “experts and activists define issues by sharing information about them” (Heclo, 1974).
  • 8. Issues in the Fergana Valley, Uzbekistan, according to the Web. Fall 2001. Issues are on the Web, but which issues? The characterisation of the Fergana Valley issues depends on the sites accessed. NGO addresses, phone numbers, and occasionally e-mail addresses available on the Web: Mumtozbegim; Business Women's Association, Kokand; Girl's Studio, Kokand; Nozigim, Makhalla Women's Club, Kokand; Manoviy Barkamollik Centre; Malika Family Social and Legal Support Centre; Iftizor Open Youth Club; Soglom Avlod Uchun Charity Foundation*; Abdulla Kadyri Foundation, Kokand; Consumer Rights Protection Society, Fergana; Jamila Charitable Foundation; Iftizor Centre, Kokand; MADAD Center for the lonely, aged and disabled, Andijan; Red Crescent Society, Kokand*.
  • 9. NGO addresses, phone numbers, and occasionally e-mail addresses available on the Web: Mumtozbegim; Business Women's Association, Kokand; Girl's Studio, Kokand; Nozigim, Makhalla Women's Club, Kokand; Manoviy Barkamollik Centre; Malika Family Social and Legal Support Centre; Iftizor Open Youth Club; Soglom Avlod Uchun Charity Foundation*; Abdulla Kadyri Foundation, Kokand; Consumer Rights Protection Society, Fergana; Jamila Charitable Foundation; Iftizor Centre, Kokand; MADAD Center for the lonely, aged and disabled, Andijan; Red Crescent Society, Kokand*; Mehr-Sahovat Charitable Centre; ECOSAN International Foundation, Kokand*; Mussafo Ecological Centre, Kokand. *GONGO (government-organised NGO). No Internet, no issues from the ground. NGO's in the Fergana Valley may not have Web sites, but their issues are on the Web.
  • 10. Issue Crawler: How to use it
  • 11. Issue Crawler: http://issuecrawler.net Request account and log in
  • 12. Issue Crawler lobby. News: workshops, software. Queue: time sharing. Current: three simultaneous crawlers.
  • 13. Issue Crawler harvester. Enter text; URLs will be stripped out.
  • 14. Crawling and analysis. Crawling: to a certain depth. Analysis: snowball, inter-actor, co-link. Iterate (optional).
  • 15. Issue Crawler as a tool of online social research (2/2). More generally, to adopt an empirical approach to the study of public controversies: Is there a network? (Is there an issue?) Who are the actors? How are they related? What are the issues? Where are they happening?
  • 16. Co-link settings. 1 iteration ~ social or event network; 2 iterations ~ issue network; 3 iterations ~ establishment network. See http://www.govcom.org/scenarios_use.htm
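
The co-link iterations mentioned in slides 14 and 16 can be sketched in a few lines of Python. This is a minimal illustration, not Issue Crawler's actual code: it assumes that a co-link step keeps every site that receives links from at least two of the current starting points, and that each iteration feeds its results back in as the next set of starting points. The function names, the `min_inlinks` parameter and the `.example` hosts are all hypothetical.

```python
from collections import defaultdict


def colink_step(outlinks, seeds, min_inlinks=2):
    """Keep hosts linked to by at least `min_inlinks` distinct seeds.

    `outlinks` maps a host to the set of hosts it links to. In a real
    crawl this mapping would come from fetching pages; here it is given.
    """
    linkers = defaultdict(set)
    for seed in seeds:
        for target in outlinks.get(seed, ()):
            if target != seed:  # ignore self-links
                linkers[target].add(seed)
    return {host for host, by in linkers.items() if len(by) >= min_inlinks}


def colink_network(outlinks, seeds, iterations=2, min_inlinks=2):
    """Repeat the co-link step, using each result as the next seed set."""
    current = set(seeds)
    for _ in range(iterations):
        current = colink_step(outlinks, current, min_inlinks)
        if not current:  # nothing is co-linked; stop early
            break
    return current


if __name__ == "__main__":
    # Hypothetical link data for three starting points.
    links = {
        "a.example": {"x.example", "y.example"},
        "b.example": {"x.example"},
        "c.example": {"x.example", "y.example", "a.example"},
        "x.example": {"z.example"},
        "y.example": {"z.example", "x.example"},
    }
    starting_points = ["a.example", "b.example", "c.example"]
    print(sorted(colink_network(links, starting_points, iterations=1)))
    print(sorted(colink_network(links, starting_points, iterations=2)))
```

Under these assumptions, raising `iterations` moves the result away from the starting points and toward the sites that the co-linked sites themselves agree on, which is one way to read the progression from event network to issue network to establishment network.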
  • 17. THE LIPPMANNIAN DEVICE* Scraping and other digital methods skills. * a.k.a. The Google Scraper
  • 18. WHEN SEARCH BECOMES RESEARCH Turning Google into a research tool
  • 19. WALTER LIPPMANN (1889-1974) The Phantom Public, 1927: "The problem is to locate by clear and coarse objective tests the actor in a controversy who is most worthy of public support" (p. 120)
  • 20. LIPPMANNIAN DEVICE - MODES OF ANALYSIS Issue cloud: showing the issue agenda of an organization. Issue agenda: which issues are on the agenda of an organization or movement? Source cloud: showing the partisanship of an actor. Partisanship or commitment: which sources mention the issue?
  • 21. ISSUE CLOUD: GREENPEACE ISSUES An organization’s issue agenda (or commitment). Greenpeace has issues. Which are they most committed to?
  • 23. ISSUE CLOUD: GREENPEACE ISSUES Greenpeace issues, http://www.greenpeace.org/international/campaigns: Stop climate change; Protect ancient forests; Defending our Oceans; Say no to genetic engineering; Eliminate toxic chemicals; Demand Peace and Disarmament; End the nuclear age; Encourage sustainable trade. Keep most significant issue language: "climate change", "ancient forests", "oceans", "genetic engineering", "toxic chemicals", "disarmament", "nuclear power", "sustainable trade". ---> Query Design workshop
  • 25. ISSUE CLOUD: GREENPEACE ISSUES Greenpeace’s issue agenda (distribution of commitment). Greenpeace's campaign issue list, ranked according to the number of mentions of each issue on greenpeace.org, 11 October 2009.
  • 26. EXAMPLE: SOURCE CLOUD Method for showing the partisanship or commitment of sources to names. Method: 1. Gather source list (e.g. through Issue Crawler or top Google results). 2. Query source list for one or more experts. Digital Methods Initiative, 2007
  • 27. SOURCE CLOUD: CLIMATE CHANGE SKEPTICS Query design: What are the sources? Climate Change Skeptics: Who recognizes them? 1. Top 100 results for the query “climate change”: http://www.google.com/search?q=%22climate+change%22&num=100
  • 28. SOURCE CLOUD: CLIMATE CHANGE SKEPTICS Query design: What are the issues? Derive list of climate change skeptics. Sources: motherjones.com, wikipedia.org, heartland.org. Compare the three lists and retain the skeptics that are mentioned in at least two of the lists.
  • 29. SOURCE CLOUD: CLIMATE CHANGE SKEPTICS Skeptics: S. Fred Singer, Robert Balling, Sallie Baliunas, Patrick Michaels, Richard Lindzen, Steven Milloy, Timothy Ball, Paul Driessen, Willie Soon, Sherwood B. Idso, Frederick Seitz
  • 31. GOOGLE BLOCKING Check query design before launching a scrape. Number of sources x number of issues = number of requests to Google
  • 33. ---> Data visualization: clouding workshop
  • 34. Climate Change Sceptics on the Web (Frederick Seitz). Research question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web? Findings: There is distance between the skeptics and the top of the search engine returns. Results: epa.gov (0) bbc.co.uk (0) defra.gov.uk (0) unep.org (0) bom.gov.au (0) ipcc.ch (0) pewclimate.org (0) davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0) greenpeace.org (0) climatechallenge.gov.uk (0) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (1) foe.co.uk (0) state.gov (0) scidev.net (0) eea.europa.eu (0) whoi.edu (0) cbc.ca (0) energy.gov (0) marshall.org (8) climateark.org (4) un.org (0) dar.csiro.au (0) theglobeandmail.com (0) acfonline.org.au (0) gcrio.org (0) nature.com (0) grida.no (0) nature.org (0) ecokids.ca (0) royalsoc.ac.uk (0) climatechangecentral.com (0) iea.org (0) ecn.ac.uk (0) ecy.wa.gov (0) worldwildlife.org (0) realclimate.org (35) metoffice.gov.uk (0) open2.net (0) scienceagogo.com (0) eldis.org (0) ft.com (0) who.int (0) climatecrisis.net (0) faqs.org (0) ltscotland.org.uk (0) abc.net.au (0) climatechange.ca.gov (0) envirolink.org (0) mofa.go.jp (0) sourcewatch.org (21) iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (0) climatescience.gov (0) climatechangecollege.org (0) ciel.org (0) ucar.edu (0). Source: google.com. Query: “Frederick Seitz”. Method: Search for query “Frederick Seitz” in top 100. Organized in order. Tools: Google Scraper and Tag Cloud Generator. Date: 30 July 2007. Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond. CC BY-NC-SA.
  • 35. Climate Change Sceptics on the Web (Steven Milloy). Research question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web? Findings: There is distance between the skeptics and the top of the search engine returns. Results: epa.gov (1) bbc.co.uk (0) defra.gov.uk (1) unep.org (1) bom.gov.au (0) ipcc.ch (1) pewclimate.org (1) davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0) greenpeace.org (1) climatechallenge.gov.uk (1) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (0) foe.co.uk (0) state.gov (1) eea.europa.eu (1) whoi.edu (1) cbc.ca (0) energy.gov (1) marshall.org (0) climateark.org (2) un.org (0) dar.csiro.au (1) theglobeandmail.com (0) acfonline.org.au (0) gcrio.org (0) nature.com (0) grida.no (0) nature.org (1) ecokids.ca (0) climatechangecentral.com (0) iea.org (0) ecn.ac.uk (1) ecy.wa.gov (1) worldwildlife.org (0) realclimate.org (33) open2.net (0) eldis.org (0) ft.com (0) who.int (1) climatecrisis.net (1) faqs.org (0) metoffice.gov.uk (1) ltscotland.org.uk (1) abc.net.au (0) climatechange.ca.gov (1) envirolink.org (1) mofa.go.jp (1) sourcewatch.org (27) iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (1) climatescience.gov (0) climatechangecollege.org (1) ciel.org (0) ucar.edu (0). Source: google.com. Query: “Stephen Milloy”. Method: Search for query “Stephen Milloy” in top 100. Organized in order. Tools: Google Scraper and Tag Cloud Generator. Date: 30 July 2007. Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond. CC BY-NC-SA.
  • 36. Climate Change Sceptics on the Web (S. Fred Singer). Research question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web? Findings: There is distance between the skeptics and the top of the search engine returns. Results: epa.gov (0) bbc.co.uk (0) defra.gov.uk (0) unep.org (0) bom.gov.au (0) ipcc.ch (0) pewclimate.org (0) davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0) greenpeace.org (1) climatechallenge.gov.uk (0) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (1) foe.co.uk (0) state.gov (0) scidev.net (0) eea.europa.eu (0) whoi.edu (0) cbc.ca (0) energy.gov (0) marshall.org (0) climateark.org (1) un.org (0) dar.csiro.au (0) theglobeandmail.com (0) acfonline.org.au (0) gcrio.org (0) nature.com (0) grida.no (0) nature.org (0) ecokids.ca (0) royalsoc.ac.uk (0) climatechangecentral.com (0) iea.org (0) ecn.ac.uk (0) ecy.wa.gov (0) worldwildlife.org (0) realclimate.org (14) faqs.org (0) metoffice.gov.uk (0) open2.net (0) scienceagogo.com (0) eldis.org (0) ft.com (0) who.int (0) climatecrisis.net (0) ltscotland.org.uk (0) abc.net.au (0) climatechange.ca.gov (0) sourcewatch.org (64) envirolink.org (0) mofa.go.jp (0) iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (0) climatescience.gov (11) climatechangecollege.org (0) ciel.org (0) ucar.edu (0). Source: google.com. Query: “Fred Singer”. Method: Search for query “Fred Singer” in top 100. Organized in order. Tools: Google Scraper and Tag Cloud Generator. Date: 30 July 2007. Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond. CC BY-NC-SA.
  • 37. LIPPMANNIAN DEVICE Modes of analysis. Issue agenda check: What are the current commitments of an organization(s)? Use the issue cloud. Partisanship check: Which side is an actor on? Use the source cloud.
  • 38. Tools and references: http://tools.digitalmethods.net http://digitalmethods.net http://govcom.org
  • 40. EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS Research question: Which climate change issue actors mention the skeptics, and what kinds of actors are more likely to mention them? Method: Comparative. Query skeptics in two source sets (‘top’ sources and climate change blogs), outputting source cloud.
  • 41. SOURCE SETS (1) Top ten Google returns for “climate change” (mix of media as well as governmental organizations)
  • 42. SOURCE SETS (2) Climate change blogs network (Issue Crawler results: mix of ‘establishment’ blogs, media and governmental and non-governmental organizations)
  • 43. EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS Steps: (1) Acquire source sets and skeptics list from Michael. (2) Launch the Lippmannian device (a.k.a. Google Scraper; see tools.digitalmethods.net). (3) Enter source sets and skeptics' names. Query the source sets separately, and remember to use quotation marks (“”) to get exact returns. (4) Wait. Use this moment to discuss hypotheses. (5) Explore the output, and present findings.
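
The query design for the exercise can be checked before launching a scrape with a sketch like the following. It assumes that each request pairs one source with one quoted name using Google's `site:` operator, which is our reading of how such a tool would query sources rather than something the slides state. The function names are hypothetical. The code only builds query strings and counts them; it sends nothing to Google.

```python
from itertools import product
from urllib.parse import urlencode


def build_queries(sources, names):
    """One exact-phrase query per (source, name) pair."""
    return [f'site:{source} "{name}"' for source, name in product(sources, names)]


def search_url(query, num=100):
    """Format a query the way the slide 27 example URL is formatted."""
    return "http://www.google.com/search?" + urlencode({"q": query, "num": num})


def request_count(sources, names):
    """Number of sources x number of names = number of requests."""
    return len(sources) * len(names)


if __name__ == "__main__":
    # Two sources and three names taken from the slides, for illustration.
    sources = ["realclimate.org", "sourcewatch.org"]
    names = ["Fred Singer", "Frederick Seitz", "Steven Milloy"]
    print(request_count(sources, names), "requests")
    for query in build_queries(sources, names):
        print(search_url(query))
```

Counting first makes the blocking risk concrete: a source set of 100 hosts queried for the 11 skeptics on slide 29 already means 1,100 requests, so trimming either list before the scrape is usually the cheaper fix.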