DMI’S WIKIPEDIA TOOLS




Erik Borra

Digital Methods Initiative
University of Amsterdam

25 March 2009
Digital Methods Initiative


How can the internet be made to show what is happening in
society?

How to collect and analyze data and distill trends from the Web?

Follow the medium as opposed to importing standard methods
from social sciences.
tools @ dmi wiki




http://wiki.digitalmethods.net/Dmi/ToolDatabase?cat=DeviceCentric&subcat=Wikipedia
wikipedia bot edits


S. Niederer and J. Van Dijck (2010). “The case of Wikipedia:
Wisdom of the crowd or technicity of content?” New Media and
Society

Short version @ http://wiki.digitalmethods.net/Dmi/NetworkedContent
wikipedia bot edits scraper


How?
• Enter the link to an article
• Scraper retrieves all edit logs for an article
• Filters out all mentions of ‘bot’ and ‘using’
• Returns permalink, date, time, user, comment
Why?
To find out how much an article’s upkeep depends on bots.
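
The filtering step above can be sketched as follows. This is a minimal illustration, not the DMI scraper itself: it assumes the filter selects edits whose user name or comment contains 'bot' or 'using', and the sample log and the helper names (`is_bot_edit`, `bot_share`) are hypothetical.

```python
import re

# Assumption: an edit counts as bot-related when its user name or comment
# contains 'bot' or 'using' (case-insensitive), e.g. "ClueBot" or "using AWB".
BOT_PATTERN = re.compile(r"bot|using", re.IGNORECASE)


def is_bot_edit(edit):
    """Return True if the user name or comment mentions 'bot' or 'using'."""
    text = f"{edit['user']} {edit['comment']}"
    return bool(BOT_PATTERN.search(text))


def bot_share(edits):
    """Fraction of edits flagged as bot-related (0.0 for an empty log)."""
    if not edits:
        return 0.0
    return sum(is_bot_edit(e) for e in edits) / len(edits)


# Hypothetical sample log; the fields mirror the scraper's output columns.
sample_edits = [
    {"permalink": "rev/1", "date": "2009-03-01", "time": "10:00",
     "user": "Alice", "comment": "added sources"},
    {"permalink": "rev/2", "date": "2009-03-01", "time": "10:05",
     "user": "ClueBot", "comment": "reverting possible vandalism"},
    {"permalink": "rev/3", "date": "2009-03-02", "time": "09:30",
     "user": "Bob", "comment": "typo fixes using AWB"},
    {"permalink": "rev/4", "date": "2009-03-03", "time": "14:20",
     "user": "Carol", "comment": "expanded introduction"},
]

# Keep only the edits that match the filter.
bot_edits = [e for e in sample_edits if is_bot_edit(e)]
```

A plain substring match is crude: it would also flag a comment containing a word such as "bottom", so the flagged edits are best checked by hand.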
two examples




http://wiki.digitalmethods.net/Dmi/DebottingWikipedia

Dependence of climate change articles on bots

Anti-vandalism bot activity within a disputed article
wikipedia edits scraper and ip localizer

How?
• Enter the link to an article
• Scraper retrieves all edit logs for an article
• When an IP address is encountered instead of a username, MaxMind’s
  IP-to-geo database is queried for location information
• Returns permalink, date, time, user (or IP), comment,
  (city, country, lat, lon)

Why? Edit-history analysis, scandal research, places of edits.
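
A rough sketch of the IP step, under stated assumptions: `GEO_TABLE` is a hypothetical in-memory stand-in for the IP-to-geo database (a real run would query MaxMind's data instead), the addresses come from the reserved documentation ranges, and the coordinates are illustrative only.

```python
import ipaddress

# Hypothetical stand-in for an IP-to-geo database. The address is from a
# documentation range and the location is made up for illustration.
GEO_TABLE = {
    "192.0.2.10": {"city": "Amsterdam", "country": "NL", "lat": 52.37, "lon": 4.89},
}

# Returned when an IP address has no entry in the table.
UNKNOWN = {"city": None, "country": None, "lat": None, "lon": None}


def is_ip(user):
    """Return True if the user field is an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def localize(edit, geo_table=GEO_TABLE):
    """Return a copy of the edit; anonymous (IP) edits gain geo fields."""
    row = dict(edit)
    if is_ip(edit["user"]):
        row.update(geo_table.get(edit["user"], UNKNOWN))
    return row


# Hypothetical edit log: one known IP, one registered user, one unknown IP.
edits = [
    {"user": "192.0.2.10", "comment": "changed wording"},
    {"user": "Alice", "comment": "added sources"},
    {"user": "2001:db8::1", "comment": "removed paragraph"},
]
localized = [localize(e) for e in edits]
```

Registered users pass through unchanged, so the geo columns only appear on rows where the user field is an IP address.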
ip to geo cases



Scandal research
  WikiScanner (http://wikiscanner.virgil.gr)

Places of edits
   http://mastersofmedia.hum.uva.nl/2007/10/07/repurposing-the-wikiscanner-comparing-dutch-universities-edits-on-wikipedia/
wikipedia network analysis

How?
• Enter the link to an article
• Scraper retrieves all bidirectional links to the article, from
  within Wikipedia
• Scraper parses those articles and retrieves all their links
  • (repeat the previous step up to a set depth)
• List links in table (link from -> to)
• Visualize
Why? Article network ecology.
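
The crawl described above can be sketched as a breadth-first traversal. This is an illustration under assumptions, not the tool's actual code: `LINKS` is a hypothetical stand-in for scraped pages, "bidirectional" is read as articles that both link to the seed and are linked from it, and the function names are invented for the example.

```python
from collections import deque

# Hypothetical link data standing in for scraped Wikipedia pages:
# article title -> titles it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["D"],
    "D": ["A"],
}


def outlinks(title):
    """Titles that `title` links to."""
    return LINKS.get(title, [])


def inlinks(title):
    """Titles that link to `title`."""
    return [src for src, targets in LINKS.items() if title in targets]


def reciprocal_links(title):
    """Articles that both link to `title` and are linked from it."""
    incoming = set(inlinks(title))
    return [t for t in outlinks(title) if t in incoming]


def crawl(seed, max_depth):
    """Breadth-first crawl; returns the link table as (from, to) pairs."""
    start = reciprocal_links(seed)
    edges = [(seed, t) for t in start]
    seen = {seed, *start}
    queue = deque((t, 1) for t in start)
    while queue:
        title, depth = queue.popleft()
        for target in outlinks(title):
            edges.append((title, target))
            # Only follow unseen articles while below the depth limit.
            if target not in seen and depth < max_depth:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


edge_table = crawl("A", max_depth=2)
```

The resulting list of (from, to) pairs can be written out as a two-column table and loaded into a graph tool for the visualization step.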
wip: controversy generator




Wikipedia can be seen as a controversy-defusing device, as it
strives for NPOV (neutral point of view), well-balanced articles.

What if one disentangles the consensus and lays bare
controversies? How would one do that?
wip: controversy generator, possible ways forward

• analyze traces in the system
  • edit-histories
  • protected pages
  • number of followers
  • forkings / splits
  • article length
  • bot edits
  • templates (detecting controversy types)
  • ...


Wiki Analytics Workshop
