(Some of) Wikipedia's Open Data
Analytics Engineering
We build analytics
infrastructure
analytics@lists.wikimedia.org
The Analytics Team sees its primary responsibility as making
Wikimedia-related data available for querying and analysis
to both WMF and the different wiki communities and
stakeholders. We develop infrastructure so that all our users,
both within the Foundation and within the different
communities, can access data in a self-service fashion that is
consistent with the values of the movement.
We do not handle data
requests (for the most
part)
We try for (all) data to
be public by default.
The more accessible the data is, the more impact it can have.
But we are not there yet
Public Data
Data that is useful for the
world at large.
6th largest website [Alexa]
Wikipedia reaches hundreds
of millions of unique
devices every month and, as
such, is a good barometer
of browser popularity.
The most popular browser?
The most popular browser in April 2017 was Chrome 56, with 25% market share
https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-browser
Issues
IE7 making a
comeback… up
more than 1%
last year
Bots ...
Data useful to WMF,
Researchers and
Community
Pageviews
At peak we process about 200,000 HTTP requests per second
Pageview API
Get a pageview count time series of en.wikipedia's article
Albert Einstein for the month of October 2015:
http://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Albert_Einstein/daily/2015100100/2015103100
https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI
http://tools.wmflabs.org/pageviews/
http://tools.wmflabs.org/siteviews/?platform=all-access&source=pageviews&agent=user&range=latest-20&sites=tr.wikipedia.org
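The per-article request above can be assembled from its path segments. The following is a minimal sketch, not the team's own client: the helper names (pageviewUrl, totalViews, fetchPageviews) are hypothetical, the https scheme is assumed in place of the http shown on the slide, and the response shape ({ items: [{ views: ... }] }) is an assumption to be checked against the PageviewAPI documentation linked above.

```javascript
// Build a per-article Pageview API URL.
// The segment order follows the example URL on the slide:
// project / access / agent / article / granularity / start / end.
function pageviewUrl(project, article, start, end, options) {
  var opts = options || {};
  return [
    'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article',
    project,
    opts.access || 'all-access',
    opts.agent || 'all-agents',
    encodeURIComponent(article),
    opts.granularity || 'daily',
    start,
    end
  ].join('/');
}

// Sum the counts in a parsed response.
// Assumption: the body looks like { items: [{ views: <number>, ... }] }.
function totalViews(response) {
  return response.items.reduce(function (sum, item) {
    return sum + item.views;
  }, 0);
}

// Fetch the time series. Needs a runtime with a global fetch
// (modern browsers, recent Node.js). Not called automatically.
function fetchPageviews(project, article, start, end, options) {
  return fetch(pageviewUrl(project, article, start, end, options))
    .then(function (res) {
      if (!res.ok) {
        throw new Error('Pageview API returned HTTP ' + res.status);
      }
      return res.json();
    });
}

// Usage (performs a network request):
// fetchPageviews('en.wikipedia', 'Albert_Einstein', '2015100100', '2015103100')
//   .then(function (body) { console.log(totalViews(body)); });
```

Passing { agent: 'user' } as the last argument would restrict the series to traffic not classified as bots or spiders, which is one way to approach the "Bots" issue on the next slide.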
Issues
Bots, Bots, Bots
Data useful to WMF
(mostly)
Unique Devices
Get the monthly number of unique devices for the mobile
version of cs.wikipedia.org for the months of January,
February and March 2016:
http://wikimedia.org/api/rest_v1/metrics/unique-devices/cs.wikipedia.org/mobile-site/monthly/20160101/20160301
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Unique_Devices
https://analytics.wikimedia.org/dashboards/vital-signs/#projects=cswiki/metrics=UniqueDevices
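The unique devices request has the same path-segment structure. This is a rough sketch under stated assumptions: uniqueDevicesUrl and devicesByTimestamp are hypothetical helper names, the https scheme is assumed, and the response is assumed to carry a timestamp and a devices count per item (see the Unique_Devices documentation linked above for the actual fields).

```javascript
// Build a Unique Devices API URL.
// Segment order follows the example on the slide:
// project / access-site / granularity / start / end.
function uniqueDevicesUrl(project, accessSite, granularity, start, end) {
  return [
    'https://wikimedia.org/api/rest_v1/metrics/unique-devices',
    project,
    accessSite,
    granularity,
    start,
    end
  ].join('/');
}

// Index the device counts by timestamp.
// Assumption: the body looks like
// { items: [{ timestamp: '20160101', devices: <number>, ... }] }.
function devicesByTimestamp(response) {
  var result = {};
  response.items.forEach(function (item) {
    result[item.timestamp] = item.devices;
  });
  return result;
}

// Usage (performs a network request, needs a global fetch):
// fetch(uniqueDevicesUrl('cs.wikipedia.org', 'mobile-site', 'monthly',
//                        '20160101', '20160301'))
//   .then(function (res) { return res.json(); })
//   .then(function (body) { console.log(devicesByTimestamp(body)); });
```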
Data useful to Community
(mostly)
Wikistats 2.0
http://stats.wikimedia.org
CHECK IN
April 2007
TEAM/DEPT
Analytics
Wikistats exists to motivate our
editor community.
In Wikistats 2.0 we are not only
updating the website interface; we
are also providing new access to all
our edit data in an analytics-friendly
form. This greatly improves (and
fundamentally changes) how edit
metrics are calculated, and the time
and resources that takes, for both
WMF and the community.
https://analytics-prototype.wmflabs.org/
https://www.mediawiki.org/wiki/Wikistats_2.0_Design_Project/RequestforFeedback/Round2
Please Chime in!
Live Data
EventStreams is a web service
that exposes continuous
streams of structured event
data. Get live updates to Wikimedia projects.
Navigate to http://wikimedia.org in your browser and open the developer console.
// This is the EventStreams RecentChange stream endpoint
var url = 'https://stream.wikimedia.org/v2/stream/recentchange';
// Use EventSource (available in most browsers, or as an
// npm module: https://www.npmjs.com/package/eventsource)
// to subscribe to the stream.
var recentChangeStream = new EventSource(url);
// Print each event to the console
recentChangeStream.onmessage = function(message) {
    // Parse the message.data string as JSON.
    var event = JSON.parse(message.data);
    console.log(event);
};
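The RecentChange stream covers every Wikimedia project, so a consumer usually wants to narrow it down. The sketch below wraps a handler so it only sees edits from one wiki that are not flagged as bot edits. The helper names are hypothetical, and the event fields used here (a wiki database name such as 'enwiki' and a boolean bot flag) are assumptions to verify against the events printed by the snippet above.

```javascript
// Return true for events from the given wiki that are not flagged as bot edits.
// Assumption: events carry `wiki` (e.g. 'enwiki') and a boolean `bot` field.
function isHumanChangeOn(wiki, event) {
  return event.wiki === wiki && event.bot !== true;
}

// Wrap a handler so that it only receives matching, already-parsed events.
// The returned function has the shape expected by EventSource.onmessage.
function onHumanChange(wiki, handler) {
  return function (message) {
    var event = JSON.parse(message.data);
    if (isHumanChangeOn(wiki, event)) {
      handler(event);
    }
  };
}

// Usage in a browser console (opens a live connection):
// var stream = new EventSource('https://stream.wikimedia.org/v2/stream/recentchange');
// stream.onmessage = onHumanChange('enwiki', function (event) {
//   console.log(event.title);
// });
```

Filtering on the client keeps the example simple, at the cost of still receiving the full stream over the network.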
Questions?
https://xkcd.com/285
Most things documented at:
https://wikitech.wikimedia.org/wiki/Analytics/