Measuring Open Access - Current State of the Art
by
Éric Archambault, D.Phil.
President and CEO, Science-Metrix and 1science
ESSS 2015 - Leuven
2
The OA revolution is firmly in motion
Librarians can play a key role:
Traditional role – percolation
New role – diffusion
Researchers too – be fruitful and multiply
OA in academic publications: complex beast
Understanding the OA universe is key to
useful measurement
BACKGROUND
3
Definitions
Key vantage points
Measuring OA
Results
Conclusions
SYNOPSIS
4
Budapest Open Access Initiative (2002)
“The literature that should be freely accessible online is that which
scholars give to the world without expectation of payment. Primarily,
this category encompasses their peer-reviewed journal articles, but it
also includes any unreviewed preprints that they might wish to put
online for comment or to alert colleagues to important research
findings. There are many degrees and kinds of wider and easier access to
this literature. By "open access" to this literature, we mean its free
availability on the public internet, permitting any users to read,
download, copy, distribute, print, search, or link to the full texts of these
articles, crawl them for indexing, pass them as data to software, or use
them for any other lawful purpose, without financial, legal, or technical
barriers other than those inseparable from gaining access to the internet
itself. The only constraint on reproduction and distribution, and the only
role for copyright in this domain, should be to give authors control over
the integrity of their work and the right to be properly acknowledged
and cited.”
DEFINITIONS
5
Green OA
The main idea behind Green is self-archiving
Archiving can be done in institutional and
thematic repositories
Gold OA
The main idea behind Gold is that journal
publishers make papers available
There are Gold journals (cover-to-cover) but
also Gold papers published in subscription-based
journals (a.k.a. “hybrid journals”)
DEFINITIONS
6
Complexity of OA definition and
measurement notably due to
Embargoes
Transiency
Rights of all kinds (to self-archive, to crawl, to
recombine, to use commercially, etc.)
Discoverability
DEFINITIONS
7
Rules of involvement in OA
VANTAGE POINTS
8
Directories/registries of repositories
Directory of OA journals
VANTAGE POINTS
9
Free or open source repository software
DSpace
EPrints
Archimede, DAITSS, Dienst, Enterprise-Wide
Digital Repository and Archive, ETD-db,
eXtensible Text Framework, Fedora,
Greenstone, Invenio, IRPlus, Keystone Digital
Library Suite, MOAI, Omeka, OPUS, PubMan,
WEKO, PeerLibrary
Source: http://oad.simmons.edu/oadwiki/Free_and_open-source_repository_software
VANTAGE POINTS
10
Key repositories
arXiv.org – the mothership
PubMed Central / Europe PubMed Central
Aggregators
OpenAIRE
BASE
CORE
A typical repository
hosted by the Umeå universitet Library
VANTAGE POINTS
11
Despite, or perhaps because of, all the
sources of OA available, it is very difficult to
measure the availability of OA
Here, we are concerned about the
availability of peer-reviewed articles
published in scholarly journals
Why – this is what policies and mandates
are preoccupied with
BIBLIOMETRICS – PROPORTION OF OA
12
Bottom-up measurement
One would have to harvest all the sources
available and de-duplicate results
The main problem is how to determine reliably
that items
(1) were published (as opposed to an unsubmitted
manuscript)
(2) are peer-reviewed
(3) answer the “so what” question (you found
there were 14,325,678 papers, so what?)
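The de-duplication step of a bottom-up harvest can be sketched as follows. This is a minimal illustration, not the method used in the study: the record fields (`source`, `doi`, `title`, `year`) and the sample records are hypothetical, and the identity key (DOI when present, otherwise normalised title plus year) is an assumption.

```python
import re
import unicodedata

def normalize_doi(doi):
    """Lower-case a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)

def normalize_title(title):
    """Fold accents, case and punctuation so near-identical titles compare equal."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def record_key(record):
    """Prefer the DOI as identity key; fall back to normalised title plus year."""
    if record.get("doi"):
        return ("doi", normalize_doi(record["doi"]))
    return ("title", normalize_title(record["title"]), record.get("year"))

def deduplicate(records):
    """Keep the first record seen for each key, preserving harvest order."""
    seen = {}
    for record in records:
        seen.setdefault(record_key(record), record)
    return list(seen.values())

# Hypothetical records harvested from four different sources.
harvested = [
    {"source": "repo_a", "doi": "https://doi.org/10.1000/XYZ123",
     "title": "Measuring Open Access", "year": 2014},
    {"source": "repo_b", "doi": "doi:10.1000/xyz123",
     "title": "Measuring open access.", "year": 2014},
    {"source": "repo_c", "doi": None,
     "title": "Étude sur l'accès libre", "year": 2013},
    {"source": "repo_d", "doi": None,
     "title": "Etude sur l'acces libre", "year": 2013},
]
unique = deduplicate(harvested)
print(len(unique), [r["source"] for r in unique])
```

A key of this kind only removes duplicates. It does not match a copy that carries a DOI with a copy that lacks one, and it says nothing about whether an item was published or peer-reviewed, which is the harder problem named on the slide.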
BIBLIOMETRICS – PROPORTION OF OA
13
Top-down measurement
One would have to find an exhaustive
bibliographic database of peer-reviewed articles
and verify the availability of all papers
Main problems:
(1) there is no such database
(2) extremely tedious to check all of them
(3) how do you actually do that
BIBLIOMETRICS – PROPORTION OF OA
14
Top-down measurement - Sampling
Considering the enormous task at hand, most
authors have resorted to using sampling and
search engines
Harnad and team sampled articles from the
Web of Science
Björk and team sampled articles from Scopus
Archambault and team sampled articles from
Scopus and used multiple techniques as well as
search engines
BIBLIOMETRICS – PROPORTION OF OA
15
Dealing with search engines
Use user-friendly meta-search engines such as
DuckDuckGo or DogPile
Try to stay below the radar using mainstream
search engines
Neither solution feels remotely comfortable
Another solution is to build a dedicated
infrastructure to facilitate OA discovery (this
is the solution used by 1science)
BIBLIOMETRICS – PROPORTION OF OA
16
Divergence from the real measure is due to
Capacity to design instrument that provides true
value (function of recall and retrieval precision)
Capacity to increase statistical significance
through large samples
SAMPLING AND METROLOGY
17
A true positive (tp) in the present case is a paper known to be available in OA which
is found by the harvesting instrument developed for the current project. A true
negative (tn) is an article which is not available for free and is not found by the
instrument. False positives (fp) and false negatives (fn) are the converse cases: papers the instrument reports as OA that are not, and OA papers the instrument misses.
Retrieval precision, also called positive predictive value, provides an estimation of
how frequently the instrument finds correct positive results and is calculated as
follows:
$$\text{Retrieval precision} = \frac{tp}{tp + fp}$$
Recall, also called true positive rate or sensitivity, is the capacity to correctly
identify a large proportion of the positive records:
$$\text{Recall} = \frac{tp}{tp + fn}$$
Knowing the precise characteristics in terms of true and false positives and
negatives allows for the computation of an adjustment score, which can then be
applied to recalibrate the results to obtain a truer measure, one that corrects the
limits of the instrument. The adjustment made in the previous study is based on
the following formula:
$$\text{Adjustment} = \frac{tp + fn}{tp + fp}$$
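As a worked illustration of these three quantities, here is a minimal sketch. The validation counts and the measured share are hypothetical, and the adjustment is assumed to be the ratio of papers that truly are OA (tp + fn) to papers the instrument reports as OA (tp + fp), which equals retrieval precision divided by recall.

```python
def retrieval_precision(tp, fp):
    """Share of papers flagged as OA that really are OA."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of truly OA papers that the instrument finds."""
    return tp / (tp + fn)

def adjustment(tp, fp, fn):
    """Assumed form: truly OA papers (tp + fn) over papers flagged as OA (tp + fp)."""
    return (tp + fn) / (tp + fp)

# Hypothetical validation counts from a manually checked sub-sample.
tp, fp, fn = 400, 10, 100

# Hypothetical measured share of OA papers in the full sample.
measured_share = 0.41

# Recalibrate the measured share to correct for the instrument's limits.
adjusted_share = measured_share * adjustment(tp, fp, fn)

print(retrieval_precision(tp, fp), recall(tp, fn), adjusted_share)
```

With high precision and lower recall, as in these made-up counts, the adjustment factor is above 1, so the adjusted curve sits above the measured one.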
SAMPLING AND METROLOGY
18
Statistical precision can be assessed with the margin of
error (ME). For a proportion (p) drawn from a population
of finite and known size N (here, the Scopus database)
that is not always much larger than the sample size (n),
and in which the values are discrete (papers are discrete,
as one does not publish one third of a paper), given a
critical score Z (set for a confidence level of 0.95 in the
study, i.e. Z ≈ 1.96), ME is calculated as follows:
$$ME = Z \sqrt{\frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}} + \frac{0.5}{n}$$
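The margin of error can be computed as in the sketch below. It assumes the formula combines a finite population correction, (N − n)/(N − 1), with a continuity correction, 0.5/n, and it uses Z = 1.96, the conventional critical value for a 95% confidence level. The sample and population sizes are hypothetical.

```python
import math

def margin_of_error(p, n, N, z=1.96):
    """Margin of error for a proportion with finite population and continuity corrections.

    p: observed proportion, n: sample size, N: population size,
    z: critical value (1.96 is assumed here for 95% confidence).
    """
    fpc = (N - n) / (N - 1)                      # finite population correction
    standard_error = math.sqrt(p * (1 - p) / n * fpc)
    return z * standard_error + 0.5 / n          # 0.5 / n is the continuity correction

# Hypothetical case: 1,000 papers sampled from a population of 1,000,000.
me = margin_of_error(p=0.5, n=1_000, N=1_000_000)
print(round(me, 4))
```

When the sample covers the whole population (n = N), the square-root term vanishes and only the continuity correction remains.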
SAMPLING AND METROLOGY
19
The harvesting engine developed by Science-Metrix
searches specific sites, including SciELO, PubMed
Central, ResearchGate and CiteSeerX
It also uses a locally hosted version of the metadata of
large-scale specialised repositories such as arXiv
It systematically harvests metadata from institutional
repositories listed in ROAR and OpenDOAR
Finally, a portion of the harvesting
engine works in the cloud and searches for freely
available papers
MEASURING THE % OF OA PAPERS
20
For Gold Journal OA articles, an estimate of the
proportion of papers was made from the random
sample by matching the journals that were known to be
Gold to the year a paper was published
Journals were obtained from the Directory of Open
Access Journals (DOAJ) and the list of OA journals in
PubMed Central
This was done by matching journals’ ISSN, E-ISSN and
names from Scopus to the relevant records in the
sample
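The journal matching described above can be sketched as follows. This is an illustration under assumptions, not the study's code: the field names (`issn`, `eissn`, `name`, `gold_since`, `journal`, `year`), the ISSNs and the journal titles are all hypothetical, and Gold status is modelled as a single starting year per journal.

```python
import re

def normalize_issn(issn):
    """Reduce an ISSN to 8 upper-case characters, e.g. '0000-001x' -> '0000001X'."""
    return re.sub(r"[^0-9X]", "", issn.upper())

def normalize_name(name):
    """Fold case and punctuation in a journal name."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def journal_keys(issn, eissn, name):
    """All identifiers under which a journal may be matched, ISSNs first."""
    keys = [normalize_issn(value) for value in (issn, eissn) if value]
    keys.append(normalize_name(name))
    return keys

def build_gold_index(gold_journals):
    """Map every identifier of a Gold journal to the first year it was Gold."""
    index = {}
    for journal in gold_journals:
        for key in journal_keys(journal.get("issn"), journal.get("eissn"), journal["name"]):
            index[key] = journal["gold_since"]
    return index

def is_gold_paper(paper, index):
    """A paper counts as Gold if its journal matches and was Gold in the paper's year."""
    for key in journal_keys(paper.get("issn"), paper.get("eissn"), paper["journal"]):
        if key in index:
            return paper["year"] >= index[key]
    return False

# Hypothetical Gold journal list (standing in for DOAJ or PubMed Central lists).
gold_index = build_gold_index([
    {"name": "Journal of Open Examples", "issn": "0000-0001",
     "eissn": "0000-001x", "gold_since": 2008},
])

# Hypothetical sampled records.
sample = [
    {"issn": "00000001", "eissn": None, "journal": "J. Open Ex.", "year": 2010},
    {"issn": None, "eissn": "0000-001X", "journal": "J. Open Ex.", "year": 2005},
    {"issn": None, "eissn": None, "journal": "Journal of Open Examples", "year": 2012},
    {"issn": "9999-9999", "eissn": None, "journal": "Closed Journal", "year": 2012},
]
flags = [is_gold_paper(paper, gold_index) for paper in sample]
print(flags)
```

Checking the publication year against the journal's Gold period matters because a paper published before a journal converted to Gold should not be counted as Gold journal OA.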
MEASURING THE % OF OA PAPERS
21
Evolution of the proportion of OA scientific papers as
measured in April 2013 and April 2014, 1996–2013
RESULTS
[Line chart. x-axis: year, 1994–2014; y-axis: % of papers available in OA, 0–60%. Series: Adjusted OA April 2014, Adjusted OA April 2013, Measured OA April 2014, Measured OA April 2013.]
22
Translation of OA availability between April 2013 and
April 2014
RESULTS
[Line chart with exponential trend lines. x-axis: year, 2003–2012; y-axis: % of papers available in OA, 0–60%. Series: Adjusted OA April 2014, Adjusted OA April 2013. Trend lines as printed on the chart: y = 2E-21·e^(0.0234x), R² = 0.976; y = 3E-17·e^(0.0186x), R² = 0.9473.]
23
OA backfilling between April 2013 and April 2014 of
papers published in 1996–2011
RESULTS
[Chart with exponential trend line. x-axis: year, 1994–2014; y-axis: number of OA papers backfilled between April 2013 and April 2014, 0–120,000. Trend line: y = 2E-112·e^(0.1335x), R² = 0.9976.]
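An exponential trend of the form y = a·e^(bx) can be fitted by ordinary least squares on ln(y). The sketch below uses synthetic data generated from a known rate, not the study's counts, and computes R² on the log scale. How the trend lines and R² values on the slides were computed is not stated, so this is only one plausible approach.

```python
import math

def fit_exponential(xs, ys):
    """Least-squares fit of ln(y) = ln(a) + b*x; returns (a, b, R² on the log scale)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxy / sxx
    ln_a = mean_l - b * mean_x
    ss_tot = sum((l - mean_l) ** 2 for l in logs)
    ss_res = sum((l - (ln_a + b * x)) ** 2 for x, l in zip(xs, logs))
    return math.exp(ln_a), b, 1 - ss_res / ss_tot

# Synthetic series: 1,000 papers in 1996, growing at a rate of 0.1335 per year.
years = list(range(1996, 2012))
counts = [1000 * math.exp(0.1335 * (year - 1996)) for year in years]

a, b, r_squared = fit_exponential(years, counts)
doubling_time = math.log(2) / b   # years needed for the fitted curve to double
print(a, b, r_squared, doubling_time)
```

Because x is the calendar year, the fitted constant a is extremely small, which is why coefficients such as 2E-112 appear on the charts. A rate of 0.1335 per year corresponds to a doubling time of ln(2)/0.1335, or about 5.2 years.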
24
Growth of the number of papers available in OA as
measured in April 2014, 1996–2013
RESULTS
[Chart with exponential trend line. x-axis: year, 1994–2014; y-axis: number of papers in OA, 0–1,000,000. Series: Adjusted OA, Measured OA. Trend line: y = 2E-73·e^(0.09x), R² = 0.9971.]
25
Scientific impact of OA and non-OA papers published in
1996–2011
RESULTS
[Line chart. x-axis: year, 1996–2011; y-axis: average of relative citations (ARC, 1 = world average), 0.0–1.8. Series: OA, All Papers, Not OA.]
26
Impact contest by OA type by field, 2009–2011
N.B. Here, Gold refers to full-Gold journals, not to Gold papers in hybrid journals
RESULTS
Field | 1st place (type, ARC) | 2nd place (type, ARC) | 3rd place (type, ARC) | Least impact (type, ARC)
Agriculture, Fisheries & Forestry | Green OA 1.57 | Other OA 1.32 | Not OA 0.88 | Gold OA 0.51
Biology | Other OA 1.37 | Green OA 1.30 | Not OA 0.69 | Gold OA 0.47
Biomedical Research | Other OA 1.23 | Green OA 1.10 | Gold OA 0.91 | Not OA 0.65
Built Environment & Design | Green OA 1.56 | Other OA 1.28 | Not OA 0.86 | Gold OA 0.19
Chemistry | Other OA 1.34 | Green OA 1.28 | Not OA 0.95 | Gold OA 0.34
Clinical Medicine | Other OA 1.56 | Green OA 1.08 | Gold OA 0.64 | Not OA 0.63
Communication & Textual Studies | Other OA 1.82 | Green OA 1.51 | Not OA 0.66 | Gold OA 0.73
Earth & Environmental Sciences | Green OA 1.46 | Other OA 1.26 | Gold OA 0.98 | Not OA 0.72
Economics & Business | Green OA 1.46 | Other OA 1.30 | Not OA 0.71 | Gold OA 0.22
Enabling & Strategic Technologies | Green OA 1.68 | Other OA 1.53 | Not OA 0.83 | Gold OA 0.52
Engineering | Green OA 1.84 | Other OA 1.38 | Not OA 0.83 | Gold OA 0.55
General Arts, Humanities & Social Sciences | Green OA 1.74 | Other OA 1.49 | Not OA 0.73 | Gold OA 0.13
General Science & Technology | Green OA 2.56 | Other OA 2.24 | Gold OA 0.69 | Not OA 0.11
Historical Studies | Green OA 2.37 | Other OA 1.61 | Not OA 0.76 | Gold OA 0.37
Information & Communication Technology | Green OA 1.62 | Other OA 1.36 | Gold OA 0.76 | Not OA 0.69
Mathematics & Statistics | Green OA 1.35 | Other OA 1.11 | Not OA 0.75 | Gold OA 0.67
Philosophy & Theology | Green OA 1.72 | Other OA 1.63 | Gold OA 0.86 | Not OA 0.72
Physics & Astronomy | Green OA 1.43 | Gold OA 1.18 | Other OA 1.04 | Not OA 0.73
Psychology & Cognitive Sciences | Other OA 1.35 | Green OA 1.31 | Not OA 0.66 | Gold OA 0.59
Public Health & Health Services | Other OA 1.38 | Green OA 1.30 | Not OA 0.76 | Gold OA 0.71
Social Sciences | Green OA 1.54 | Other OA 1.44 | Not OA 0.76 | Gold OA 0.52
Visual & Performing Arts | Green OA 2.16 | Other OA 1.86 | Not OA 0.77 | Gold OA 0.29
Total | Green OA 1.53 | Other OA 1.36 | Not OA 0.76 | Gold OA 0.61
27
OA is a fast-moving phenomenon
It is also quite complex to understand and to
measure
Uptake of OA limited by heterogeneity and
challenges in discovery
CONCLUSION
28
Growth of OA should be understood to
comprise two main aspects:
Organic growth as more publishers, researchers
and librarians increasingly make freshly
published papers freely available
“Backfilling” of already published papers by
researchers and librarians and dis-embargoing
of previously locked papers by publishers
contribute to a translation of the availability
curve
CONCLUSION
29
On average, openly accessible papers have a decidedly
greater impact
In 7 fields, publishing in subscription-based journals and not
self-archiving is the worst possible strategy
In these fields, Gold journals surpass in impact publishing in
subscription-based journals with no self-archiving, even if
these Gold journals are much younger and less established
No longer adequate to publish and forget papers
One has to actively market papers and think of
post-publishing communication strategies
Considering the high value of the knowledge contained
in papers, and their high public cost, working to
maximise diffusion and uptake is the least one can do
CONCLUSION
30
Visit 1science to learn about our solution to radically
facilitate the discovery and use of peer-reviewed open
access papers
Visit Science-Metrix to learn about our evaluation and
measurement activities
THANK YOU
