www.know-center.at
Exploring Coverage and Distribution of
Scholarly Identifiers on the Web
14th International Symposium of Information Science
Zadar, 21 May 2015
Peter Kraker, Asura Enkhbayar & Elisabeth Lex
Know-Center GmbH • Research Center for Data-Driven Business and Big Data Analytics
The Emerging Eco-system of Scholarly Services on the Web
Research Questions
• How are scholarly identifiers distributed in crowd-sourced systems on the Web?
• Does the provision of different identifiers influence the findability of scientific publications in other bibliographic and bibliometric sources?
• Who are the top providers of identifiers?
• Which identifier combinations are the most common?
Identifiers on the Scholarly Web
Article level
Publication level
Author level
Data & Method
Data sources and the identifiers taken from each:
• arXiv: arXiv-ID, DOI
• CrossRef: DOI
• Mendeley: arXiv-ID, DOI, Scopus-ID, PubMed-ID, ISSN
Data sets:
• arXiv discipline of quantitative biology (q-bio), 1992-2014 (n=14,195 metadata records); data collection: 17/11/2014
• arXiv discipline of astrophysics (astro-ph), 2009-2014 (n=81,814 metadata records); data collection: 04/03/2015
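A per-identifier lookup of this kind could be sketched as follows. This is only an illustration under stated assumptions, not the authors' crawler: `Record`, `lookup_record`, `toy_lookup`, and the in-memory `CATALOGUE` are hypothetical names with made-up data, and no real service is contacted. The sketch queries a catalogue once for each identifier a record carries and notes which identifiers produced a hit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Record:
    """Minimal metadata record; the field names are illustrative assumptions."""
    arxiv_id: str
    doi: Optional[str] = None


def lookup_record(record: Record,
                  lookup: Callable[[str, str], bool]) -> Dict[str, bool]:
    """Query a catalogue once per available identifier.

    `lookup(id_type, value)` is a hypothetical stand-in for a real
    catalogue search; it returns True when a document was found.
    """
    hits = {"arxiv": lookup("arxiv", record.arxiv_id)}
    if record.doi is not None:
        hits["doi"] = lookup("doi", record.doi)
    return hits


# Toy in-memory catalogue standing in for a remote service (made-up data).
CATALOGUE = {("doi", "10.1000/example"), ("arxiv", "0000.00001")}


def toy_lookup(id_type: str, value: str) -> bool:
    """Report whether the toy catalogue knows this identifier."""
    return (id_type, value) in CATALOGUE


with_doi = lookup_record(Record("0000.00002", "10.1000/example"), toy_lookup)
arxiv_only = lookup_record(Record("0000.00001"), toy_lookup)
print(with_doi, arxiv_only)
```

Recording one result per identifier, rather than stopping at the first hit, makes it possible to compare retrieval rates by DOI and by arXiv-ID for the same set of documents.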
Distribution of arXiv Articles Over Time: q-bio
Distribution of arXiv Articles Over Time: astro-ph
Results of the Data Collection Process
Share of documents (q-bio: 1992-2014, n=14,195; astro-ph: 2009-2014, n=81,814):
                          q-bio    astro-ph
docs with DOI on arXiv    36.1%    29.8%
docs with DOI             50.0%    76.8%
docs found on Mendeley    81.5%    86.1%
Chart annotation: Change in the metadata lookup
Findability on Mendeley: q-bio (1992-2014)
Total documents: 14,195
• Documents with DOI & arXiv-ID: 7,100 (50.02%)
  • Retrieved on Mendeley with DOI: 6,492 (91.44%)
  • Retrieved on Mendeley with arXiv-ID: 5,414 (76.25%)
• Documents with arXiv-ID only: 7,095 (49.98%)
  • Retrieved on Mendeley with arXiv-ID: 4,896 (69.01%)
Findability on Mendeley: astro-ph (2009-2014)
Total documents: 81,814
• Documents with DOI & arXiv-ID: 62,800 (76.8%)
  • Retrieved on Mendeley with DOI: 54,093 (86.1%)
  • Retrieved on Mendeley with arXiv-ID: 45,586 (72.6%)
• Documents with arXiv-ID only: 19,014 (23.2%)
  • Retrieved on Mendeley with arXiv-ID: 12,812 (67.4%)
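As a quick arithmetic check, the percentages on the two findability slides can be recomputed from the raw counts. The sketch below assumes that each retrieval share is measured against the subgroup it was retrieved from, which is the reading the reported figures agree with. The function and key names (`share`, `findability`, `COUNTS`) are illustrative, not taken from the slides.

```python
def share(part: int, whole: int, ndigits: int = 1) -> float:
    """Return part/whole as a percentage, rounded to `ndigits` decimals."""
    return round(100.0 * part / whole, ndigits)


# Counts copied from the two findability slides; key names are illustrative.
COUNTS = {
    "q-bio": {
        "total": 14195, "with_doi": 7100, "arxiv_only": 7095,
        "doi_hits": 6492, "arxiv_hits_with_doi": 5414, "arxiv_hits_only": 4896,
    },
    "astro-ph": {
        "total": 81814, "with_doi": 62800, "arxiv_only": 19014,
        "doi_hits": 54093, "arxiv_hits_with_doi": 45586, "arxiv_hits_only": 12812,
    },
}


def findability(c: dict) -> dict:
    """Recompute each share against the group it was retrieved from."""
    return {
        "with_doi": share(c["with_doi"], c["total"]),
        "doi_hits": share(c["doi_hits"], c["with_doi"]),
        "arxiv_hits_with_doi": share(c["arxiv_hits_with_doi"], c["with_doi"]),
        "arxiv_hits_only": share(c["arxiv_hits_only"], c["arxiv_only"]),
    }


RESULTS = {field: findability(c) for field, c in COUNTS.items()}
for field, shares in RESULTS.items():
    print(field, shares)
```

Because every retrieval share has its own denominator (for example 6,492 of the 7,100 q-bio documents with a DOI), the shares within one discipline are not expected to sum to 100%.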
Findability on Mendeley & DOI Coverage: q-bio
Findability on Mendeley & DOI Coverage: astro-ph
[Figure: findability on Mendeley and DOI availability per year, 2009-2014; y-axis from 0% to 100%]
Identifier Frequency and Mean Readership on Mendeley
                     arxiv            doi              scopus           pmid            issn
q-bio (n=11,570)
  frequency          10,351 (89.5%)   8,321 (71.9%)    8,409 (72.7%)    5,477 (47.3%)   8,119 (70.2%)
  mean readership    20.4             25.4             25.4             32.4            25.9
  median readership  9                13               13               19              14
astro-ph (n=70,459)
  frequency          65,719 (93.3%)   61,667 (87.5%)   58,780 (83.4%)   1,454 (2.1%)    60,985 (86.6%)
  mean readership    7.2              7.3              7.5              20.6            7.3
  median readership  5                5                13               5               6
StdDev: 33.4
Distribution of Disciplines on Mendeley (March 2011) [Kraker et al. 2012]
Identifier Combination Frequency on Mendeley
Share of documents per identifier combination (q-bio: 1992-2014, n=14,195; astro-ph: 2009-2014, n=81,814):
                              q-bio    astro-ph
arxiv-doi-issn-pmid-scopus    39.2%    1.3%
arxiv                         22.8%    10.2%
arxiv-doi-issn-scopus         18.2%    74.5%
doi-issn-pmid-scopus          4.4%     0.0%
arxiv-scopus                  4.1%     0.0%
doi-issn-scopus               2.7%     4.5%
doi-arxiv-issn                0.0%     2.6%
doi-issn                      0.0%     1.7%
doi-arxiv                     0.0%     1.5%
Other                         8.5%     3.7%
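The slide does not say how the combinations were tallied. One straightforward way, sketched here under assumptions, is to build a label from the identifiers present on each record and count the labels. The record layout, the `ID_FIELDS` ordering, and the sample records are hypothetical; the labels here are sorted alphabetically, so a label such as `arxiv-doi-issn` corresponds to `doi-arxiv-issn` on the slide.

```python
from collections import Counter
from typing import Dict, Iterable

# Identifier fields considered, in the order used to build labels (assumption).
ID_FIELDS = ("arxiv", "doi", "issn", "pmid", "scopus")


def combination(record: dict) -> str:
    """Join the names of all non-empty identifier fields into one label."""
    present = [field for field in ID_FIELDS if record.get(field)]
    return "-".join(present) if present else "none"


def combination_shares(records: Iterable[dict]) -> Dict[str, float]:
    """Return the percentage of records per identifier combination."""
    counts = Counter(combination(r) for r in records)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.most_common()}


# Made-up sample records, for illustration only.
SAMPLE = [
    {"arxiv": "0000.00001", "doi": "10.1000/a", "issn": "0000-0001", "scopus": "s1"},
    {"arxiv": "0000.00002", "doi": "10.1000/b", "issn": "0000-0002", "scopus": "s2"},
    {"arxiv": "0000.00003"},
    {"arxiv": None, "doi": "10.1000/c", "issn": "0000-0003"},
]

SHARES = combination_shares(SAMPLE)
print(SHARES)
```

Building the label from a fixed field order ensures that the same set of identifiers always maps to the same combination, regardless of the order in which a source lists them.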
Conclusions
• As expected, crowd-sourced systems show big differences in identifier coverage and distribution → crowd-sourced data needs to be amended automatically
• When retrieving arXiv articles from Mendeley, we were able to obtain more articles using the DOI than the arXiv-ID
• BUT: the arXiv-ID on its own is the second most popular identifier combination on Mendeley. This suggests that pre-prints are being read, albeit at a lower level
• There is, however, a certain time lag in the adoption of arXiv articles on Mendeley
• The distribution of identifiers in a collection of papers hints at the nature of the papers in that collection
• As with citations, field normalization will be essential for cross-comparison of altmetrics scores
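To illustrate why field normalization matters, the sketch below divides a reader count by the mean readership of its field, a simple mean-based approach borrowed from citation analysis. The slides do not prescribe a particular method, so this is an assumption. The two field means are the mean Mendeley readership values reported for documents carrying an arXiv-ID (20.4 for q-bio, 7.2 for astro-ph); `normalized_readership` is a hypothetical helper name.

```python
# Mean Mendeley readership of documents with an arXiv-ID, as reported
# in the identifier frequency table (q-bio: 20.4, astro-ph: 7.2).
FIELD_MEAN_READERS = {"q-bio": 20.4, "astro-ph": 7.2}


def normalized_readership(readers: int, field: str) -> float:
    """Express a reader count relative to the mean of its field.

    A value of 1.0 means "as many readers as the average document
    in this field"; this is one possible normalization, not the only one.
    """
    return readers / FIELD_MEAN_READERS[field]


# The same raw count of 20 readers means different things per field.
qbio_score = normalized_readership(20, "q-bio")
astro_score = normalized_readership(20, "astro-ph")
print(f"q-bio: {qbio_score:.2f}, astro-ph: {astro_score:.2f}")
```

With these means, 20 readers is roughly average for a q-bio paper but well above average for an astro-ph paper, so raw counts are not directly comparable across the two fields.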
Limitations & Future Work
• Choice of arXiv as primary data source introduces a bias
• We only looked at two disciplines from the natural sciences
• Using a random sample of 381 articles from Web of Science, Zahedi et al. (2014) report that they were able to retrieve only 47.7% of articles on Mendeley using the DOI or the title.
→ Expand this study to further disciplines and fields using more data sources
→ Look more deeply into the reasons for disciplinary and tool-related differences
Open Source Crawling Framework
https://github.com/Bubblbu/crawling-framework
SAVE THE DATE: i-KNOW Conference
Special Track on Science 2.0 & Open Science
21-23 October 2015, Graz, Austria
Submission Deadline (extended): 22 June 2015 (Abstracts due: June 8)

