1/22
My repository is being aggregated:
a blessing or a curse?
Petr Knoth
CORE (Connecting REpositories)
Knowledge Media Institute
The Open University
@petrknoth
Open Repositories 2014
Helsinki, Finland
2/22
Some interesting quotes about aggregations
It seems as though when we like it we call it “curation,” and when
we don’t we call it “aggregation.”
https://gigaom.com/2011/07/13/like-it-or-not-aggregation-is-part-of-the-future-of-media/
“Aggregators and Google News are, to us, the worst offenders.
They make money by living off the sweat of our brow.”
https://www.techdirt.com/articles/20091014/1831246537.shtml
3/22
OR ?
4/22
repositories
aggregators
The ecosystem
5/22
repositories
aggregators
The use cases
Enrichment &
harmonisation
Data input
Data management
Analytics
Search & discovery
Programmable
(machine-to-machine)
access
Mutually beneficial ecosystem!
6/22
repositories
aggregators
The problem
?
The aggregators
have a negative
impact on our
usage statistics.
We are improving
the
discoverability of
the repository
content and
increasing its
reuse potential.
7/22
repositories
aggregators
A shortsighted solution to the problem
Access denied to aggregators
8/22
repositories
aggregators
A shortsighted solution to the problem
Access denied to aggregators
Typically achieved using
the Robots Exclusion
Protocol (robots.txt)
9/22
Can be done selectively:
OK
* Not allowed
repositories
aggregators
A shortsighted solution to the problem
Access denied to aggregators
Typically achieved using
the Robots Exclusion
Protocol (robots.txt)
For example:
- Arch1m3r in Franc3
- OTH3S in Austr1a
- 3uras1a journals in Turk3y
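To illustrate how selective exclusion via robots.txt behaves, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the crawler names (`CommercialSearchBot`, `ExampleAggregatorBot`) and the URL are assumptions made up for illustration; they are not taken from any of the repositories mentioned on the slide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is allowed ("OK"),
# every other crawler ("*") is not allowed.
ROBOTS_TXT = """\
User-agent: CommercialSearchBot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical full-text URL in a repository.
URL = "https://repository.example.org/record/1.pdf"

search_engine_ok = is_allowed("CommercialSearchBot", URL)
aggregator_ok = is_allowed("ExampleAggregatorBot", URL)

print("CommercialSearchBot allowed:", search_engine_ok)
print("ExampleAggregatorBot allowed:", aggregator_ok)
```

Under these assumed rules the named crawler may fetch the file while any other crawler, including a not-for-profit aggregator, is turned away.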
10/22
The open access paradox
“Open access content is more open for exploitation by
commercial services than by not for profit public services.”
11/22
Is protectionism legal?
Groom (2004) suggests it might be illegal as it, among
other things, triggers concerns of unfair competition.
12/22
The mission of repositories according to SPARC (Crow, 2002)
“… the primary goal of repositories is to open and
disseminate research outputs to a worldwide audience …”
13/22
SPARC’s position paper on IRs
“For the repository to provide access to the broader research
community, users outside the university must be able to find and
retrieve information from the repository. Therefore, institutional
repository systems must be able to support interoperability in order
to provide access via multiple search engines and other discovery
tools. An institution does not necessarily need to implement
searching and indexing functionality to satisfy this demand: it could
simply maintain and expose metadata, allowing other services to
harvest and search the content. This simplicity lowers the barrier to
repository operation for many institutions, as it only requires a file
system to hold the content and the ability to create and share
metadata with external systems.”
14/22
COAR: About harvesting and aggregations …
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and services,
and generate new knowledge from repository content.”
[COAR manifesto]
15/22
What is Open Access exactly?
By “open access” to [peer-reviewed research literature], we mean
its free availability on the public internet, permitting any users to
read, download, copy, distribute, print, search, or link to the full
texts of these articles, crawl them for indexing, pass them as data
to software, or use them for any other lawful purpose, without
financial, legal, or technical barriers other than those inseparable
from gaining access to the internet itself.
[BOAI, 2002]
16/22
Open Access = Access + Reuse
17/22
Multiple copies of content
• It would not be right to stop copying of content, as multiple
copies mean:
• Better preservation
• Higher availability
• Lower network latency
• Increased visibility
• Higher re-use opportunities
• Keeping the market free from monopoly
• Researchers like copying of content
18/22
Solution
• Aggregators must support repositories and help them to fulfill
their mission
• Repositories must stop believing they are the only access point
for open access content (this includes both gold and green OA)
• Aggregators must implement reasonable measures to help
repositories get accurate benchmarks.
19/22
repositories
aggregators
The solution
?
usage monitoring service
20/22
IRUS-CORE implementation
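The idea of a usage monitoring service can be sketched roughly as follows: each download served by an aggregator is recorded together with the repository the item was harvested from, so that the repository can be credited. This is only an illustrative sketch; the `DownloadEvent` fields, the identifiers and the `downloads_per_repository` helper are hypothetical and do not describe the actual IRUS-CORE implementation or its protocol.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DownloadEvent:
    """One full-text download (hypothetical record layout)."""
    item_id: str        # identifier of the downloaded item
    repository_id: str  # repository the item was harvested from
    served_by: str      # system that delivered the file


def downloads_per_repository(events: Iterable[DownloadEvent]) -> Counter:
    """Credit every download to the originating repository,
    regardless of which system served the file."""
    return Counter(event.repository_id for event in events)


# Made-up events: two downloads served by an aggregator, one by the repository.
events = [
    DownloadEvent("item-1", "repo-a", served_by="aggregator"),
    DownloadEvent("item-2", "repo-a", served_by="repo-a"),
    DownloadEvent("item-3", "repo-b", served_by="aggregator"),
]

totals = downloads_per_repository(events)
print(dict(totals))
```

Counting by originating repository rather than by serving system is the design choice that lets a repository report downloads that happened through an aggregator.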
21/22
Conclusions
• It is possible to create a mutually beneficial ecosystem for both
repositories and aggregators
• Open Access is not just about access but also about reuse, which
means encouraging multiple copies of content.
• The primary role of repositories is to disseminate, not to become
a single access point.
• Repositories and aggregators largely serve different
audiences.
• Aggregators should implement mechanisms to give credit to
repositories.
22/22
Thank you!

Editor's Notes

  • #3: Aggregators bring together content distributed across many systems, enhance it and make it available from a single virtual space. They operate in different domains, including travel, news and also research. While some believe they add value, others do not. You can see some of the quotes illustrating the situation. In this presentation, I will discuss how to create a mutually beneficial ecosystem for repositories and aggregators in the open access domain.
  • #4: In this presentation, I will discuss how to create a mutually beneficial ecosystem for repositories and aggregators in the open access domain and find out if aggregations are a blessing or a curse.
  • #5: We have repositories and aggregators. And by repositories, I do not mean only institutional repositories, but also CRISes, subject repositories and the systems of publishers.
  • #6: Repositories and aggregators each focus on different use cases in a single ecosystem. Aggregators mostly exploit the harmonised access to large quantities of data.
  • #7: Both repositories and aggregators have their own sets of users, who are in many cases distinct: developers, bibliometricians. Repositories receive funding from their institutions and need to show impact to justify this funding. Aggregators typically have quite complex funding models, but they also need to show this impact. So, as you can probably see, this is an issue of credit.
  • #8: We have repositories and aggregators. And by repositories, I do not mean only institutional repositories, but also CRISes, subject repositories and the systems of publishers.
  • #9: We have repositories and aggregators. And by repositories, I do not mean only institutional repositories, but also CRISes, subject repositories and the systems of publishers.
  • #10: We have repositories and aggregators. And by repositories, I do not mean only institutional repositories, but also CRISes, subject repositories and the systems of publishers.
  • #11: Unethical and disadvantageous for the scholarly community and the public. Text mining and CC-BY: commercial services have been text-mining non-CC-BY content for years.
  • #12: Unethical and disadvantageous for the scholarly community and the public.
  • #15: COAR and CORE are two different things.
  • #16: Let me now revisit some of the goals of OA, which I am sure are familiar. I would like to use them to show the importance of building an infrastructure that supports the reuse of OA content.
  • #17: OA means Access + Reuse, but in order to be able to reuse, we must aggregate, as aggregations enable such reuse.
  • #18: It would not be right to prevent multiple copies of content from being available on the internet. Add a map with articles spread across the world.
  • #19: Let me now revisit some of the goals of OA, which I am sure are familiar. I would like to use them to show the importance of building an infrastructure that supports the reuse of OA content.
  • #20: Both repositories and aggregators have their own sets of users, who are in many cases distinct: developers, bibliometricians. Repositories receive funding from their institutions and need to show impact to justify this funding. Aggregators typically have quite complex funding models, but they also need to show this impact.