Enabling better science
Results and vision of the OpenAIRE
infrastructure and RDA Data Publishing
Working Group
Paolo Manghi
paolo.manghi@isti.cnr.it
Institute of Information Science & Technologies “A. Faedo”
National Research Council, Pisa, Italy
The research group
• http://nemis.isti.cnr.it/groups/infrascience
• 27 people among senior and junior researchers, system administrators and PhD students (computer science & information engineering)
• Data interoperability
• Digital Library Foundations & Management Systems
• Enabling middleware for service-oriented infrastructures
• Enhanced publication & compound object models
• Virtual Research Environments
• De-duplication of information objects
By Donatella Castelli and Alessia Bardi, September 2015, bardi@isti.cnr.it
Modern Scholarly Communication
Research Infrastructures go beyond literature
• Web-driven: immediate sharing and access to digital knowledge
• Data-driven: Jim Gray’s fourth paradigm
• Literature publishing: institutional and thematic repositories, publisher journal repositories
• Dataset publishing: data repositories
• Scientific process publishing
[Diagram labels: software, experiment, service; Research Infra, Market Place]
Modern Scholarly Communication
Publishing beyond literature
• Comprehensive scientific reward by citation of any research outcome
• Improved understanding of research outcome
• Better research review process [repeatability, replicability, and reproducibility of experiments - Goble, 2009]
• Effective dissemination and re-use of valuable research assets
• Lower costs of science
[Diagram: Publication; Data (input/output to scientific process); Scientific process (methodological processes, executable workflows, pieces of software)]
Funders and projects
• Funders are crucial to the development of science
• Public funders push for Open Access mandates
• Funders require methodologies to monitor the impact (ROI) of the projects they fund and their adherence to mandates
• Projects require the same tools to showcase their output
[Diagram: Publication, Data, Project, Funding, Scientific process]
Scientific communication
workflows for research products
• Publications: well established (PIDs, metadata,
deposition, peer-review, citation, dissemination)
• Research data: available for given communities (PIDs, metadata, deposition?, peer-review?, citation?, dissemination?)
• Scientific process: almost nonexistent and not supported by scientific reward mechanisms
Deposition Peer-review Dissemination
Sharing/publishing research
data: current solutions
• Part of article: subset of research data embedded as figures or
tables
• Additional material possibly submitted to the journal along with
the publication full-text
• Independent from article: Data deposited and described at
dedicated locations
• Data centres, Discipline databases, Thematic data repositories
• Linked to article: Data deposited and described at dedicated
locations with link from/to article full-texts
• Discipline-specific, data papers, deposition guidelines
• e.g. DRYAD, PANGAEA, GigaDB
• Enhanced publications, research objects
Sharing/publishing scientific
process: current solutions
• Part of article: a piece of code is included in the paper or provided as additional material, possibly submitted to the journal along with the publication full-text
• Independent from article: software, VMs, workflows, services and e-notebooks are deposited and described at dedicated locations
• Software repositories, e.g. GitHub
• Workflow repositories, e.g. myexperiment.org
• Linked to article: software deposited and described at
dedicated locations with link from/to article full-texts
• Software papers, but no real deposition guidelines
• Enhanced publications, research objects
Enabling Better Science
Market-place services in OpenAIRE and RDA
The OpenAIRE infrastructure
• Establishment of an interoperable network of publication
repositories
• Deposition, discovery, linking and monitoring of research
products (articles, datasets, software) produced under
National and EC funding
• Monitoring compliance with EC OA mandates for publications
• Supporting decision makers with statistics
OPEN ACCESS INFRASTRUCTURE
FOR RESEARCH IN EUROPE
The point of reference for
Open Access in Europe
The OpenAIRE infrastructure
Human Network
• NOADs: National Open Access Desks
• Monitor and foster the adoption of Open Access policies at the local level
• Support researchers in implementing the Data Pilot
e-infrastructure
• e-infrastructure for monitoring the impact of OA mandates and research projects
• OpenAIRE guidelines for metadata exchange
• Zenodo repository for the deposition of research products
[Diagram: OpenAIRE services and data flows]
• Services: Get support (NOADs), Search & Browse, Linked Content, Statistics, Feedback, Claim/deposit, APIs
• Outputs: Publications & data; Research impact (citations, usage statistics, +++)
• Data sources: Data repositories/aggregators and Data Journals (metadata on data); Publication repositories/aggregators, Institutional & Thematic, and Open Access Journals/Publishers (usage data, metadata and PDFs); National funding and EC funding; Institutional CRIS Systems; CERN/OpenAIRE “catch-all”; OpenDOAR; re3data
• Guidelines: Guidelines for use services; Guidelines for data interoperability
• Processing: Validation, Cleaning & Transformation, De-duplication, Enrichment by metadata and text mining
OpenAIRE data model:
view from the moon
The OpenAIRE e-infrastructure:
view from the moon
www.d-net.research-infrastructures.eu
OpenAIRE e-infrastructure:
hardware numbers
Production
• 44 CPU cores
• 84 GB of RAM
• 3,998 GB allocated disk
Mining Cluster
• 14 servers
• 98 CPU cores
• 514 GB RAM
• 18,458 GB allocated disk
Data provision cluster
• 15 servers
• 90 CPU cores
• 236 GB RAM
• 12,300 GB allocated disk
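As a quick arithmetic check, the per-cluster figures above can be added up. The sketch below only sums the numbers shown on the slide; the dictionary layout and the names `clusters` and `totals` are illustrative assumptions, and server counts are left out because the slide gives none for Production.

```python
# Per-cluster figures copied from the slide; the data layout is an assumption.
clusters = {
    "production": {"cpu_cores": 44, "ram_gb": 84, "disk_gb": 3_998},
    "mining": {"cpu_cores": 98, "ram_gb": 514, "disk_gb": 18_458},
    "data_provision": {"cpu_cores": 90, "ram_gb": 236, "disk_gb": 12_300},
}


def totals(clusters):
    """Sum each resource across all clusters."""
    result = {}
    for figures in clusters.values():
        for resource, amount in figures.items():
            result[resource] = result.get(resource, 0) + amount
    return result


overall = totals(clusters)
print(overall)
```

Under these figures the three clusters together account for 232 CPU cores, 834 GB of RAM and 34,756 GB of allocated disk.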
OpenAIRE information space
numbers (September 2015)
http://www.openaire.eu
• 12M publications (de-duplicated)
• 200,000 links publication-project from 5 funders
• 9,000 datasets linked to publications or projects
• 34,000 organizations (de-duplicated)
• Collected from:
• 600+ “direct” data providers
• 5,000+ “indirect” data providers (inherited from aggregators)
• End-users…
OpenAIRE’s
http://www.zenodo.org
• Zenodo repository (production)
• Deposition of publications, datasets, software
• DOI minting and metadata curation
• Community support
• Much more…
• FREE
• Numbers
• Publications 16,240
• Datasets 1,477
• Software 4,456
• Other products 1,400+
OpenAIRE partners and liaisons
SHARE
Other OA initiatives:
international collaborations
• RDA-ANDS (Australia)
• SHARE (United States)
• La Referencia (South America)
• CAS (China)
Sharing research products and
context to enable better science
in OpenAIRE
[Diagram: OpenAIRE scope: Publications, EC funding, National funding, Research Data, Research Initiatives, Scientific Process]
• OpenAIRE (2009 - 2012)
• OpenAIRE Plus (2011 - 2014)
• OpenAIRE2020 (2015 - 2018)
[Diagram labels: EC Open Access mandate monitoring; National funders; Links between publications and data; European Grid Infra (EGI); Links among publications, data and process]
Publications in OpenAIRE
• Publications acquisition policy
• Open Access publications
• Publications linked to a project whose funder is
supported by OpenAIRE
• Publications are collected from literature repositories
and “claimed” by registered end-users
• Metadata and full text (when Open Access or agreed with
publishers)
Funders and projects in
OpenAIRE
• Collects projects from the following funder sources
• European Commission: FP7 and H2020
• Wellcome Trust
• FCT (Portugal)
• NHMRC (Australia)
• ARC (Australia)
• Science Foundation Ireland (Ireland)
• On the way: Croatian, Dutch, and American (NSF)
Enabling better science:
publications and funders
• OpenAIRE guidelines for literature repositories
• “How to describe” publications
• “How to describe” projects
• “How to put publications in context” with projects
• Cooperation with SHARE (US), JISC (UK), La Referencia (South
America)
• OpenAIRE Services
• Offering access to project information by funder
• Inferring links between articles and projects of any funders
• Monitoring ROI/Open Access of any funders by project (and
more)
Enabling better science:
publications and funders
• Literature Broker Service for Institutional Repositories (delivery planned for 2016)
• Serving repository managers
• Subscriptions based on configurable criteria of
publication-repository “closeness”
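One way to picture subscriptions driven by configurable "closeness" criteria is sketched below. This is not the Literature Broker Service itself, whose design the slide does not describe: the classes `Publication` and `Subscription`, the criteria `affiliated_with` and `funded_by`, and all identifiers in the usage example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    """Hypothetical minimal publication record."""
    title: str
    affiliations: set = field(default_factory=set)
    projects: set = field(default_factory=set)


@dataclass
class Subscription:
    """A repository's subscription: any matching criterion triggers a notification."""
    repository: str
    criteria: list  # callables taking a Publication and returning bool

    def matches(self, pub):
        return any(criterion(pub) for criterion in self.criteria)


def affiliated_with(organisation):
    """Assumed criterion: an author is affiliated with the given organisation."""
    return lambda pub: organisation in pub.affiliations


def funded_by(project):
    """Assumed criterion: the publication is linked to the given project."""
    return lambda pub: project in pub.projects


def notify(subscriptions, pub):
    """Return the repositories whose subscription matches the publication."""
    return [s.repository for s in subscriptions if s.matches(pub)]


# Usage with made-up identifiers.
subscriptions = [
    Subscription("repo-a", [affiliated_with("org-a")]),
    Subscription("repo-b", [funded_by("project-x")]),
]
publication = Publication("Example article", affiliations={"org-a"})
print(notify(subscriptions, publication))
```

Expressing each criterion as a small predicate keeps the notion of "closeness" configurable per repository without changing the matching loop.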
Research Data in OpenAIRE
• OpenAIRE research data acquisition policy
• Must be linked to OpenAIRE publications or to projects
• No datasets identified by accession numbers
• Dealt with as “external links”
• Datasets are collected from data archives and
“claimed” by registered end-users
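The acquisition policy above can be read as a small decision rule. The sketch below is one possible reading, not OpenAIRE's implementation: the `Dataset` record, the `pid_type` values and the three outcome labels are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset record as harvested from a data archive."""
    pid: str
    pid_type: str  # assumed values: "doi", "accession", ...
    linked_publications: list = field(default_factory=list)
    linked_projects: list = field(default_factory=list)


def classify(dataset):
    """Apply the policy: accession-number datasets become external links;
    other datasets are accepted only if linked to a publication or a project."""
    if dataset.pid_type == "accession":
        return "external_link"
    if dataset.linked_publications or dataset.linked_projects:
        return "accept"
    return "reject"


# Usage with made-up identifiers.
print(classify(Dataset("10.1234/example.dataset", "doi", linked_publications=["pub-1"])))
print(classify(Dataset("ABC123", "accession", linked_publications=["pub-1"])))
print(classify(Dataset("10.1234/orphan.dataset", "doi")))
```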
Enabling better science:
research data
• OpenAIRE guidelines for data archives
• “How to describe” datasets (inspired by DataCite)
• “How to put datasets in context” with projects
• OpenAIRE services
• Inference of links to datasets from article full-text
• Extraction of dataset-publication links from data
archives (e.g. PANGAEA, DataCite)
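As a deliberately naive stand-in for link inference from full text, the sketch below scans an article's text for DOI-shaped strings and keeps those that match known dataset DOIs. The slides do not say how OpenAIRE's text mining works, so the regular expression, the function name `find_dataset_links` and the example DOIs are all assumptions.

```python
import re

# DOI-shaped strings: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dataset_links(full_text, known_dataset_dois):
    """Return known dataset DOIs (lower-cased) mentioned in the text."""
    candidates = {
        # Drop sentence punctuation swallowed by the pattern, then normalise case.
        match.group(0).rstrip(".,;:)").lower()
        for match in DOI_PATTERN.finditer(full_text)
    }
    return sorted(candidates & known_dataset_dois)


# Usage with made-up DOIs.
text = (
    "The data are available at doi:10.1234/EXAMPLE.dataset. "
    "See also the related article (10.5678/example.article)."
)
known = {"10.1234/example.dataset"}
print(find_dataset_links(text, known))
```

Matching against a set of known dataset identifiers keeps article-to-article citations out of the result; DOIs are compared in lower case because they are case-insensitive.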
Research initiatives
• OpenAIRE is open to “research initiatives” that want to
• Monitor the productivity of the community in terms of
publications and datasets
• Support the discovery of research made by peers in the same
community
[Diagram: Publication, Data, Project, Funding, Research Initiative]
Enabling better science:
Research initiatives
• OpenAIRE research initiatives
• European Grid Infrastructure (EGI), concepts: EGI
Virtual Organizations and EGI disciplines
• OpenAIRE services
• Inference of links to research initiatives from
article full-texts
• Monitoring ROI and Open Access w.r.t. relevant
“concepts” of a research activity
Enabling better science:
Scientific processes
To be defined
• Scientific process acquisition policies
• e.g. software, processes (e.g. Taverna workflows), methods (e.g. e-notebooks)
• Collection strategies
• From “process repositories”? E.g. myexperiment.org, GitHub
• Guidelines for “process repository managers”
• OpenAIRE services
• Monitoring ROI of projects in terms of processes!
• Inference/extraction of article/process links?
Research Data
Alliance (RDA)
Research Data Alliance (RDA)
Publishing Data Services Working Group
• Forum funded by the Commission to foster discussion among researchers and practitioners in the field of research data management
• Identifies common, cross-discipline problems and produces best practices and recommendations
• Organized in Interest Groups and Working Groups
• Focus:
• Publishing Data Interest Group: umbrella of WGs focusing on
enabling a stronger research data publishing infrastructure
• Publishing Data Services Working Group: focusing on article-dataset interlinking
Publishing Data Services Working Group
Data-article links
• Benefits of creating context by establishing data-article links
• Increasing visibility and discoverability
• Stimulating reuse and repeatability
• Key to making it worthwhile:
• Infrastructural approach: linking needs to be done collectively, at
community (and cross-community) level, sharing procedures,
policies and technologies
• Issues
• No common framework for interlinking datasets and published
articles
• Initiatives live in isolation and cannot be combined
Enabling better science: giving
open access to article-dataset links
• Creating “an open, freely accessible, web based service
that enables its users to identify datasets that are
associated with a given article, and vice versa”
• The Service will serve as a flexible sandbox
• Major scholarly communication stakeholders involved at
different levels
• Feed authoritative links to the Service
• Access links from the service
• Feed back requirements, preferences, recommendations, obstacles
to refine/enhance the service
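The core lookup described above (datasets for a given article, and vice versa) amounts to a two-way index over links. The sketch below is an in-memory illustration under that assumption; the class `LinkIndex`, its method names and the identifiers in the usage example are hypothetical and do not describe the Service's actual API.

```python
from collections import defaultdict


class LinkIndex:
    """Two-way index over article-dataset links, remembering who provided each link."""

    def __init__(self):
        self._by_article = defaultdict(set)
        self._by_dataset = defaultdict(set)

    def add_link(self, article_pid, dataset_pid, provider):
        """Register a link fed by a provider (e.g. a publisher or data centre)."""
        self._by_article[article_pid].add((dataset_pid, provider))
        self._by_dataset[dataset_pid].add((article_pid, provider))

    def datasets_for(self, article_pid):
        """Datasets associated with a given article."""
        return sorted({pid for pid, _ in self._by_article.get(article_pid, ())})

    def articles_for(self, dataset_pid):
        """Articles associated with a given dataset."""
        return sorted({pid for pid, _ in self._by_dataset.get(dataset_pid, ())})


# Usage with made-up identifiers and provider names.
index = LinkIndex()
index.add_link("10.1234/article.1", "10.1234/dataset.1", "provider-a")
index.add_link("10.1234/article.1", "10.1234/dataset.2", "provider-b")
index.add_link("10.1234/article.2", "10.1234/dataset.1", "provider-a")
print(index.datasets_for("10.1234/article.1"))
print(index.articles_for("10.1234/dataset.1"))
```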
PDS-WG Stakeholders
Claire Austin
David Arctur
Amir Aryani (ANDS)
Geoff Bilder (CrossRef)
Timea Biro
Adrian Burton (ANDS) - co-chair
Ian Bruno (CCDC)
Sarah Callaghan
David Carlson
Jamus Collier (PANGAEA)
Suenje Dallmeier-Thiessen
Tim DiLauro
Ingrid Dillo
Rorie Edmunds
Janine Felden
Carol Goble
Jeffrey Grethe
Harkan Grudd
Siddeswara Guru
Laurel Haak (ORCID)
John Helly
Francisco Hernandez
Simon Hodson
Richard Kidd (RSC)
Hylke Koers (Elsevier) - co-chair
Paolo Manghi (OpenAIRE)
Haralambos Marmanis
Caroline Martin
Jo McEntyre (EMBL-EBI)
Yolanda Meleco
Sheila Morrissey
Lyubomir Penev
Mohan Ramamurthy
Howard Ratner
Nigel Robinson (Thomson Reuters)
Sergio Ruiz (DataCite)
Uwe Schindler (PANGAEA)
Johanna Schwarz (Springer)
Eefke Smit (STM)
Martina Stockhause
Carly Strasser
Jonathan Tedds
Joachim Wackerow
Juanle Wang
Hua Xu
Eva Zanzerkia
The Data-Literature Service
• A one-for-all service infrastructure for research data publishing
• Increase interoperability
• Decrease systemic inefficiencies
• Power new tools and functionalities to the benefit of researchers
Benefits
• For data repositories and journal publishers
• Linking becomes more scalable and cheaper, ensuring
more visibility for data sources (and their “customers”)
• For research institutes, bibliographic providers,
and funding bodies
• Enables bibliographic services and productivity
assessment tools that track datasets and journal
publications within a common framework
• For researchers
• Makes sharing and accessing relevant articles and data easier, more efficient and more accurate, thereby increasing scientific reward and improving research practices
System development and operation:
OpenAIRE and PANGAEA
[Diagram: Data sources contribute links (examples: pairs of DOIs, DataCite records, PANGAEA records). Links collection is followed by harmonizing, PID resolution and de-duplicating. The result populates an Information Space based on a Core Data Model, which is exposed through a Web Portal, OAI-PMH and Search APIs.]
Information Space
Core Data Model Schema
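The harmonizing and de-duplicating steps named in the diagram can be illustrated with a small sketch: links arrive from several providers as pairs of DOIs in different spellings, are normalised, and identical pairs are merged while keeping track of who contributed them. The functions `normalize_doi` and `merge_links`, the prefix list and the sample data are assumptions for illustration, not the Service's actual code.

```python
# Spellings of DOI prefixes that the sketch strips; the list is an assumption.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Reduce a DOI to a bare, lower-case form so equal DOIs compare equal."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def merge_links(contributions):
    """De-duplicate (provider, article DOI, dataset DOI) triples.

    Returns a dict mapping each normalised (article, dataset) pair to the
    set of providers that contributed it.
    """
    merged = {}
    for provider, article, dataset in contributions:
        key = (normalize_doi(article), normalize_doi(dataset))
        merged.setdefault(key, set()).add(provider)
    return merged


# Usage with made-up DOIs and provider names.
contributions = [
    ("provider-a", "doi:10.1234/ART.1", "https://doi.org/10.1234/data.1"),
    ("provider-b", "10.1234/art.1", "DOI:10.1234/DATA.1"),
]
print(merge_links(contributions))
```

Keeping the set of contributing providers per link preserves provenance after de-duplication, which matters when several sources assert the same link.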
The Service (BETA)
http://dliservice.research-infrastructures.eu
Powered by:
• OpenAIRE D-NET
software
• PANGAEA search engine
Some numbers
• Close to 1 million links and 2 million objects
Providers
What’s intrinsically
wrong?
Research products publishing
workflow
• Digital research products (articles, data, scientific process)
• Repositories for literature, research data, scientific process
• Research e-infrastructure
• Market-place services
By Donatella Castelli and Alessia Bardi, Massimiliano Assante, September 2015, bardi@isti.cnr.it
Research products publishing
workflow
• Deposition: de-contextualisation, staticity, extra cost
• Quality assessment: inefficient peer-review, no repeatability
• Dissemination: fragmentation in thematic or typology silos, lack of semantic linking
• Reuse of products: lack of context, no replication
Research activities
• A research activity can be understood as a course of actions that follows a scientific method to prove an initial thesis and so bring novelty to a research field
• Every research activity builds upon and produces a wide array
of research products
[Diagram: Publication, Data, Project, Funding, Research Initiative, Scientific process]
Time for a change in scholarly
communication
Publishing Research Activities
• De facto the literature publishing workflow has been adapted
and adopted for other research products
• “Elsewhere” and “on date” philosophy
• On the contrary, modern research conducted with the support
of Research e-Infrastructures is
• Strongly contextualized, intrinsically dynamic
• Research products should be published “in place” and
“during” and together with research activities
• Research e-Infrastructures should evolve to support
marketplace-like functionality
Science 2.0 Repositories
• Creation of research products in the Research e-Infrastructure (Re-I) is intercepted and the new products are published
• Notify peers about
research activities and
published products via
research social networks
• Foster continuous open
peer review
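The idea of intercepting product creation and notifying peers can be sketched as a simple publish-and-notify hook. This is only an illustration of the pattern: the class `ResearchInfrastructure`, its methods and the application name are hypothetical, and the slides do not describe how SciRepo implements interception.

```python
class ResearchInfrastructure:
    """Toy research e-infrastructure that publishes every product its applications create."""

    def __init__(self):
        self._listeners = []
        self.published = []

    def on_product_created(self, listener):
        """Register a callback, e.g. a news feed of a research social network."""
        self._listeners.append(listener)

    def run_application(self, app_name, compute):
        """Run an application; its result is intercepted, published and announced."""
        product = {"application": app_name, "result": compute()}
        self.published.append(product)    # published "in place", in its context
        for listener in self._listeners:  # peers are notified immediately
            listener(product)
        return product


# Usage: a list stands in for a peer's news feed.
infrastructure = ResearchInfrastructure()
news_feed = []
infrastructure.on_product_created(news_feed.append)
product = infrastructure.run_application("example-app", lambda: sum(range(10)))
print(news_feed)
```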
SciRepo publishing model benefits
Deposition
• Current model cons: de-contextualisation, staticity, extra cost
• SciRepo model pros: in context, products remain “alive”, no extra cost, alternative products
Quality Assessment
• Current model cons: ineffective peer-review
• SciRepo model pros: continuous and in context, self-assessment
Dissemination
• Current model cons: fragmentation in thematic or typology silos, lack of semantic linking
• SciRepo model pros: unified, automatic and complete
An example of SciRepo
Research Activity web page
SciRepo in the real world
http://www.i-marine.eu
• The iMarine Research Infrastructure features part of the
SciRepo social functionalities:
• Applications running in the RI generate products that can be
shared
• Notifications of new research products via News Feed
• Research products accessible in the context of the application
that generated them
• More will be realized for the new BlueBridge project
Thank you
Suggested reading:
• Bardi A., Manghi P. A Framework Supporting the Shift from
Traditional Digital Publications to Enhanced Publications (2015).
doi:10.1045/january2015-bardi
• Manghi P., Bolikowski L., Manola N., Schirrwagen J., Smith T.
OpenAIREplus: the European Scholarly Communication Data
Infrastructure (2012).
doi:10.1045/september2012-manghi
• Assante M., Candela L., Castelli D., Manghi P., Pagano P. Science 2.0
Repositories: Time for a Change in Scholarly Communication (2015).
doi:10.1045/january2015-assante
Contacts:
donatella.castelli@isti.cnr.it
paolo.manghi@isti.cnr.it
alessia.bardi@isti.cnr.it


Enabling better science: Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group (6th RDA Plenary)

  • 1. Enabling better science Results and vision of the OpenAIRE infrastructure and RDA Data Publishing Working Group Paolo Manghi paolo.manghi@isti.cnr.it Institute of Information Science & Technologies “A. Faedo” National Research Council, Pisa, Italy
  • 2. The research group • http://nemis.isti.cnr.it/groups/infrascience • 27 among senior and junior researchers, sys admins and PhD student (computer science & information engineering) Data interoperability Digital Library Foundations & Management Systems Enabling middleware for service-oriented infrastructures Enhanced publication & compound object models Virtual Research Environments De-duplication of information objects ByDonatellaCastelliandAlessiaBardi,September2015,bardi@isti.cnr.it
  • 4. Modern Scholarly Communication ResearchInfrastructuresgo beyondliterature Data-driven: Jim Gray’s fourth paradigm software experiment experiment service Dataset publishing Data repositories Scientific process publishing Web-driven: immediate sharing and access to digital knowledge Literature publishing Institutional, thematic repositories Publisher Journal repositories Research Infra Research Infra Market Place
  • 5. Modern Scholarly Communication Publishing beyond literature Comprehensive scientific reward by citation of any research outcome Improved understanding of research outcome Better research review- process [repeatability, replicability, and reproducibility of experiments - Goble, 2009] Effective dissemination and re-use of valuable research assets Lower costs of science Publication Data Scientific process Methodological processes, executable workflows, piece of software Input/output to scientific process
  • 6. Funders and projects • Funders are crucial to development of science • If public funders, they push for Open Access mandates • Funders require methodologies to monitor impact (ROI) and adherence to mandates of projects they fund • Projects require the same tools to show off their production Publication DataProject Funding Scientific process
  • 7. Scientific communication workflows for research products • Publications: well established (PIDs, metadata, deposition, peer-review, citation, dissemination) • Research data: available for given communities (PIDs, metadata, deposition?, peer-review?, citation? dissemination?) • Scientific process: almost inexistent and not supported by scientific reward mechanisms Deposition Peer-review Dissemination
  • 8. Sharing/publishing research data: current solutions • Part of article: subset of research data embedded as figures or tables • Additional material possibly submitted to the journal along with the publication full-text • Independent from article: Data deposited and described at dedicated locations • Data centres, Discipline databases, Thematic data repositories • Linked to article: Data deposited and described at dedicated locations with link from/to article full-texts • Discipline-specific, data papers, deposition guidelines • e.g. DRYAD, PANGAEA, GigaDB • Enhanced publications, research objects
  • 9. Sharing/publishing scientific process: current solutions • Part of article: a piece of code is included in the paper or as additional material possibly submitted to the journal along with the publication full-text • Independent from article: software, VMs, workflows, services, e-notebooks are deposited and described at dedicated locations • Software repositories, e.g. GitHub • Workflow repositories, e.g. myexperiment.org • Linked to article: software deposited and described at dedicated locations with link from/to article full-texts • Software papers, but no real deposition guidelines • Enhanced publications, research objects
  • 12. OpenAIRE: Open Access Infrastructure for Research in Europe. The point of reference for Open Access in Europe • Establishment of an interoperable network of publication repositories • Deposition, discovery, linking and monitoring of research products (articles, datasets, software) produced under national and EC funding • Monitoring compliance with EC OA mandates for publications • Supporting decision makers with statistics
  • 13. The OpenAIRE infrastructure • Human network: NOADs (National Open Access Desks) • Monitor and foster the adoption of Open Access policies at the local level • Support researchers in implementing the Data Pilot • e-infrastructure: monitoring the impact of OA mandates and research projects • OpenAIRE guidelines for metadata exchange • Zenodo repository for the deposition of research products
  • 14. [Architecture diagram] • Services: Get support (NOADs), Linked Content, Statistics, Search & Browse, Feedback, Claim/deposit • Content: publications & data; research impact (citations, usage statistics, and more) • Data sources: data repositories/aggregators and data journals (metadata on data); publication repositories/aggregators, institutional & thematic, Open Access journals/publishers (usage data, metadata and PDFs); national funding; EC funding; institutional CRIS systems; CERN/OpenAIRE “catch-all”; OpenDOAR; re3data • Guidelines for use services • Guidelines for data interoperability • Processing: validation, cleaning & transformation, de-duplication, enrichment by metadata and text mining • APIs
  • 15. OpenAIRE data model: view from the moon
  • 16. The OpenAIRE e-infrastructure: view from the moon www.d-net.research-infrastructures.eu
  • 17. OpenAIRE e-infrastructure: hardware numbers • Production: 44 CPU cores, 84 GB of RAM, 3,998 GB allocated disk • Mining cluster: 14 servers, 98 CPU cores, 514 GB of RAM, 18,458 GB allocated disk • Data provision cluster: 15 servers, 90 CPU cores, 236 GB of RAM, 12,300 GB allocated disk
  • 18. OpenAIRE information space numbers (September 2015) http://www.openaire.eu • 12M publications (de-duplicated) • 200,000 publication-project links from 5 funders • 9,000 datasets linked to publications or projects • 34,000 organizations (de-duplicated) • Collected from: • 600+ “direct” data providers • 5,000+ “indirect” data providers (inherited from aggregators) • End-users…
  • 19. OpenAIRE information space numbers (September 2015)
  • 20. OpenAIRE’s Zenodo: http://www.zenodo.org • Zenodo repository (production) • Deposition of publications, datasets, software • DOI minting and metadata curation • Community support • Much more… • Free of charge • Numbers: publications 16,240; datasets 1,477; software 4,456; other products 1,400+
  • 21. OpenAIRE partners and liaisons • SHARE
  • 22. Other OA initiatives: international collaborations • RDA-ANDS (Australia) • SHARE (United States) • La Referencia (South America) • CAS (China)
  • 23. Sharing research products and context to enable better science in OpenAIRE • Scope: Publications, EC funding, National funding, Research Data, Research Initiatives, Scientific Process • Project phases: OpenAIRE (2009-2012), OpenAIRE Plus (2011-2014), OpenAIRE2020 (2015-2018) • [Diagram labels: EC Open Access mandate monitoring; European Grid Infra (EGI); links among publications, data and process; national funders; links between publications and data]
  • 24. Publications in OpenAIRE • Publications acquisition policy • Open Access publications • Publications linked to a project whose funder is supported by OpenAIRE • Publications are collected from literature repositories and “claimed” by registered end-users • Metadata and full text (when Open Access or agreed with publishers)
  • 25. Funders and projects in OpenAIRE • OpenAIRE collects projects from the following funders • European Commission: FP7 and H2020 • Wellcome Trust • FCT (Portugal) • NHMRC (Australia) • ARC (Australia) • Science Foundation Ireland (Ireland) • Upcoming: Croatian, Dutch, and American (NSF) funders
  • 26. Enabling better science: publications and funders • OpenAIRE guidelines for literature repositories • “How to describe” publications • “How to describe” projects • “How to put publications in context” with projects • Cooperation with SHARE (US), JISC (UK), La Referencia (South America) • OpenAIRE services • Offering access to project information by funder • Inferring links between articles and projects of any funder • Monitoring ROI/Open Access of any funder by project (and more)
  • 27. Enabling better science: publications and funders • Literature Broker Service for Institutional Repositories (delivery planned for 2016) • Serving repository managers • Subscriptions based on configurable criteria of publication-repository “closeness”
  • 28. Research Data in OpenAIRE • OpenAIRE research data acquisition policy • Datasets must be linked to OpenAIRE publications or to projects • Datasets identified by accession numbers are not collected; they are dealt with as “external links” • Datasets are collected from data archives and “claimed” by registered end-users
  • 29. Enabling better science: research data • OpenAIRE guidelines for data archives • “How to describe” datasets (inspired by DataCite) • “How to put datasets in context” with projects • OpenAIRE services • Inference of links to datasets from article full-text • Extraction of dataset-publication links from data archives (e.g. PANGAEA, DataCite)
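One way to picture the inference of dataset links from article full text is a pattern scan for DOI-like strings. The sketch below is illustrative only: the regular expression, the function name `find_doi_candidates`, and the sample sentence with its made-up DOI are assumptions for the example, not a description of OpenAIRE's actual mining services.

```python
import re

# Assumed, simplified DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi_candidates(full_text: str) -> list[str]:
    """Return unique DOI-like strings, lowercased, in order of first appearance."""
    seen: list[str] = []
    for match in DOI_PATTERN.finditer(full_text):
        # Trailing sentence punctuation is usually not part of the identifier.
        doi = match.group(0).rstrip(".,;)").lower()
        if doi not in seen:
            seen.append(doi)
    return seen


# Hypothetical full-text fragment citing the same (made-up) dataset DOI twice.
sample = (
    "Data are available at doi:10.1234/Example.Dataset.1. "
    "See also https://doi.org/10.1234/example.dataset.1."
)
print(find_doi_candidates(sample))
```

A candidate found this way would still need to be resolved and checked against a data archive before it is recorded as a dataset-publication link.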
  • 30. Research initiatives • OpenAIRE opens to “research initiatives” willing to • Monitor the productivity of the community in terms of publications and datasets • Support the discovery of research made by peers in the same community • [Diagram labels: Publication, Data, Project, Funding, Research Initiative]
  • 31. Enabling better science: research initiatives • OpenAIRE research initiatives • European Grid Infrastructure (EGI), concepts: EGI Virtual Organizations and EGI disciplines • OpenAIRE services • Inference of links to research initiatives from article full-texts • Monitoring ROI and Open Access w.r.t. relevant “concepts” of a research activity
  • 32. Enabling better science: scientific processes (to be defined) • Scientific process acquisition policies • e.g. software, processes (e.g. Taverna workflows), methods (e.g. e-notebooks) • Collection strategies • From “process repositories”? E.g. myexperiment.org, GitHub • Guidelines for “process repository managers” • OpenAIRE services • Monitoring ROI of projects in terms of processes! • Inference/extraction of article/process links?
  • 34. Research Data Alliance (RDA): Publishing Data Services Working Group • Forum funded by the Commission to foster discussion among researchers and practitioners in the field of research data management • Identifies common, cross-discipline problems and yields best practices and recommendations • Organized in Interest Groups and Working Groups • Focus: • Publishing Data Interest Group: umbrella of WGs focusing on enabling a stronger research data publishing infrastructure • Publishing Data Services Working Group: focusing on article-dataset interlinking
  • 35. Publishing Data Services Working Group: data-article links • Benefits of creating context by establishing data-article links • Increasing visibility and discoverability • Stimulating reuse and repeatability • Key to making it worthwhile: • Infrastructural approach: linking needs to be done collectively, at community (and cross-community) level, sharing procedures, policies and technologies • Issues • No common framework for interlinking datasets and published articles • Initiatives live in isolation and cannot be combined
  • 36. Enabling better science: giving open access to article-dataset links • Creating “an open, freely accessible, web based service that enables its users to identify datasets that are associated with a given article, and vice versa” • The Service will serve as a flexible sandbox • Major scholarly communication stakeholders involved at different levels • Feed authoritative links to the Service • Access links from the Service • Provide feedback on requirements, preferences, recommendations, and obstacles to refine/enhance the Service
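The "given article, and vice versa" requirement amounts to a two-way lookup over article-dataset links. The following is a minimal in-memory sketch of that idea; the class name `LinkIndex`, its methods, and the identifiers are all hypothetical and do not reflect how the Service itself is implemented.

```python
from collections import defaultdict


class LinkIndex:
    """In-memory, two-way index of article-dataset links (illustrative only)."""

    def __init__(self) -> None:
        self._datasets_by_article: dict[str, set[str]] = defaultdict(set)
        self._articles_by_dataset: dict[str, set[str]] = defaultdict(set)

    def add_link(self, article_pid: str, dataset_pid: str) -> None:
        # Store the link in both directions so either side can be queried.
        self._datasets_by_article[article_pid].add(dataset_pid)
        self._articles_by_dataset[dataset_pid].add(article_pid)

    def datasets_for(self, article_pid: str) -> list[str]:
        return sorted(self._datasets_by_article.get(article_pid, set()))

    def articles_for(self, dataset_pid: str) -> list[str]:
        return sorted(self._articles_by_dataset.get(dataset_pid, set()))


# Made-up identifiers, used only to exercise the index.
index = LinkIndex()
index.add_link("10.1234/article.1", "10.5678/dataset.a")
index.add_link("10.1234/article.1", "10.5678/dataset.b")
index.add_link("10.1234/article.2", "10.5678/dataset.a")
print(index.datasets_for("10.1234/article.1"))
print(index.articles_for("10.5678/dataset.a"))
```

Keeping both directions in sync at insertion time makes each lookup a single dictionary access, at the cost of storing every link twice.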
  • 37. PDS-WG Stakeholders • Claire Austin, David Arctur, Amir Aryani (ANDS), Geoff Bilder (CrossRef), Timea Biro, Adrian Burton (ANDS, co-chair), Ian Bruno (CCDC), Sarah Callaghan, David Carlson, Jamus Collier (PANGAEA), Suenje Dallmeier-Thiessen, Tim DiLauro, Ingrid Dillo, Rorie Edmunds, Janine Felden, Carol Goble, Jeffrey Grethe, Harkan Grudd, Siddeswara Guru, Laure Haak (ORCID), John Helly, Francisco Hernandez, Simon Hodson, Richard Kidd (RSC), Hylke Koers (Elsevier, co-chair), Paolo Manghi (OpenAIRE), Haralambos Marmanis, Caroline Martin, Jo McEntyre (EMBL-EBI), Yolanda Meleco, Sheila Morrissey, Lyubomir Penev, Mohan Ramamurthy, Howard Ratner, Nigel Robinson (Thomson Reuters), Sergio Ruiz (DataCite), Uwe Schindler (PANGAEA), Johanna Schwarz (Springer), Martina Stockhause, Carly Strasser, Eefke Smit (STM), Jonathan Tedds, Joachim Wackerow, Juanle Wang, Hua Xu, Eva Zanzerkia
  • 38. The Data-Literature Service • A one-for-all service model: a shared infrastructure for research data publishing • Increase interoperability • Decrease systemic inefficiencies • Power new tools and functionalities to the benefit of researchers
  • 39. Benefits • For data repositories and journal publishers • Linking becomes more scalable and cheaper, ensuring more visibility for data sources (and their “customers”) • For research institutes, bibliographic providers, and funding bodies • Enables bibliographic services and productivity assessment tools that track datasets and journal publications within a common framework • For researchers • Sharing and accessing relevant articles and data becomes easier, more efficient, and more accurate, thereby increasing scientific reward and improving research practices
  • 40. System development and operation: OpenAIRE and PANGAEA • Pipeline: links collection, harmonizing, PID resolution, de-duplicating • [Diagram labels: Information Space, Web Portal, Core Data Model, Data Sources, OAI-PMH, Search APIs] • Examples: • Pairs of DOIs • DataCite records • PANGAEA records • OAI-PMH intersection
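The harmonizing and de-duplicating steps named on this slide can be sketched for the simplest input, pairs of DOIs. Everything below is an assumption made for illustration: the function names `normalize_doi` and `merge_links`, the list of DOI prefixes, the source labels, and the sample records are hypothetical, and the real pipeline is built on the D-NET software rather than on this code.

```python
# Common textual forms of a DOI that should collapse to one bare identifier.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Harmonize a DOI: trim whitespace, lowercase, drop a resolver or 'doi:' prefix."""
    value = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def merge_links(records):
    """De-duplicate (source, article_doi, dataset_doi) records.

    Returns a dict mapping each harmonized (article, dataset) pair to the
    sorted list of sources that asserted it.
    """
    merged: dict[tuple[str, str], set[str]] = {}
    for source, article, dataset in records:
        key = (normalize_doi(article), normalize_doi(dataset))
        merged.setdefault(key, set()).add(source)
    return {key: sorted(sources) for key, sources in merged.items()}


# Two hypothetical providers assert the same link using different DOI spellings.
records = [
    ("publisher", "doi:10.1234/Article.1", "https://doi.org/10.5678/DATASET.A"),
    ("data-centre", "10.1234/article.1", "http://dx.doi.org/10.5678/dataset.a"),
]
print(merge_links(records))
```

Recording which sources asserted each link, rather than discarding duplicates outright, keeps provenance available for later quality checks.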
  • 42. The Service (BETA) http://dliservice.research-infrastructures.eu Powered by: • OpenAIRE D-NET software • PANGAEA search engine
  • 43. Some numbers • Close to 1 million links and 2 million objects
  • 46. Research products publishing workflow • Digital research products (articles, data, scientific process) • Repositories for literature, research data, scientific process • Research e-infrastructure • Market-place services • By Donatella Castelli and Alessia Bardi, Massimiliano Assante, September 2015, bardi@isti.cnr.it
  • 47. Research products publishing workflow • Deposition: de-contextualisation, staticity, extra cost • Quality assessment: inefficient peer-review, no repeatability • Dissemination: fragmentation in thematic or typology silos, lack of semantic linking • Reuse of products: lack of context, no replication
  • 48. Research activities • A research activity can be understood as a course of actions, following a scientific method, that leads to proving an initial thesis in order to bring novelty to a research field • Every research activity builds upon and produces a wide array of research products • [Diagram labels: Publication, Data, Project, Funding, Research Initiative, Scientific process]
  • 49. Time for a change in scholarly communication: publishing research activities • De facto, the literature publishing workflow has been adapted and adopted for other research products • “Elsewhere” and “on date” philosophy • In contrast, modern research conducted with the support of Research e-Infrastructures is • Strongly contextualized, intrinsically dynamic • Research products should be published “in place”, “during”, and together with research activities • Research e-Infrastructures should evolve to support marketplace-like functionality
  • 50. Science 2.0 Repositories • Creation of research products in the Research e-Infrastructure (Re-I) is intercepted and the new products are published • Notify peers about research activities and published products via research social networks • Foster continuous open peer review
  • 51. SciRepo publishing model benefits • Current model cons: Deposition (de-contextualisation, staticity, extra cost); Quality assessment (ineffective peer-review); Dissemination (fragmentation in thematic or typology silos, lack of semantic linking) • SciRepo model pros: Deposition (in context, products remain “alive”, no extra cost, alternative products); Quality assessment (continuous and in context, self-assessment); Dissemination (unified, automatic and complete)
  • 52. An example of SciRepo Research Activity web page
  • 53. SciRepo in the real world http://www.i-marine.eu • The iMarine Research Infrastructure features part of the SciRepo social functionalities: • Applications running in the RI generate products that can be shared • Notifications of new research products via News Feed • Research products accessible in the context of the application that generated them • More will be realized in the new BlueBridge project
  • 54. Thank you • Suggested reading: • Bardi A., Manghi P. A Framework Supporting the Shift from Traditional Digital Publications to Enhanced Publications (2015). doi:10.1045/january2015-bardi • Manghi P., Bolikowski L., Manola N., Schirrwagen J., Smith T. OpenAIREplus: the European Scholarly Communication Data Infrastructure (2012). doi:10.1045/september2012-manghi • Assante M., Candela L., Castelli D., Manghi P., Pagano P. Science 2.0 Repositories: Time for a Change in Scholarly Communication (2015). doi:10.1045/january2015-assante • Contacts: donatella.castelli@isti.cnr.it, paolo.manghi@isti.cnr.it, alessia.bardi@isti.cnr.it

Editor's Notes

  • #6: Evaluation of scientific merit
  • #15: Publications from OA repositories; links to any funding information; links them to data repositories; integrating national research systems; produces new knowledge / service; monitoring for research administrators
  • #47: WHAT WHEN WHERE