Managing and sharing data 
Sarah Jones 
DCC, University of Glasgow 
sarah.jones@glasgow.ac.uk 
Twitter: @sjDCC 
ERC Workshop on Research Data Management and Sharing 
18-19 September 2014, Brussels
European Research Council policy 
Commitment to open science from the start: 
"it is the firm intention of the ERC Scientific Council to issue 
specific guidelines for the mandatory deposit in open access 
repositories of research results – that is, publications, data 
and primary materials – obtained thanks to ERC grants, as 
soon as pertinent repositories become operational." 
Statement on Open Access, December 2006 
Image CC BY-SA 3.0 by Greg Emmerich 
www.flickr.com/photos/gemmerich/6365692655
Why make data available?
Sharing leads to breakthroughs 
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
“It was unbelievable. It’s not science
the way most of us have practiced in 
our careers. But we all realised that 
we would never get biomarkers 
unless all of us parked our egos and 
intellectual property noses outside 
the door and agreed that all of our 
data would be public immediately.” 
Dr John Trojanowski, University of Pennsylvania 
... increases the speed of discovery
Returns for institutions 
“If an institution spent A$10 million on data, 
what would be the return? The answer is: more 
publications; an increased citation count; more 
grants; greater profile; and more collaboration.” 
Dr Ross Wilkinson, ANDS 
www.ariadne.ac.uk/issue72/oar-2013-rpt
Researchers get a citation boost 
“Publicly available data was significantly 
(p = 0.006) associated with a 69% increase in 
citations, independently of journal impact 
factor, date of publication, and author 
country of origin using linear regression.” 
Piwowar, H., Day, R. and Fridsma, D. (2007) Sharing detailed research data
is associated with increased citation rate. PLoS ONE. DOI: 10.1371/journal.pone.0000308
But, there are also barriers... 
Who owns the data? 
• Researchers? 
• University? 
• Commercial partners? 
• Funders? 
• … 
People are often misinformed about 
who owns the data. It is particularly 
hard to determine in international 
projects or ones with industry. 
Restrictions on sharing 
• Patentable data 
• Commercial sensitivities 
• Personal, identifiable data 
• Lack of consent 
• … 
There are legitimate reasons to agree 
embargo periods, impose conditions, 
or to share only some of the data. 
However, these are often given as 
reasons not to share data at all. 
www.dcc.ac.uk/sites/default/files/documents/events/workshops/IHW-2013/UKDA-barriers-to-data-sharing.pdf
And opportunity costs 
By Emilio Bruna 
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690
For his most recent paper: 
1. Double checking the main dataset and 
reformatting to submit to Dryad: 5 hours 
2. Creating a complementary file and preparing
metadata: 3 hours
3. Submission of these two files and the 
metadata to Dryad: 45 minutes 
4. Preparing a map of the locations: 1 hour 
5. Submission of map to Figshare: 15 minutes 
6. Cleaning up and documenting the code, 
uploading it to GitHub: 25 hours 
7. Cost of archiving in Dryad: US$90 
8. Page charges: US$600
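The items in this list add up to the totals in the blog post's title (35 hours, US$690). A minimal Python tally, using the figures exactly as listed (the dictionary keys are paraphrases of the list items, not Bruna's wording), makes the arithmetic explicit and shows that most of the time went on the code rather than the data:

```python
# Time and cash costs as listed in Emilio Bruna's breakdown for one paper.
# Keys paraphrase the slide's list items; minutes are converted to hours.
tasks_hours = {
    "Double-check and reformat main dataset for Dryad": 5.0,
    "Create complementary file and prepare metadata": 3.0,
    "Submit files and metadata to Dryad": 0.75,   # 45 minutes
    "Prepare a map of the locations": 1.0,
    "Submit map to Figshare": 0.25,               # 15 minutes
    "Clean up, document and upload code to GitHub": 25.0,
}
charges_usd = {
    "Dryad archiving fee": 90,
    "Page charges": 600,
}

total_hours = sum(tasks_hours.values())
total_usd = sum(charges_usd.values())
# Fraction of the total time spent preparing the code for sharing.
code_share = tasks_hours["Clean up, document and upload code to GitHub"] / total_hours

print(f"Time: {total_hours:g} hours")
print(f"Cash: US${total_usd}")
print(f"Share of time spent on code: {code_share:.0%}")
```

Run as-is, this prints 35 hours, US$690 and a code share of 71%, which supports Bruna's point that minimising opportunity costs matters as much as the cash fees.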
What needs to change? 
Conclusions from Emilio Bruna: 
• Develop a better system of incentives from the 
community for archiving data and code 
• Teach our students how to do this NOW - it’s much easier 
if you develop good habits early 
• Minimise the actual and opportunity costs 
We need to stop telling people “You should” and get 
better at telling people “Here’s how”
What is involved in data curation?
• Data Management Planning 
• Data creation 
• Annotating / documenting data 
• Analysis, use, versioning 
• Storage and backup 
• Publishing papers and data 
• Preparing for deposit 
• Archiving and sharing 
• Licensing 
• Citing… 
Plan → Create → Document → Use → Share → Publish
Data Management Plans 
Brief plans to determine how data will be created, managed and 
shared. DMPs usually cover: 
1. Description of data to be collected / created 
2. Standards and methodologies for data collection & management 
3. Any issues or restrictions due to ethics and Intellectual Property 
4. Plans for data sharing and access 
5. Strategy for long-term preservation 
DMPs are often submitted as part of grant applications, but are 
useful whenever you’re creating data.
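As a rough illustration of how the five areas above work as a checklist, here is a minimal Python sketch that flags which areas a draft plan has not yet addressed. The function name `missing_sections` and the sample draft text are hypothetical, made up for this example; this is not how DMPonline or any funder template works internally.

```python
# The five areas a DMP usually covers, worded as on the slide.
DMP_SECTIONS = (
    "Description of data to be collected / created",
    "Standards and methodologies for data collection & management",
    "Any issues or restrictions due to ethics and Intellectual Property",
    "Plans for data sharing and access",
    "Strategy for long-term preservation",
)


def missing_sections(draft: dict[str, str]) -> list[str]:
    """Return the DMP areas that are absent or left blank in a draft plan (hypothetical helper)."""
    return [s for s in DMP_SECTIONS if not draft.get(s, "").strip()]


# Hypothetical draft that answers only two of the five areas.
draft = {
    DMP_SECTIONS[0]: "Field survey records (CSV) and site photographs.",
    DMP_SECTIONS[3]: "Deposit in a discipline repository on publication.",
}

for section in missing_sections(draft):
    print("Still to write:", section)
```

Keeping the areas in a fixed tuple preserves the slide's order, so gaps are reported in the sequence a reviewer would read the plan.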
Help with DMPs 
A web-based tool to help researchers 
write data management plans 
https://dmponline.dcc.ac.uk
Framework for creating a DMP 
A list of common elements explaining why they 
are important and giving example answers 
www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html
www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf
Example plans
www.dcc.ac.uk/resources/data-management-plans/guidance-examples
Managing and sharing data: 
a best practice guide 
http://data-archive.ac.uk/media/2894/managingsharing.pdf
Training materials
FOSTER project (www.fosteropenscience.eu)
• Open science training
• Courses across EU
• Portal to OA materials
• Guidance on Horizon 2020
MANTRA (http://datalib.edina.ac.uk/mantra)
• Free online training course
• Aimed at PhD students
• Case studies, quizzes etc.
• Data handling tutorials
– R
– SPSS
– ArcGIS
– NVivo
DCC tools catalogue 
A catalogue of RDM tools for different audiences. 
Tools for researchers focus on data handling, managing 
workflows, citation and impact. 
www.dcc.ac.uk/resources/external/tools-services
Tools to help with RDM activities
Categories: Citation & impact; Documentation & metadata; Workflow management; Storage & collaboration
• impactstory.org
• owncloud.org
• www.datacite.org
• thedata.org
• www.taverna.org.uk
• www.myexperiment.org
• www.labtrove.org
• dataup.cdlib.org
Metadata standards catalogue 
Use standards wherever possible for interoperability 
www.dcc.ac.uk/resources/metadata-standards
Data repositories 
http://databib.org
http://service.re3data.org/search
1. How do you foster open science? 
• Make it feasible to comply 
– provide tools and infrastructure 
• Train people early in their careers 
• Incentivise openness 
• Listen to researchers and learn from their 
experience about what doesn’t work 
• Follow up on any demands made in policies
2. Who is responsible for providing infrastructure and support?
Possible providers: discipline, funders, institution, third-party services, national provider
• Data centres, e.g. via NERC
• Institutional support for discipline-specific tools, e.g. the Monash MeRC partnership on tools like OMERO
• National brokerage of deals with third-party providers, e.g. Jisc Janet deals with Arkivum
And what about co-ordination?
3. Who should pay?
Funding Research Data Management: “A conversation with the funders”
The DCC held a special event on this topic in the UK, but there’s still a long way to go.
www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf-special-event-funding-research-data-management
Thanks – any questions? 
DCC guidance, tools and case studies: 
www.dcc.ac.uk/resources 
Follow us on twitter: 
@digitalcuration and #ukdcc
Editor's Notes

  • ERC policy: It was quite forward-thinking for such an early OA policy to be framed in terms of data and primary materials too, not just publications.
  • Returns for institutions: He was making a comparison with the Hubble telescope, on which A$1.5 billion is spent each year. The cost of the Hubble archive (A$1 million per annum) is just a fraction of this, but given the OA mandate, they’ve seen the number of research publications produced from Hubble discoveries double.
  • Researchers get a citation boost: Many studies in this area since then have shown a demonstrable citation boost, though not as high as 69%. This figure was for microarray data from cancer trials, and the early datasets seem to have had a particularly strong impact and to have come from authors who were already well cited. A more realistic figure across the board is probably a 10-30% increase.