Facilitate Open Science Training for European Research
Open Data: Strategies for Research Data Management,
and impact of best practices?
Martin Donnelly
Digital Curation Centre
University of Edinburgh
NCP Academy Webinar
16 June 2017
Overview
1. Background
2. Context: Open Access + Open Data (+ Open Source) =
Open Science (or Open Research)
3. What is good RDM practice?
4. What are the benefits of good RDM?
5. What are the risks of poor RDM?
6. A step by step approach
7. Do’s and don’ts / Rules of thumb
8. About the FOSTER project
9. About the DCC / contact details
Background (me)
• Academic background in cultural heritage computing…
• Which led me to work in digital preservation…
• Which led to my current involvement in research data
management and the broader topic of Open Science
• I’ve been involved to various degrees in the development
of early DMP resources (DCC Checklist, DMPonline,
DMPTool, book chapter on DMP…)
• Member of the original FOSTER consortium
• Also involved in consultancy, advocacy, events, training
etc, e.g. as external expert reviewer of Horizon 2020
DMPs
Open Access + Open Data = Open Science
• Openness in research is situated within a context of ever
greater transparency, accessibility and accountability
• As Open Access to publications became normal (if not yet
ubiquitous), the scholarly community turned its attention to the
data which underpins the research outputs, and eventually came to
consider it a first-class output in its own right. The OA and
research data management (RDM) agendas have developed in close
connection, as part of a broader trend in research sometimes
termed ‘Open Science’ or ‘Open Research’
• “The European Commission is now moving beyond open access towards
the more inclusive area of open science. Elements of open science will
gradually feed into the shaping of a policy for Responsible Research and
Innovation and will contribute to the realisation of the European
Research Area and the Innovation Union, the two main flagship
initiatives for research and innovation”
http://ec.europa.eu/research/swafs/index.cfm?pg=policy&lib=science
Good practice in RDM
RDM is “the active
management and appraisal
of data over the lifecycle of
scholarly and scientific
interest”
What sorts of activities?
- Planning and describing data-related work before it takes
place
- Documenting your data so that
others can find and understand
it
- Storing it safely during the
project
- Depositing it in a trusted
archive at the end of the
project
- Linking publications to the
datasets that underpin them
Benefits
• IMPACT and LONGEVITY: Open data (and publications) receive
more citations, over longer periods
• SPEED: The research process becomes faster
• ACCESSIBILITY: Interested third parties can (where
appropriate) access and build upon publicly-funded research
outputs with minimal barriers to access
• EFFICIENCY: Data collection can be funded once, and used
many times for a variety of purposes
• TRANSPARENCY and QUALITY: The evidence that underpins
research can be made open for anyone to scrutinise and to
attempt to replicate findings. This leads to a more robust
scholarly record and, for example, helps to reduce academic fraud
• DURABILITY: simply put, fewer important datasets will be lost
Benefits: Impact and Longevity
“In genomics research, a large-scale analysis of data sharing shows that
studies that made data available in repositories received 9% more
citations, when controlling for other variables; and that whilst self-reuse
citation declines steeply after two years, reuse by third parties
increases even after six years.” (Piwowar and Vision, 2013)
Van den Eynden, V. and Bishop, L. (2014). Incentives and motivations for
sharing research data, a researcher’s perspective. A Knowledge Exchange
Report, http://repository.jisc.ac.uk/5662/1/KE_report-incentives-for-sharing-researchdata.pdf
Benefits: Quality
“Data is necessary for reproducibility of computational research”
Victoria Stodden, “Innovation and Growth through Open Access to Scientific Research:
Three Ideas for High-Impact Rule Changes” in Litan, Robert E. et al. Rules for Growth:
Promoting Innovation and Growth Through Legal Reform. SSRN Scholarly Paper. Rochester, NY:
Social Science Research Network, February 8, 2011. http://papers.ssrn.com/abstract=1757982.
Baker, M. (2016) “1,500 scientists lift the lid on reproducibility”, Nature, 533:7604,
http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
Benefits: Financial
“Conservatively, we estimate that the value of data in
Australia’s public research to be at least $1.9 billion
and possibly up to $6 billion a year at current levels of
expenditure and activity. Research data curation and
sharing might be worth at least $1.8 billion and possibly
up to $5.5 billion a year, of which perhaps $1.4 billion to
$4.9 billion annually is yet to be realized.”
• “Open Research Data”, Report to the Australian National Data Service (ANDS),
November 2014 - John Houghton, Victoria Institute of Strategic Economic
Studies & Nicholas Gruen, Lateral Economics
J. Manyika et al. "Open data: Unlocking innovation
and performance with liquid information" McKinsey
Global Institute, October 2013
Benefits: Speed
“If we are going to wait five years for data to be released, the Arctic
is going to be a very different place.”
Bryn Nelson, Nature, 10 Sept 2009
http://www.nature.com/nature/journal/v461/n7261/index.html
https://www.flickr.com/photos/gsfc/7348953774/ - CC-BY
Benefits: Durability
Vines et al. “examined the availability of data from 516 studies between 2 and 22 years
old”
- The odds of a data set being reported as extant fell by 17% per year
- Broken e-mails and obsolete storage devices were the main obstacles to data sharing
- Policies mandating data archiving at publication are clearly needed
“The current system of leaving data with authors means that almost all of it is lost
over time, unavailable for validation of the original results or to use for entirely new
purposes” according to Timothy Vines, one of the researchers. This underscores the
need for intentional management of data from all disciplines and opened our
conversation on potential roles for librarians in this arena. (“80 Percent of Scientific
Data Gone in 20 Years” HNGN, Dec. 20, 2013,
http://www.hngn.com/articles/20083/20131220/80-percent-of-scientific-data-gone-in-20-years.htm.)
Vines et al., The Availability of Research Data Declines Rapidly with Article Age,
Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014
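To get a feel for what "the odds fell by 17% per year" means in practice, here is a minimal Python sketch. It assumes the decline compounds as a constant multiplier on the odds (not on the probability), and the starting probability of 0.9 is a purely hypothetical value chosen for illustration; it is not a figure from Vines et al.

```python
def odds_from_probability(p: float) -> float:
    """Convert a probability (0 < p < 1) into odds."""
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)


def extant_probability(p_start: float, years: int, yearly_drop: float = 0.17) -> float:
    """Probability that data is still extant after `years`.

    Assumes the odds shrink by `yearly_drop` every year, compounding.
    `p_start` is a hypothetical starting probability, not a reported value.
    """
    odds = odds_from_probability(p_start) * (1.0 - yearly_drop) ** years
    return probability_from_odds(odds)


if __name__ == "__main__":
    # 0.9 is an illustrative assumption only.
    for years in (0, 5, 10, 20):
        print(years, round(extant_probability(0.9, years), 3))
```

Working in odds rather than probabilities matters here: a 17% drop in odds is not the same as losing 17 percentage points of probability each year.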
Risks of getting this wrong
• Legal – sensitive data is covered by law (and contracts)
and needs to be protected
• Financial – non-compliance with funder policies can lead
to reduced access to income streams
• Scientific – potential discoveries may be hidden away in
drawers or on USB sticks
• Opportunity cost – reduced visibility for research leads to lost
opportunities for collaboration
• Quality – the scholarly record becomes less robust
• Reputational – responsible data management is
increasingly considered a core element of good scholarly
practice in the 21st century
Growing momentum and ubiquity…
Data management
is a part of good
research practice.
- RCUK Policy and Code of
Conduct on the
Governance of Good
Research Conduct
Step 1. Be clear about who is involved
• RDM is a hybrid activity, involving multiple stakeholder groups…
• The researchers themselves
• Research support personnel
• Partners based in other institutions, funders, data centres, commercial
partners, etc
• No single person does everything, and it makes no sense to duplicate
effort or reinvent wheels
• Data Management Planning (DMP) underpins and pulls together
different strands of data management activities. DMP is the process
of planning, describing and communicating the activities carried
out during the research lifecycle in order to…
• Keep sensitive data safe
• Maximise data’s re-use potential
• Support longer-term preservation
• Data Management Plans are a means of communication, with
contemporaries and future re-users alike
Step 2. Write things down
• In a data management plan / record
• In metadata to describe the data and help others to
understand it
• In workflows and README files
• In version management
• In justifying decisions re. access, embargo, selection
and appraisal… the list can be very long…
Communication is crucial!
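As one possible way of "writing things down", the sketch below generates a plain-text README for a dataset. The field names (title, creators, licence and so on) and the example values are illustrative assumptions, not a metadata standard; where a community standard exists for your discipline, prefer that.

```python
from datetime import date


def build_readme(title, creators, description, files, licence, collected=None):
    """Return README text describing a dataset.

    All field names are illustrative; `files` maps file name -> short note.
    """
    collected = collected or date.today().isoformat()
    lines = [
        f"Title: {title}",
        f"Creators: {'; '.join(creators)}",
        f"Date collected: {collected}",
        f"Licence: {licence}",
        "",
        "Description:",
        description,
        "",
        "Files:",
    ]
    for name, note in files.items():
        lines.append(f"- {name}: {note}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example values.
    print(build_readme(
        title="Example interview study",
        creators=["A. Researcher", "B. Colleague"],
        description="Anonymised interview transcripts and coding scheme.",
        files={"transcripts.csv": "one row per utterance",
               "codebook.txt": "definitions of the codes used"},
        licence="CC-BY-4.0",
        collected="2017-06-16",
    ))
```

Writing the README at the moment of data capture, rather than at the end of the project, keeps the description accurate.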
Step 3. Don’t try to do everything yourself
• See Step 1 ;)
RDM / Open Data in practice: key points
1. Understand your funder’s policies (and perhaps national policy
initiatives – see recent SPARC-Europe reports)
2. Create a data management plan (e.g. with DMPonline)
3. Decide which data to preserve (e.g. using the DCC How-To
guide and checklist, “Five Steps to Decide what Data to Keep”)
4. Identify a long-term home for your data (e.g. via re3data.org)
5. Link your data to your publications with a persistent identifier
(e.g. via DataCite)
• N.B. Many archives, including Zenodo, will do this for you
6. Investigate EU infrastructure services and resources
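Point 5 can be illustrated with a small Python sketch that turns a DOI into a resolvable link and assembles a data citation. The citation layout (creators, year, title, publisher, identifier) is an assumption modelled loosely on common data-citation practice, and the dataset shown under `__main__` is hypothetical; check your repository's own guidance for the exact form.

```python
def doi_url(doi: str) -> str:
    """Normalise a DOI in any common form to an https://doi.org/ link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return "https://doi.org/" + doi


def cite_dataset(creators, year, title, publisher, doi):
    """Build a one-line data citation (layout is an illustrative assumption)."""
    return f"{'; '.join(creators)} ({year}). {title}. {publisher}. {doi_url(doi)}"


if __name__ == "__main__":
    # Hypothetical dataset and placeholder DOI, for illustration only.
    print(cite_dataset(["Researcher, A."], 2017, "Example survey data",
                       "Example Repository", "doi:10.1234/example"))
```

Quoting the same persistent link in the paper and in the dataset record is what lets readers move between the two.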
A few do’s and don’ts for RDM
• DO: Have a plan for your data
  DON’T: Make it up as you go along
• DO: Keep backups. Make this easy with automated syncing services like
  Dropbox, provided your data isn’t too sensitive
  DON’T: Carry the only copy around on a memory card, your laptop, your
  phone, etc
• DO: Describe your data as you collect it. This makes it possible for
  others to interpret it, and for you to do the same a few years down
  the line
  DON’T: Leave this till the end. The quality of metadata decreases with
  time, and the best metadata is created at the moment of data capture
• DO: Save your work in open file formats, where possible, and use
  accepted metadata standards to enable like-with-like comparison
  DON’T: Invent new ‘standards’ where community norms already exist
• DO: Deposit your data in a data centre or repository, and link it to
  your publications
  DON’T: Be afraid to ask for help. This will exist both within your
  institution, and via national / European support organisations
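A backup is only useful if it still matches the original. The Python sketch below is one simple way to check this, assuming the original and the backup are two folders with the same layout; the function names are hypothetical, and comparing SHA-256 checksums is a swapped-in technique for illustration, not something the slides prescribe.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(original_dir: Path, backup_dir: Path) -> dict:
    """Map each file under original_dir to 'ok', 'differs' or 'missing'."""
    report = {}
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            report[relative.as_posix()] = "missing"
        elif sha256_of(source) != sha256_of(copy):
            report[relative.as_posix()] = "differs"
        else:
            report[relative.as_posix()] = "ok"
    return report
```

Calling `compare_copies(Path("project_data"), Path("backup/project_data"))` (both paths hypothetical) returns a report that can be checked before relying on the backup.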
Rules of thumb
• Without intervention, data + time = no data
• See Vines, above
• Prioritise: could anyone die or go to jail?
• Legal issues (e.g. protecting vulnerable subjects) are the most
important
• Storage is not the same as management
• Think of data as plants and the servers as a greenhouse
• The plants still need to be fed, watered, pruned, etc… and
sometimes disposed of
• Management is not the same as sharing
• Not all data should be shared
• Approach: “As open as possible, as closed as necessary”
The project
Facilitate Open Science Training for European Research
http://fosteropenscience.eu
• Phase 1 (2014-2016): Spread the Seeds of Open Science and Open Access
• Creation of Open Science Taxonomy
• 2000+ training materials, categorized in the FOSTER Portal
• More than 100 f2f training events in 28 countries and 25 online
courses, totalling more than 6300 participants
• Phase 2 (2017-2019): Let the Flowers of Open Science Bloom
• Focus on:
• Training for the practical implementation of Open Science (face to face
and online) including RDM and Open Data
• Developing intermediate/advanced level/discipline-specific training
resources in collaboration with three disciplinary communities (and
related RIs): Life Sciences (ELIXIR), Social Sciences (CESSDA) and
Humanities (DARIAH)
• Update the FOSTER Portal to support moderated learning, badges and
gamification
• In concrete terms:
• 150 new training resources
• Over 50 training events (outcome-oriented, providing participants with
tangible skills) and 20 e-learning courses
• Multi-module Open Science Toolkit
• Trainers Network, Open Science Bootcamp, Open Science Training
Handbook, and more…
The Digital Curation Centre (DCC)
• UK national centre of expertise in digital preservation
and data management, est. 2004
• Principal audience is the UK higher education sector, but
we increasingly work further afield (continental Europe,
North America, South Africa, Asia…)
• Provide guidance, training, tools (e.g. DMPonline) and
other services on all aspects of research data
management and Open Science
• Tailored consultancy/training
• Organise national and international events and webinars
(International Digital Curation Conference, Research
Data Management Forum)
Contact details
• For more information about the
FOSTER project:
• Website: www.fosteropenscience.eu
• Principal investigator: Eloy Rodrigues
(eloy@sdum.uminho.pt)
• General enquiries: Gwen Franck
(gwen.franck@eifl.net)
• Twitter: @fosterscience
• My contact details:
• Email: martin.donnelly@ed.ac.uk
• Twitter: @mkdDCC
• Slideshare: http://www.slideshare.net/martindonnelly
This work is licensed under the
Creative Commons Attribution
2.5 UK: Scotland License.

Open Data - strategies for research data management & impact of best practices

  • 1. Facilitate Open Science Training for European Research Open Data: Strategies for Research Data Management, and impact of best practices? Martin Donnelly Digital Curation Centre University of Edinburgh NCP Academy Webinar 16 June 2017
  • 2. Overview 1. Background 2. Context: Open Access + Open Data (+ Open Source) = Open Science (or Open Research) 3. What is good RDM practice? 4. What are the benefits of good RDM? 5. What are the risks of poor RDM? 6. A step by step approach 7. Do’s and don’ts / Rules of thumb 8. About the FOSTER project 9. About the DCC / contact details
  • 3. Overview 1. Background 2. Context: Open Access + Open Data (+ Open Source) = Open Science (or Open Research) 3. What is good RDM practice? 4. What are the benefits of good RDM? 5. What are the risks of poor RDM? 6. A step by step approach 7. Do’s and don’ts / Rules of thumb 8. About the FOSTER project 9. About the DCC / contact details
  • 4. Background (me) • Academic background in cultural heritage computing… • Which led me to work in digital preservation… • Which led to my current involvement in research data management and the broader topic of Open Science • I’ve been involved to various degrees in the development of early DMP resources (DCC Checklist, DMPonline, DMPTool, book chapter on DMP…) • Member of the original FOSTER consortium • Also involved in consultancy, advocacy, events, training etc, e.g. as external expert reviewer of Horizon 2020 DMPs
  • 5. Overview 1. Background 2. Context: Open Access + Open Data (+ Open Source) = Open Science (or Open Research) 3. What is good RDM practice? 4. What are the benefits of good RDM? 5. What are the risks of poor RDM? 6. A step by step approach 7. Do’s and don’ts / Rules of thumb 8. About the FOSTER project 9. About the DCC / contact details
  • 6. Open Access + Open Data = Open Science • Openness in research is situated within a context of ever greater transparency, accessibility and accountability • As Open Access to publications became normal (if not yet ubiquitous), the scholarly community turned its attention to the data which underpins the research outputs, and eventually to consider it a first-class output in its own right. The development of the OA and research data management (RDM) agendas are closely linked as part of a broader trend in research, sometimes termed ‘Open Science’ or ‘Open Research’ • “The European Commission is now moving beyond open access towards the more inclusive area of open science. Elements of open science will gradually feed into the shaping of a policy for Responsible Research and Innovation and will contribute to the realisation of the European Research Area and the Innovation Union, the two main flagship initiatives for research and innovation” http://ec.europa.eu/research/swafs/index.cfm?pg=policy&lib=science
  • 7. Overview 1. Background 2. Context: Open Access + Open Data (+ Open Source) = Open Science (or Open Research) 3. What is good RDM practice? 4. What are the benefits of good RDM? 5. What are the risks of poor RDM? 6. A step by step approach 7. Do’s and don’ts / Rules of thumb 8. About the FOSTER project 9. About the DCC / contact details
  • 8. Good practice in RDM RDM is “the active management and appraisal of data over the lifecycle of scholarly and scientific interest” What sorts of activities? - Planning and describing data- related work before it takes place - Documenting your data so that others can find and understand it - Storing it safely during the project - Depositing it in a trusted archive at the end of the project - Linking publications to the datasets that underpin them
  • 9. Overview 1. Background 2. Context: Open Access + Open Data (+ Open Source) = Open Science (or Open Research) 3. What is good RDM practice? 4. What are the benefits of good RDM? 5. What are the risks of poor RDM? 6. A step by step approach 7. Do’s and don’ts / Rules of thumb 8. About the FOSTER project 9. About the DCC / contact details
  • 10. Benefits • IMPACT and LONGEVITY: Open data (and publications) receive more citations, over longer periods • SPEED: The research process becomes faster • ACCESSIBILITY: Interested third parties can (where appropriate) access and build upon publicly-funded research outputs with minimal barriers to access • EFFICIENCY: Data collection can be funded once, and used many times for a variety of purposes • TRANSPARENCY and QUALITY: The evidence that underpins research can be made open for anyone to scrutinise, and attempt to replicate findings. This leads to a more robust scholarly record, and reduces academic fraud for example • DURABILITY: simply put, fewer important datasets will be lost
  • 11. “In genomics research, a large-scale analysis of data sharing shows that studies that made data available in repositories received 9% more citations, when controlling for other variables; and that whilst self-reuse citation declines steeply after two years, reuse by third parties increases even after six years.” (Piwowar and Vision, 2013) Van den Eynden, V. and Bishop, L. (2014). Incentives and motivations for sharing research data, a researcher’s perspective. A Knowledge Exchange Report, http://repository.jisc.ac.uk/5662/1/KE _report-incentives-for-sharing- researchdata.pdf Benefits: Impact and Longevity
  • 12. “Data is necessary for reproducibility of computational research” Victoria Stodden, “Innovation and Growth through Open Access to Scientific Research: Three Ideas for High-Impact Rule Changes” in Litan, Robert E. et al. Rules for Growth: Promoting Innovation and Growth Through Legal Reform. SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, February 8, 2011. http://papers.ssrn.com/abstract=1757982. Benefits: Quality
  • 13. Baker, M. (2016) “1,500 scientists lift the lid on reproducibility”, Nature, 533:7604, http://www.nat ure.com/news/1 -500-scientists- lift-the-lid-on- reproducibility- 1.19970
  • 14. “Conservatively, we estimate that the value of data in Australia’s public research to be at least $1.9 billion and possibly up to $6 billion a year at current levels of expenditure and activity. Research data curation and sharing might be worth at least $1.8 billion and possibly up to $5.5 billion a year, of which perhaps $1.4 billion to $4.9 billion annually is yet to be realized.” • “Open Research Data”, Report to the Australian National Data Service (ANDS), November 2014 - John Houghton, Victoria Institute of Strategic Economic Studies & Nicholas Gruen, Lateral Economics Benefits: Financial
  • 15. J. Manyika et al. "Open data: Unlocking innovation and performance with liquid information" McKinsey Global Institute, October 2013
  • 16. “If we are going to wait five years for data to be released, the Arctic is going to be a very different place.” Bryn Nelson, Nature, 10 Sept 2009 http://www.nature.com/nature/jour nal/v461/n7261/index.html Benefits: Speed https://www.flickr.com/photos/gsfc/7348953774/ - CC-BY
  • 17. Benefits: Durability
    • Vines et al. “examined the availability of data from 516 studies between 2 and 22 years old”
      • The odds of a data set being reported as extant fell by 17% per year
      • Broken e-mails and obsolete storage devices were the main obstacles to data sharing
      • Policies mandating data archiving at publication are clearly needed
    • “The current system of leaving data with authors means that almost all of it is lost over time, unavailable for validation of the original results or to use for entirely new purposes” according to Timothy Vines, one of the researchers. This underscores the need for intentional management of data from all disciplines and opened our conversation on potential roles for librarians in this arena. (“80 Percent of Scientific Data Gone in 20 Years”, HNGN, Dec. 20, 2013, http://www.hngn.com/articles/20083/20131220/80-percent-of-scientific-data-gone-in-20-years.htm)
    • Vines et al., The Availability of Research Data Declines Rapidly with Article Age, Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014
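As a rough worked example of the 17%-per-year figure: if the odds of a data set being extant fall by 17% each year, they are multiplied by 0.83 annually. The sketch below assumes a constant yearly drop and a purely hypothetical starting value (it is not taken from Vines et al.), and converts the resulting odds back into a probability.

```python
def odds_after(initial_odds: float, years: int, yearly_drop: float = 0.17) -> float:
    """Odds after `years`, assuming a constant relative drop per year."""
    return initial_odds * (1.0 - yearly_drop) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical starting point: odds of 9 to 1 (a 90% chance) at year 0.
    start_odds = 9.0
    for year in (0, 5, 10, 20):
        odds = odds_after(start_odds, year)
        prob = odds_to_probability(odds)
        print(f"year {year:2d}: odds {odds:.2f}, probability {prob:.0%}")
```

Note that a 17% fall in odds is not the same as a 17% fall in probability, which is why the conversion step is kept separate.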
  • 19. Risks of getting this wrong
    • Legal – sensitive data is covered by law (and contracts) and must be protected
    • Financial – non-compliance with funder policies can lead to reduced access to income streams
    • Scientific – potential discoveries may be hidden away in drawers, on USB
    • Opportunity cost – reduced visibility for research leads to lost opportunities for collaboration
    • Quality – the scholarly record becomes less robust
    • Reputational – responsible data management is increasingly considered a core element of good scholarly practice in the 21st century
  • 20. Growing momentum and ubiquity… Data management is a part of good research practice. - RCUK Policy and Code of Conduct on the Governance of Good Research Conduct
  • 22. Step 1. Be clear about who is involved
    • RDM is a hybrid activity, involving multiple stakeholder groups…
      • The researchers themselves
      • Research support personnel
      • Partners based in other institutions, funders, data centres, commercial partners, etc
    • No single person does everything, and it makes no sense to duplicate effort or reinvent wheels
    • Data Management Planning (DMP) underpins and pulls together different strands of data management activities. DMP is the process of planning, describing and communicating the activities carried out during the research lifecycle in order to…
      • Keep sensitive data safe
      • Maximise data’s re-use potential
      • Support longer-term preservation
    • Data Management Plans are a means of communication, with contemporaries and future re-users alike
  • 23. Step 2. Write things down
    • In a data management plan / record
    • In metadata to describe the data and help others to understand it
    • In workflows and README files
    • In version management
    • In justifying decisions re. access, embargo, selection and appraisal… the list can be very long…
    Communication is crucial!
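One lightweight way to "write things down" is to store a small descriptive record next to each data file. The sketch below is only an illustration: the field names and the `.metadata.json` naming are assumptions made for this example, not a metadata standard, and a real project should follow the conventions of its own discipline.

```python
import json
from datetime import date
from pathlib import Path


def build_record(data_file: str, title: str, creator: str, description: str,
                 access: str = "open") -> dict:
    """Assemble a minimal descriptive record for one data file.

    The field names are illustrative only, not taken from any standard.
    """
    return {
        "file": data_file,
        "title": title,
        "creator": creator,
        "description": description,
        "access": access,  # e.g. "open", "embargoed", "restricted"
        "recorded_on": date.today().isoformat(),
    }


def write_sidecar(record: dict, directory: Path) -> Path:
    """Write the record as '<data file name>.metadata.json' in `directory`."""
    target = directory / (Path(record["file"]).name + ".metadata.json")
    target.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return target
```

Writing the record at the moment of data capture, rather than at the end of the project, keeps the description close to the decisions it documents.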
  • 24. Step 3. Don’t try to do everything yourself • See Step 1 ;)
  • 25. RDM / Open Data in practice: key points
    1. Understand your funder’s policies (and perhaps national policy initiatives – see recent SPARC-Europe reports)
    2. Create a data management plan (e.g. with DMPonline)
    3. Decide which data to preserve (e.g. using the DCC How-To guide and checklist, “Five Steps to Decide what Data to Keep”)
    4. Identify a long-term home for your data (e.g. via re3data.org)
    5. Link your data to your publications with a persistent identifier (e.g. via DataCite)
       • N.B. Many archives, including Zenodo, will do this for you
    6. Investigate EU infrastructure services and resources
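To illustrate point 5, a DOI is one common kind of persistent identifier, and it can be turned into a stable link through the https://doi.org/ resolver. The helper below is a minimal sketch; the function name and the list of accepted prefixes are assumptions made for this example.

```python
# Prefixes under which a DOI is commonly written (an illustrative list).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(identifier: str) -> str:
    """Return a resolvable https://doi.org/ link for a DOI in any common form."""
    doi = identifier.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"Not a recognisable DOI: {identifier!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    # DOI of the Vines et al. article cited earlier in the slides.
    print(doi_to_url("http://dx.doi.org/10.1016/j.cub.2013.11.014"))
```

Citing the data set with a link of this form, in the publication and in the repository record alike, is what keeps the two connected if either moves.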
  • 27. A few do’s and don’ts for RDM
    • DO: Have a plan for your data
      DON’T: Make it up as you go along
    • DO: Keep backups. Make this easy with automated syncing services like Dropbox, provided your data isn’t too sensitive
      DON’T: Carry the only copy around on a memory card, your laptop, your phone, etc
    • DO: Describe your data as you collect it. This makes it possible for others to interpret it, and for you to do the same a few years down the line
      DON’T: Leave this till the end. The quality of metadata decreases with time, and the best metadata is created at the moment of data capture
    • DO: Save your work in open file formats, where possible, and use accepted metadata standards to enable like-with-like comparison
      DON’T: Invent new ‘standards’ where community norms already exist
    • DO: Deposit your data in a data centre or repository, and link it to your publications
      DON’T: Be afraid to ask for help. This will exist both within your institution, and via national / European support organisations
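The "keep backups" advice can be made a little safer by checking that a copy really matches the original. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it is not a substitute for an institutional backup service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_check(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and confirm the copy is identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / source.name
    shutil.copy2(source, copy)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(copy):
        raise IOError(f"Backup of {source} does not match the original")
    return copy
```

Recording the checksum alongside the data also lets a future re-user confirm that the file has not been altered or corrupted since deposit.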
  • 28. Rules of thumb
    • Without intervention, data + time = no data
      • See Vines, above
    • Prioritise: could anyone die or go to jail?
      • Legal issues (e.g. protecting vulnerable subjects) are the most important
    • Storage is not the same as management
      • Think of data as plants and the servers as a greenhouse
      • The plants still need to be fed, watered, pruned, etc… and sometimes disposed of
    • Management is not the same as sharing
      • Not all data should be shared
      • Approach: “As open as possible, as closed as necessary”
  • 30. Facilitate Open Science Training for European Research: the project (http://fosteropenscience.eu)
    • Phase 1 (2014-2016): Spread the Seeds of Open Science and Open Access
      • Creation of Open Science Taxonomy
      • 2000+ training materials, categorized in the FOSTER Portal
      • More than 100 face-to-face training events in 28 countries and 25 online courses, totalling more than 6300 participants
  • 31. Facilitate Open Science Training for European Research: the project (http://fosteropenscience.eu)
    • Phase 2 (2017-2019): Let the Flowers of Open Science Bloom
    • Focus on:
      • Training for the practical implementation of Open Science (face to face and online), including RDM and Open Data
      • Developing intermediate/advanced level and discipline-specific training resources in collaboration with three disciplinary communities (and related RIs): Life Sciences (ELIXIR), Social Sciences (CESSDA) and Humanities (DARIAH)
      • Updating the FOSTER Portal to support moderated learning, badges and gamification
    • In concrete terms:
      • 150 new training resources
      • Over 50 training events (outcome-oriented, providing participants with tangible skills) and 20 e-learning courses
      • Multi-module Open Science Toolkit
      • Trainers Network, Open Science Bootcamp, Open Science Training Handbook, and more…
  • 33. The Digital Curation Centre (DCC)
    • UK national centre of expertise in digital preservation and data management, est. 2004
    • Principal audience is the UK higher education sector, but we increasingly work further afield (continental Europe, North America, South Africa, Asia…)
    • Provide guidance, training, tools (e.g. DMPonline) and other services on all aspects of research data management and Open Science
    • Tailored consultancy/training
    • Organise national and international events and webinars (International Digital Curation Conference, Research Data Management Forum)
  • 34. Contact details
    • For more information about the FOSTER project:
      • Website: www.fosteropenscience.eu
      • Principal investigator: Eloy Rodrigues (eloy@sdum.uminho.pt)
      • General enquiries: Gwen Franck (gwen.franck@eifl.net)
      • Twitter: @fosterscience
    • My contact details:
      • Email: martin.donnelly@ed.ac.uk
      • Twitter: @mkdDCC
      • Slideshare: http://www.slideshare.net/martindonnelly
    This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.