@openaire_eu
The case of Open Research Data
Emilie Hermans
Ghent University
Based on Mounce, R. (2014), “The State of Open Research Data”, talk for OpenCon 2014 (Washington D.C.).
https://www.slideshare.net/rossmounce/open-con-mouncedata?qid=d8f441d1-c968-4c4a-ab4d-eb2d04d7fc3a&v=&b=&from_search=24
Side note….
Whenever I talk about data in this talk,
assume I’m talking about non-sensitive data e.g.
NOT sensitive medical data
NOT bio-weapons research data
etc. etc….
Challenge
Adapted original source: The University of California, Santa Cruz, Data Management LibGuide, Research Data Management Lifecycle, diagram, viewed 5th May 2018 http://guides.library.ucsc.edu/datamanagement
From linear process to research data lifecycle!
What is open data?
Open Data: open means anyone can freely access, use, modify, and share for any purpose.
Data sharing: restricted access for a limited number of people under certain conditions.
Where did we come from?
Another side note….
Summarizing the state of Open Data is hard
Data sharing (upon request)
e.g. “The full profile listings are on floppy disks which are available upon request”*
* Fernholz et al. (1989) A survey of measurements and measuring techniques in rapidly distorted compressible turbulent boundary layers.
Data sharing in databanks
Data sharing in certain disciplines
Community agreements
The Bermuda Principles for sharing DNA sequence data
• Automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours).
• Immediate publication of finished annotated sequences.
• Aim to make the entire sequence freely available in the public domain.
Data online as supplementary material
Data by default and data papers
Data papers
• A searchable metadata document, describing a particular dataset or a group of datasets, published as a peer-reviewed article.
• Primary purpose: to describe the data and how they were collected, rather than to report hypotheses and conclusions.
Journal policy
• Journals (PLOS, Springer, Nature, BMC, BMJ…) increasingly ask for associated data to be deposited, and funders (EC, FWO) require it as well.
WHY?
It’s 2018!
unfortunately….
Introduction to open-data
Research integrity
“It was a mistake in a spreadsheet that could
have been easily overlooked: a few rows left out
of an equation to average the values in a
column. The spreadsheet was used to draw the
conclusion of an influential 2010 economics
paper: that public debt of more than 90% of GDP
slows down growth. This conclusion was later
cited by the International Monetary Fund and
the UK Treasury to justify programmes of
austerity that have arguably led to riots, poverty
and lost jobs.”
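The kind of error described in the quote can be sketched in a few lines. This is only an illustration under stated assumptions: the figures below are made up and are not taken from the paper, and `column_average` is a hypothetical helper that mimics averaging a cell range in a spreadsheet.

```python
from statistics import mean

# Hypothetical growth figures (percent) for seven countries.
# These numbers are invented for illustration; they are NOT data from the paper.
growth = [2.1, -0.3, 1.8, 2.6, 0.4, 3.0, 2.4]


def column_average(values, first_row=0, last_row=None):
    """Average a slice of a column, like a spreadsheet average over a cell range."""
    return mean(values[first_row:last_row])


# Intended calculation: average over every row.
full_average = column_average(growth)

# Faulty calculation: the range stops too early, so the last three rows are left out.
truncated_average = column_average(growth, last_row=4)

print(f"all rows: {full_average:.2f}, truncated range: {truncated_average:.2f}")
```

When the underlying data and the calculation are shared, anyone can rerun both versions and spot the difference; when they are not, the truncated range stays invisible.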
1. e.g. Piwowar HA, Vision TJ (2013) Data reuse and the open data citation advantage. PeerJ 1:e175. https://doi.org/10.7717/peerj.175; Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
Prevents data loss
Maximize usefulness
Write a data paper
Credit & longer shelf life (1)
Increases transparency
Promote integrity
Influence society
DATA MANAGEMENT AND OPEN DATA
HOW?
What about our project page?
Sustainable?
Services?
Legal aspects?
Technical standards?
Metadata standards?
Findable?
Many options for sharing data
Where to deposit data?
• Disciplinary/Institutional data repository
Best practice: Research data repository
• Zenodo cost-free data repository
• Matches data needs
• Directory of data repositories:
www.Re3data.org
FAIR data principles
• How to discover your data?
• How to understand your data?
• Where to find your data?
• Can people access your data?
• Metadata
• Persistent identifier
• Naming convention
• Keywords
• Versioning
• Software, documentation
• Data repository
• Open Standards
• Vocabulary
• Methodologies
• Licensing
Findable
Reusable
Interoperable
Accessible
FAIR principles
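As a rough illustration of how the checklist above could be applied to a dataset description, here is a minimal sketch. Everything in it is an assumption made for the example: the field names, the way they are grouped under the four FAIR aspects, and the placeholder record are hypothetical and do not follow any formal metadata schema.

```python
# Hypothetical grouping of metadata fields under the four FAIR aspects.
# Field names are illustrative assumptions, not a formal schema.
REQUIRED_FIELDS = {
    "findable": ["identifier", "title", "keywords"],
    "accessible": ["repository_url"],
    "interoperable": ["file_format", "vocabulary"],
    "reusable": ["license", "documentation", "version"],
}


def missing_fair_fields(record):
    """Return, per FAIR aspect, the fields that are absent or empty in the record."""
    gaps = {}
    for aspect, fields in REQUIRED_FIELDS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[aspect] = absent
    return gaps


# Placeholder record: the identifier and URL are made up for this example.
record = {
    "identifier": "doi:10.0000/example",
    "title": "Example survey dataset",
    "keywords": ["survey", "example"],
    "repository_url": "https://example.org/dataset/1",
    "file_format": "text/csv",
    "license": "CC-BY-4.0",
    "version": "1.0.0",
}

gaps = missing_fair_fields(record)
print(gaps)
```

A check like this is easiest to run while the research is still in progress, when missing documentation or vocabulary choices can still be filled in cheaply.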
FAIR is best practice
• (Open) licenses for data can help you greatly, e.g. Creative Commons.
• Can be time-consuming, especially when not incorporated in the research process.
• Importance of commonly used standards, open file formats and metadata
Recommended:
Aim for the (near?) future
• It’s somewhere in some form
• It’s somewhere in a structured form
• It’s somewhere in an open format
• And you can POINT at it!
• It can even TALK (to other data)
5-star deployment scheme for Open Data: 5stardata.info
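The five steps build on each other, so a dataset's rating is the number of consecutive steps it satisfies starting from the first. A minimal sketch of that logic, assuming a hypothetical `Dataset` description whose attribute names are invented for this example:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a dataset; attribute names are illustrative."""
    on_web_open_license: bool  # 1 star: it's somewhere in some form, openly licensed
    structured: bool           # 2 stars: it's in a structured form
    open_format: bool          # 3 stars: it's in an open, non-proprietary format
    has_uri: bool              # 4 stars: you can point at it
    linked: bool               # 5 stars: it links to other data


def star_rating(ds):
    """Count consecutive satisfied steps; the scheme is cumulative, so stop at the first gap."""
    steps = (ds.on_web_open_license, ds.structured, ds.open_format, ds.has_uri, ds.linked)
    stars = 0
    for satisfied in steps:
        if not satisfied:
            break
        stars += 1
    return stars


# A CSV file shared under an open licence, but without a stable URI.
csv_dump = Dataset(True, True, True, False, True)
print(star_rating(csv_dump))
```

Stopping at the first unmet step reflects the cumulative design of the scheme: linking to other data does not earn a fifth star if the dataset itself cannot be pointed at.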
Hope for the (near) future?
• Research institutions will significantly improve research data management training for ALL staff & students, old and new alike
• Research funding bodies will tighten up their rules to ensure immediate post-publication data sharing. No embargoes, no bullshit.
• If no published data comes from your funded research, it will negatively affect your future chances of funding
• Good journals will strictly enforce mandatory data sharing. Journals that don't will get a bad reputation for irreproducible research
Alternative…
Imagine a world where no-one shared
their data (post-publication)
How would we know what was truth & what was lies / fraud / error?
Imagine the waste of time & resources
if everyone had to re-generate data de novo every time
How would we make progress?
We would be in the dark….
Thank you!
Emilie.herlans@ugent.be
Questions?



Editor's Notes

  • #4: The linear process: idea; experiment; data analysis and writing the paper; finally time for some pizza while the paper gets reviewed; paper published: yay! And then all your hard work disappears.
  • #5: FROM DATA IN A SCIENTIFIC PIPELINE TO RESEARCH DATA LIFECYCLE Managing data in a research project is a process that runs throughout the project. Good data management is one of the foundations for reproducible research. Good management is essential to ensure that data can be preserved and remain accessible in the long-term, so it can be re-used and understood by future researchers. Begin thinking about how you’ll manage your data before you start collecting it.
  • #6: Open data is data that is free to access, reuse, repurpose, and redistribute. The Open Research Data Pilot aims to make the research data generated by selected Horizon 2020 projects accessible with as few restrictions as possible, while at the same time protecting sensitive data from inappropriate access. Data sharing makes restricted data available to restricted organisations or individuals. Access to this data is usually restricted because it is sensitive in some way, either because it is personal or because its general release might cause security problems.
  • #9: Storage media and data have an expiration date.
  • #10: GenBank is a sequence database released in 1982, one of the earliest bioinformatics community projects on the Internet.
  • #11: The Bermuda Principles set out rules for the rapid and public release of DNA sequence data. The Human Genome Project, a multinational effort to sequence the human genome, generated vast quantities of data about the genetic make-up of humans and other organisms. But, in some respects, even more remarkable than the impressive quantity of data generated by the Human Genome Project is the speed at which that data has been released to the public. At a 1996 summit in Bermuda, leaders of the scientific community agreed on a groundbreaking set of principles requiring that all DNA sequence data be released in publicly accessible databases within twenty-four hours after generation. These “Bermuda Principles” (also known as the “Bermuda Accord”) contravened the typical practice in the sciences of making experimental data available only after publication. These principles represent a significant achievement of private ordering in shaping the practices of an entire industry and have established rapid pre-publication data release as the norm in genomics and other fields. The three principles originally retained were: (1) automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours); (2) immediate publication of finished annotated sequences; (3) aim to make the entire sequence freely available in the public domain for both research and development in order to maximise benefits to society.
  • #18: Innovation and progress, e.g. a collaborative effort to find the biological markers that show the progression of Alzheimer’s disease in the human brain. “But we all realized that we would never get biomarkers unless all of us parked our egos and intellectual-property noses outside the door and agreed that all of our data would be public immediately.” At first, the collaboration struck many scientists as worrisome: they would be giving up ownership of data, and anyone could use it, publish papers, maybe even misinterpret it and publish information that was wrong.
  • #19: 1. Prevents data loss: 80% of data is lost after 10 years. Data is fragile and reproducibility is very difficult without data. 2. Maximize usefulness and build much more efficiently on previous work: organize, make understandable and reusable, and avoid duplication. Preserve data for further research by organizing it. Stop drowning in irrelevant stuff. Reproducibility crisis. 3. Fosters creativity, interdisciplinary use of data and meta-analysis. 4. Public participation in scientific research. 5. Promote integrity and increase transparency: managing data is part of good research; avoid accusations of sloppy science. 6. Data tend to have a (much!) longer shelf life than interpretation. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data (1).
  • #24: Interoperability: how can my data be combined with other datasets and used in other fields? Licensing: who can access my data and for what purpose can it be used?
  • #26: 3 stars: you can manipulate the data in any way you like. 4 stars: link to it, bookmark it, reuse parts of the data, combine it with other data. 5 stars: discover more related data.