Econometrics of Panel Data and Network Analysis
Research Data
Management
Module 1
Dr. Peter Löwe
Berlin, 03. 08. 2017
Agenda
1. Why bother: A crisis, horror stories & a Panda-Oncologist
2. Size is relative: Doctor House, Big Data, and a long tail
3. Reality Check: Doing science in the 21st century
4. Research Data Management according to Gollum and XKCD
5. Persistent Identifiers: Digital dog tags for everything and everyone !
6. Research Data Repositories & good reads
7. Conclusion: Culture change & happy Pandas
Peter Löwe 2017-08-02
Research Data Management: Module 1
1 Today's menu
• Why Research Data Management matters and how it
should work (perfect world)
• How stuff currently works (state of the art)
• How stuff will work soon (outlook)
• How to get started (self help)
1 Drivers for Research Data Management
https://www.kent.ac.uk/library/research/data-management/manage.html
Why you should care (internal motivation)
• Increase the efficiency of your research process
• Avoid losing data
• Enable data re-use and sharing
Why you are going to care (external motivation)
• Meet the requirements of research funders and your institute
• Comply with the policies of a growing number of journal publishers on
making the data underlying publications available
• Increase your visibility (citations)
1 Research Data includes
• Questionnaires/surveys
• Raw experimental data
• Analysed data
• Databases
• Simulations and research code (software)
• Audio-visual materials
• Laboratory and field notes
• Clinical data, including clinical records
• Images and photographs
1 The Research Data Spectrum
Physical → Digital
• Handwritten letters → Scanned & OCR version
• Images or photos → Scanned digital version
• Soil samples → Analysed result of samples
• Tissue samples → Analysed result of samples
• Archaeological dig sites → 3D models of the dig site
• …
1 Issue: The Reproducibility Crisis
Nature 533, 452–454 (26 May 2016) doi:10.1038/533452a
https://www.slideshare.net/AustralianNationalDataService/research-data-management-in-practice-ria-data-management-workshop-brisbane-2017
• A methodological crisis in
science
• the phrase was coined in the
early 2010s as part of a
growing awareness of the
problem
• 2016: poll of 1,500 scientists
• 70% of them had failed to
reproduce at least one other
scientist's experiment
• results of many scientific
studies are difficult or
impossible to replicate on
subsequent investigation
https://en.wikipedia.org/wiki/Replication_crisis
1 Data Sharing and Management Snafu in 3 Short Acts
[Snafu: „Situation normal, all f***ed up“]
1 Video
1 Discussion
Have you encountered something similar ?
How to deal with such a situation ?
Where do you store your data?
How much data would you lose if your laptop was stolen?
1 Reproducibility decreases over time
due to increasing data loss
http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
“In their parents' attic, in boxes in the garage, or stored on now-defunct
floppy disks — these are just some of the inaccessible places in which
scientists have admitted to keeping their old research data. Such practices
mean that data are being lost to science at a rapid rate, a study has now
found.”
1 Night of the Living Data
http://www.eweek.com/database/5-data-management-horror-stories-to-avoid
1 Self-help Groups
1 Way Out: Keep Science FAIR (perfect world)
Principles to ensure research data is FAIR:
Findable, Accessible, Interoperable, Reusable
“The problem the FAIR Principles address is the lack of widely shared, clearly
articulated, and broadly applicable best practices around the publication of scientific
data”
“FAIRness is a prerequisite for proper data management and
data stewardship”
Mark D. Wilkinson et al. The FAIR Guiding Principles for scientific data management and
stewardship, Scientific Data (2016). DOI: 10.1038/sdata.2016.18
2 Data Storage Evolution
https://www.nimbushosting.co.uk/evolution-data-storage/
We are here
Ancient times
https://villagevoice.freetls.fastly.net/wp-content/uploads/2014/08/beatleboys560.jpg
2 Life Expectancy of Digital Storage Media
http://www.zeit.de/wissen/2013-10/s37-infografik-speichermedien.pdf
https://homsum.files.wordpress.com/2014/04/dr_house_hugh_laurie_desktop_1152x864_wallpaper-83467.jpg
2 Life Expectancy of Digital Storage Media
Storage capacity grows,
but not the lifespan
Average life-span:
about 10 to 30 years
2 Big Data Buzzwords: The Four V‘s
2
Size is not everything:
Big Data and the Long Tail of Science
http://www.nature.com/neuro/journal/v17/n11/full/nn.3838.html
Big data from small data:
data-sharing in the 'long tail' of neuroscience
Long Tail of Science
• {Astro|Nuclear}-
physics,
• Genome studies,
• Remote Sensing
Overall amount
continues to
increase due to
„Big Data“
(Volume | Velocity)
3 Data-driven Science
http://www.allthingsdistributed.com/2007/02/help_find_jim_gray.html
Paradigms of Science:
1. empirical,
2. theoretical,
3. computational,
4. data-driven
3 The Fourth Paradigm
"It's the data, stupid"
Dr Gray's call-to-arms was [..] “to have a world
in which
• all of the science literature is online,
• all of the science data is online, and they
• interoperate with each other.”
3 Innovation in Science travels at different velocities
• Science in general is affected by digital innovation
• Every field of science is different,
• but some are further ahead in embracing different aspects of change.
• Lessons learned need to be exchanged across disciplines.
http://i.quoteaddicts.com/media/q1/1487862.png
3 The Lifecycle of a Scientific Idea
(Elegant High-Level Perspective)
Influenced by computer-driven science
and „Big Data“?
The Lifecycle of a Scientific Idea :
Reality check
1. Formulate a theory
2. Gather data
3. Learn about data storage
4. Learn about data movement protocols
5. Lose data
6. Check out of rehab
7. Learn about backup and replication
8. Gather data
9. Learn about versioning
10. Start preliminary analysis
11. Buy a newer laptop
12. Buy more memory
13. Buy a desktop with more memory
14. Buy a bigger monitor & GPUs “for work”
15. Google “250GB Excel Spreadsheet”
16. Learn about batch processing
17. Learn about batch schedulers
18. Learn about patience.
19. Learn more about data storage
20. Learn about distributed systems.
21. Go back through notes to remember the science question.
22. Learn R & Python
23. Learn Linux admin
24. Finish preliminary analysis.
25. Grow a ponytail
26. Write a paper.
27. Learn about data publishing
28. Learn about reproducibility
29. Plot the death of your advisor/dept. head
30. Apply for grants & research allocations on public systems
31. Wait to apply next time
32. Finish analyzing data
33. Reformulate your theory
34. Goto 1
Source: John Fonner (2016) Jupyter Ascending, http://bit.ly/2vmTwCR
3 Reality Check:
Science is green, IT & the rest is blue,
data-wrangling is red
Many data-wrangling challenges!
4
Data Wrangling:
Research Data Management (RDM)
http://www.oclc.org/content/dam/research/images/publications/rdm-framework-4-with-cc.png
Today's
menu
YOU
Infrastructure
(is there one - yet ?)
4
RDM
Responsibilities before, during and after a research project
data/assets/pdf_file/0009/394056/research-data-management-in-practice.pdf
YOU
4 Data Curation Continuum
Personal domain → (Transfer) → Group domain → (Transfer) → Persistent domain → (Publication) → Access domain
The Data Curation Continuum is divided into four domains of responsibility. During each
data transfer, the existing metadata is enriched with additional elements.
(After Klump, 2009)
Pre Research → Research → Post Research
4 Pre Research: Institutional Requirements
Institutional Policy and
Procedures
Support services - people and
other means of providing advice
and support
IT Infrastructure - the
hardware, software and other
facilities
Metadata management - so that
data records can be meaningful
and fit for purpose
Institutional Data
Management
Framework
4 Pre Research: Data Management Plan (perfect world)
• data organisation and storage;
• metadata standards and guidelines;
• backups;
• archiving for long-term preservation;
• version control and derived data products;
• data sharing or publishing intentions, including licensing;
• ensuring security of confidential data;
• data synchronisation; and
• governance, roles and responsibilities.
4 Documentation 101
a) Document your data sets.
b) Ask your data repository how to document correctly (Metadata !)
c) If you do not document, you‘re wasting an opportunity to receive credit
by citation and reuse
d) Not to be missed:
• Topic (keywords, controlled vocabulary, abstract)
• Observation unit (counties, people, etc.)
• Data basis (random sample, complete survey, etc.)
• Sampling method
• Extent
• Access: limitations, embargo, point of contact (POC)
4 Metadata 101
Metadata (structured data about the data)
• Who collected the data?
• Who funded the research project?
• When (and where) was it collected?
• Instruments and setting for collecting the data?
• Title of the dataset
• Methods used to process the data
• Etc. etc.
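The questions above can be answered in a small machine-readable record that is stored next to the data. The following is a minimal Python sketch, not a formal metadata standard: the class name, the field names and all values are hypothetical placeholders chosen for illustration.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class DatasetMetadata:
    """Hypothetical record answering the 'Metadata 101' questions."""
    title: str
    collected_by: list[str]
    funded_by: str
    collected_when: str          # ISO 8601 date or date range
    collected_where: str
    instruments: list[str] = field(default_factory=list)
    processing_methods: list[str] = field(default_factory=list)


def to_json(record: DatasetMetadata) -> str:
    """Serialise the record as human-readable JSON text."""
    return json.dumps(asdict(record), indent=2, ensure_ascii=False)


# Placeholder values only; replace them with the facts of your own dataset.
example = DatasetMetadata(
    title="Example household survey (placeholder)",
    collected_by=["A. Researcher"],
    funded_by="Example Funding Agency",
    collected_when="2017-01-01/2017-06-30",
    collected_where="Berlin",
    instruments=["paper questionnaire"],
    processing_methods=["double data entry", "anonymisation"],
)
print(to_json(example))
```

A repository will usually prescribe its own schema; a record like this mainly helps to collect the answers while they are still known.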
4 Appropriate File Formats
• Open and non-proprietary
• Human-readable, non-binary
• Patent-free
• ISO standards
• Textual data: XML, TXT, HTML, PDF/A (archival PDF)
• Tabular data (spreadsheets): CSV
• Databases: XML, CSV
• Images: TIFF, PNG, JPEG*
• Audio: FLAC, WAV, MP3
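The format list above can be turned into a simple pre-deposit check. The sketch below flags files whose extension is not on the list; the mapping from formats to file extensions is an assumption made for illustration, and an extension alone cannot tell whether a PDF is really PDF/A.

```python
from pathlib import Path

# Extensions derived from the format list above. The grouping and the
# extension spellings are illustrative assumptions, not an official registry.
RECOMMENDED_EXTENSIONS = {
    "text": {".xml", ".txt", ".html", ".pdf"},
    "tabular": {".csv"},
    "image": {".tif", ".tiff", ".png", ".jpg", ".jpeg"},
    "audio": {".flac", ".wav", ".mp3"},
}


def is_recommended(filename: str) -> bool:
    """Return True if the file extension appears in the recommended list."""
    suffix = Path(filename).suffix.lower()
    return any(suffix in group for group in RECOMMENDED_EXTENSIONS.values())


def flag_files(filenames: list[str]) -> list[str]:
    """Return the files that should be converted before archiving."""
    return [name for name in filenames if not is_recommended(name)]


print(flag_files(["survey.csv", "results.xlsx", "scan.TIFF", "notes.docx"]))
```

Flagged files are candidates for conversion, for example a spreadsheet exported to CSV, before they are handed to a repository.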
4 Include a Manifest / readme File !
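One possible form of a manifest is a plain-text list of all files with their size and a checksum, so that a later reader can verify that nothing is missing or corrupted. The sketch below is an assumption about what such a manifest could contain; the function names and the line layout are hypothetical, and SHA-256 is just one common choice of checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file without loading it whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> str:
    """Return one line per file: checksum, relative path, size in bytes."""
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(data_dir).as_posix()
        size = path.stat().st_size
        lines.append(f"{sha256_of(path)}  {relative}  {size} bytes")
    return "\n".join(lines) + "\n"
```

Calling `build_manifest(Path("my_dataset"))` returns the text; saving it outside the data directory avoids the manifest listing itself on the next run.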
4 Data Life Cycle: Personal Domain Perspective
http://cdn.ttgtmedia.com/informationsecurity/images/vol4iss7/ism_v4i7_f4_DataLifecycle.gif
The most critical stage in the research
data lifecycle is the completion of
the research project. In most
cases there is no follow-up funding
to maintain the research data. Also,
the scientist has to focus on the
next project.
4 Publishing and Sharing Data
Publishing and Sharing data ≠ Open Access to data
• “Open” and “Closed” are relative concepts.
• “Closed” ≈ conditional access based on individual
permission
• “Closed” ≈ conditional access based on roles
Metadata | Research Data
Open     | Open
Open     | Closed
Closed   | Open
Closed   | Closed
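The four combinations above, together with the two readings of "closed", can be sketched as a small access policy. This is an illustration under stated assumptions, not how any particular repository implements access control: the class, its fields and the role name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical access policy for one dataset."""
    metadata_open: bool
    data_open: bool
    allowed_roles: set = field(default_factory=set)   # "closed" by role
    allowed_users: set = field(default_factory=set)   # "closed" by individual permission

    def _granted(self, user: str, roles: set) -> bool:
        """Conditional access: named individually or holding a permitted role."""
        return user in self.allowed_users or bool(self.allowed_roles & roles)

    def can_read_metadata(self, user: str, roles: set) -> bool:
        return self.metadata_open or self._granted(user, roles)

    def can_read_data(self, user: str, roles: set) -> bool:
        return self.data_open or self._granted(user, roles)


# Open metadata, closed data: the second row of the table above.
policy = AccessPolicy(metadata_open=True, data_open=False,
                      allowed_roles={"project-member"})
print(policy.can_read_metadata("bob", set()))
print(policy.can_read_data("bob", set()))
print(policy.can_read_data("alice", {"project-member"}))
```

The point of the sketch is that published data with open metadata stays findable even when the data itself is only available on request.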
4 Continual data curation across domains
4 Data Curation Continuum: Visibility and Circulation
Personal domain → (Transfer) → Group domain → (Transfer) → Persistent domain → (Publication) → Access domain
Low visibility → High visibility
4 Data Delay Strategies ?
https://www.explainxkcd.com/wiki/index.php/1805:_Unpublished_Discoveries
4 The Grant Cycle according to XKCD (and Machiavelli ?)
http://phdcomics.com/comics/archive.php?comicid=1431
4 The Reputation Economy
Open Access to Data:
• Science has become a reputation economy
• The fundamental difference between disciplines is the trade-off between reputation
and collaboration at points of the reputation economy where changes in the form of
capital occur.
• Sharing data as a form of collaboration must be balanced by a similar gain in
reputation.
• […] collaborative disciplines enforce data sharing as a social norm where
non-compliance will result in some form of penalty […]
4
Research Parasites Paradigm:
Open Access for Data is evil
https://media.tenor.com/images/236ee382fdf16973567dc3bb44c21b51/tenor.gif
Lego
Gollum
4
Alternative Paradigm:
Sharing the fire of the Open Data „torch“
4
A Solution for the Crisis
Open Science enables Reproducible Science
https://en.wikipedia.org/wiki/Open_science#/media/File:Open_Science_-_Prinzipien.png
Benefits:
• Greater availability and accessibility of publicly funded scientific research outputs;
• Possibility for rigorous peer-review processes;
• Greater reproducibility and transparency of scientific works;
• Greater impact of scientific research.
Open Science is the movement to make scientific research and data accessible to all
4 Reality check: Gollum (still) beats Prometheus by 10:1
https://s-media-cache-ak0.pinimg.com/originals/21/94/ed/2194ed6879d5bfd93679326508d382cd.jpg
• Gift culture still prevails
• It‘s not the technology
• It‘s not the generational change
• How to trigger cultural change ?
Science Technology Medicine (STM):
2006-2016: ~ 30 million papers published
~ 3 million data publications
(Klump 2017)
10:1
4
Paradigm Change induced by Funding Agencies:
Watering hole approach instead of stick & carrot
http://i.dailymail.co.uk/i/pix/2016/01/14/17/3025C04C00000578-3398562-image-a-16_1452793763082.jpg
Carrot & stick
did not work
Control the watering hole:
Works (for now)
4 FAIR principles: As guidelines
https://commons.wikimedia.org/wiki/File:FAIR_data_principles.jpg
http://www.macs.hw.ac.uk/~ajg33/wp-content/uploads/2016/03/FAIR-Article-Poster.jpg
“The problem the FAIR Principles address
is the lack of widely shared, clearly
articulated, and broadly applicable best
practices around the publication of
scientific data”
5 Technical Requirement for FAIR
• Easy and permanent access to
research data via the internet
• Enhanced discovery, retrieval
and management of data to
enable data reuse and
verification of research results
5 Benefits of Citation
• Including citable data in related publications increases
the citation rate of those publications
• Only cited data can be counted and tracked (in a similar
manner to journal articles) to measure impact
• Routine citation of data will assist in gaining
acknowledgement of data as a first class research output
• Citations for published data can be included in CVs along
with journal articles, reports and conference papers
5
Technical Challenge:
Unbreakable internet-based Citation
Stable linking needed
• Data will move, URL links to Webpages will break.
• Unbreakable alternative needed !
5 Digital Object Identifiers (DOI)
• International DOI Foundation was founded in 1998.
• The DOI system offers long-term persistence and
accessibility of data.
• Based on the Handle system.
• In May 2012 the DOI System ISO Standard 26324 was
published.
• Part of the quality control is mandatory metadata for
each object registered with a DOI.
5 What is a DOI ?
DOI: acronym for "digital object identifier".
A DOI name is an identifier (not a location) of an entity on digital
networks.
What you see: alphanumeric string (never changes)
Associated with: location (such as URL)
Accompanied with: who, what, when… (metadata)
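Because a DOI name is an identifier and not a location, it is turned into a link by handing it to a resolver such as https://doi.org/, which redirects to the current location. The sketch below shows this step in Python; the function names are hypothetical, and the validity check (starts with "10." and contains a slash) is a deliberately rough assumption rather than the full DOI syntax.

```python
from urllib.parse import quote

# Prefixes under which DOI names are commonly written in citations.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Strip a leading resolver URL or 'doi:' label and return the bare DOI name."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Rough plausibility check only, not a complete syntax validation.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {raw!r}")
    return doi


def resolver_url(raw: str) -> str:
    """Build the resolver link for a DOI name (the identifier itself never changes)."""
    return "https://doi.org/" + quote(normalise_doi(raw), safe="/")


print(resolver_url("doi:10.4232/1.11131"))
```

If the dataset moves, only the location registered for the DOI is updated; a link built this way keeps working.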
5
DataCite Metadata Schema
Mandatory properties
Part of the quality control is mandatory metadata for each
object registered with a DOI:
• Identifier (with type attribute)
• Creator (with type and nameIdentifier attributes)
• Title (with optional type attribute)
• Publisher
• PublicationYear
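A record can be checked against the mandatory properties listed above before it is submitted for DOI registration. The following Python sketch assumes a plain dictionary keyed by the property names from this slide; the function name and the example values are hypothetical, and the schema version used by a given data centre may require further properties.

```python
# Property names as listed on this slide; other schema versions may differ.
MANDATORY_PROPERTIES = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")


def missing_properties(record: dict) -> list:
    """Return the mandatory properties that are absent or empty in the record."""
    missing = []
    for name in MANDATORY_PROPERTIES:
        value = record.get(name)
        if value is None or value == "" or value == []:
            missing.append(name)
    return missing


# Placeholder record for illustration; "Publisher" is deliberately left out.
draft = {
    "Identifier": "10.1234/example",
    "Creator": ["Researcher, A."],
    "Title": "Example dataset (placeholder)",
    "PublicationYear": "2017",
}
print(missing_properties(draft))
```

An empty result means the record at least covers the five properties named here; it says nothing about the quality of the values.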
5 DOI is a quality label for data
Datasets with a DOI have to be:
Stable (i.e. not going to be modified)
Complete (i.e. not going to be updated)
Permanent – by assigning a DOI we’re committing to make
the dataset available for posterity
Good quality – by assigning a DOI, the dataset receives the data
centre’s stamp of approval, stating that it is complete and that all
the metadata is available
DOI:
Seal of
Approval
5 DOI for Research Data
https://support.datacite.org/docs/doi-basics
5 DOI Citation Examples
Fahrenberg, Jochen (2010): Freiburger Beschwerdenliste FBL. Primärdaten der
Normierungsstichprobe 1993. Version 1.0.0. ZPID - Leibniz-Zentrum für Psychologische
Information und Dokumentation.
Dataset. http://doi.org/10.5160/psychdata.fgjn05an08
Rattinger, Hans; Roßteutscher, Sigrid; Schmitt-Beck, Rüdiger; Weßels, Bernhard (2012):
Wahlkampf-Panel (GLES 2009). Version: 3.0.0. GESIS Datenarchiv.
Dataset. doi:10.4232/1.11131.
Schupp, Jürgen; Kroh, Martin; Goebel, Jan; Bartsch, Simone; Giesselmann, Marco et
al. (2013): Sozio-oekonomisches Panel (SOEP), Daten der Jahre 1984-2012. Version: 29.
SOEP - Sozio-oekonomisches Panel.
Dataset. doi:10.5684/soep.v29.
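The examples above share one pattern: creators, year, title, version, publisher, resource type, DOI. The Python sketch below assembles a citation string in that pattern; the function name, the separator choices and the use of a resolver link are assumptions for illustration, and a repository or journal may prescribe a different style.

```python
def format_data_citation(creators: list, year: int, title: str,
                         version: str, publisher: str, doi: str) -> str:
    """Assemble a dataset citation in the pattern used by the examples above."""
    names = "; ".join(creators)
    return (f"{names} ({year}): {title}. Version {version}. "
            f"{publisher}. Dataset. https://doi.org/{doi}")


print(format_data_citation(
    creators=["Rattinger, Hans", "Roßteutscher, Sigrid",
              "Schmitt-Beck, Rüdiger", "Weßels, Bernhard"],
    year=2012,
    title="Wahlkampf-Panel (GLES 2009)",
    version="3.0.0",
    publisher="GESIS Datenarchiv",
    doi="10.4232/1.11131",
))
```

Generating citations from the registered metadata, rather than typing them by hand, keeps the DOI and the version number consistent across publications.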
5 DOI System Architecture
5 DataCite Services
Search.datacite.org
5 Upcoming: Search DOI-registered datasets by ORCID
Find any DOI-registered
publication by ORCID
http://dashboard.project-thor.eu
Example: Löwe / Loewe / Lowe ?
Which of the four Peter Löwe ?
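An ORCID iD resolves the name ambiguity above because it identifies the person rather than a spelling. Its 16 characters end in a check character computed with the ISO 7064 MOD 11-2 algorithm, which allows typing errors to be caught before a lookup. The Python sketch below implements that check; the function names are hypothetical, and the iD in the usage line is the sample iD commonly used in ORCID documentation, not that of a real researcher.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits and check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
```

A valid check character only shows that the iD is well-formed; whether it belongs to the intended person still has to be confirmed against the ORCID record.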
6 Data Curation Continuum: Research Data Repositories
Personal domain → (Transfer) → Group domain → (Transfer) → Persistent domain → (Publication) → Access domain
Low visibility → High visibility
6 re3data: Registry of Research Data Repositories
1,500 research data repositories
described by tags:
6 re3data: Search options
6 Research Data Repository (RDR) Development and Services
Currently, DFG funds two RDR-related Projects:
1. SowiDataNet: addressing the social sciences
2. RADAR: addressing the long tail of Science
Technology and Metadata are compatible.
RADAR is a service offering by FIZ Karlsruhe (testing phase)
Near future:
• SowiDataNet will become a service offering (GESIS)
• Datorium will merge with SowiDataNet
6 RADAR: Research Data Repository Services
Van den Broel K, Furtado F, Engel T (2015): RADAR – A Research Data Repository for the “Long-Tail of Science”
6
RADAR:
Research Data Repositories Roles & Responsibilities
6
Datorium.gesis.org: Repository for Social Science and
Economic Science
6 Datorium: Data Set Description
6 Datorium: Terms of Access
4 Where NOT to „publish“ your Data
Required:
Professional repositories which enable
• long term access,
• search,
• retrieval,
• thorough metadata
6
Alternative (Self help):
All-purpose Repositories
Rueda, Laura. (2017, May). Introduction to DataCite. Zenodo.
http://doi.org/10.5281/zenodo.571808
6 OPENAIRE: RDM on the European Level
https://www.openaire.eu/
https://www.slideshare.net/OpenAIRE_eu/enabling-better-science-results-and-vision-of-the-openaire-infrastructure-and-rda-data-publishing-working-group-55075375
6 Adoption of Open Science in Europe
https://www.fosteropenscience.eu/
6
Forschungsdaten
in den Sozial- und Wirtschaftswissenschaften
http://dx.doi.org/10.4232/10.fisuzida2014.1
http://auffinden-zitieren-dokumentieren.de
6 Handbuch Forschungsdatenmanagement
ISBN 978-3-88347-283-6 PDF: http://bit.ly/2uPJdaf
6 Rat für Sozial- und Wirtschaftsdaten / DFG
http://www.dfg.de/download/pdf/foerderung/antragstellung/forschungsdaten/basisinformationen_forschungsdatenmanagement.pdf
6 WIKI: FORSCHUNGSDATEN.ORG
http://www.forschungsdaten.org
6 RESEARCH DATA ALLIANCE
https://www.rd-alliance.org/
6 Data Carpentry Workshops
http://www.datacarpentry.org/
7 AUSTRALIAN NATIONAL DATA SERVICE (ANDS)
7 Wise Advice
https://nicolahemmings.wordpress.com/2016/04/05/mistakes-ive-made-as-an-early-career-researcher/
Mistakes I’ve made as an early career researcher
APRIL 5, 2016
Nicola Hemmings (post-doc, University of Sheffield)
Failing to organise my data adequately (circa 2007).
“Prepare your datasets like you would if you were giving them to a
stranger who knew nothing about them. Label, annotate and
meticulously file your R scripts. Incorporate read-me files into everything
and write them for the monkey that will be you in five years, when you
return to your data and/or analyses for some unforeseen but vitally
important reason. Don’t get this wrong. You will regret it.”
7
Back to the start:
Snafu ? Things are getting better
• This film is scientific nontextual information
• It is available on the AV-portal of TIB Hannover, a data portal for
scientific audiovisual content.
• DOI-link: https://doi.org/10.5446/31036
Thank you for your attention.
DIW Berlin — Deutsches Institut
für Wirtschaftsforschung e.V.
Mohrenstraße 58, 10117 Berlin
www.diw.de
Editor
Peter Löwe (ploewe@diw.de)
http://dilbert.com/strip/2010-08-24
Based on the works of
• Paul Wong (2017) ANDS, Research Integrity Advisor Data Management Workshop
• 3TU.Datacentre (2014): Data citation and DOIs
• and others

More Related Content

PPTX
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
PPTX
Introduction to data management
PDF
Share and Reuse: how data sharing can take your research to the next level
PDF
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
PPTX
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
PPTX
HKU Data Curation MLIM7350 Class 8
PPTX
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
PDF
CODATA International Training Workshop in Big Data for Science for Researcher...
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in...
Introduction to data management
Share and Reuse: how data sharing can take your research to the next level
Jonathan Tedds Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The ...
Introduction to Research Data Management - 2015-05-27 - Social Sciences Divis...
HKU Data Curation MLIM7350 Class 8
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
CODATA International Training Workshop in Big Data for Science for Researcher...

What's hot (20)

PPTX
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...
PPTX
Machine Learning for Data Extraction
PDF
Reproducible research: First steps.
PDF
A basic course on Research data management: part 1 - part 4
PPTX
Towards open and reproducible neuroscience in the age of big data
PPTX
Introduction to research data management; Lecture 01 for GRAD521
PDF
On community-standards, data curation and scholarly communication - BITS, Ita...
PPT
IDs书友会 - 主题1 - Swinburne Next Generation Research
PPTX
Developing data services: a tale from two Oregon universities
PDF
Guy avoiding-dat apocalypse
PPTX
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
PPTX
Introduction to Data Management
PDF
Talk at OHSU, September 25, 2013
PPTX
Machines are people too
PPTX
The Roots: Linked data and the foundations of successful Agriculture Data
PDF
Data management (1)
PDF
Planning for Research Data Management
PDF
Brain Imaging Data Structure and Center for Reproducible Neuroscince
PDF
Research Data Management and the Research Data Lifecycle: a Gentle Introduction
PPT
Science20brussels osimo april2013
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...
Machine Learning for Data Extraction
Reproducible research: First steps.
A basic course on Research data management: part 1 - part 4
Towards open and reproducible neuroscience in the age of big data
Introduction to research data management; Lecture 01 for GRAD521
On community-standards, data curation and scholarly communication - BITS, Ita...
IDs书友会 - 主题1 - Swinburne Next Generation Research
Developing data services: a tale from two Oregon universities
Guy avoiding-dat apocalypse
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
Introduction to Data Management
Talk at OHSU, September 25, 2013
Machines are people too
The Roots: Linked data and the foundations of successful Agriculture Data
Data management (1)
Planning for Research Data Management
Brain Imaging Data Structure and Center for Reproducible Neuroscince
Research Data Management and the Research Data Lifecycle: a Gentle Introduction
Science20brussels osimo april2013
Ad

Similar to Research Data Management for Econometrics (20)

PPTX
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
PPTX
Publishing your research: Research Data Management (Introduction)
PDF
Research Data Management Services at UWA (July 2015)
PPTX
Winter school in research data science research data management - final
PPTX
Research Data Management
PPTX
Managing and Sharing Research Data: Good practices for an ideal world...in th...
PDF
Data Management for the Digital Humanities
PPTX
Practical Research Data Management: tools and approaches, pre- and post-award
PPT
DATA MANAGEMENT – WHAT DOES IT MEAN FOR RESEARCHERS?
PPTX
Ands ttt2 perth_accelerate your data skills training_ top tips for topics and...
PPTX
Data Literacy: Creating and Managing Reserach Data
PDF
UCT eResearch Emerging Researcher Series: RDM
PPTX
Introduction to data management
PPTX
20160414 23 Research Data Things
PPTX
Research data management for masters and ph d students
PPSX
Managing Your Research Data for Maximum Impact -Rob Daley 300616_Shared
PDF
Natasha intro to rdm c3 dis may 2018.pptx
PPTX
Managing and sharing data
PPTX
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
PPTX
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Publishing your research: Research Data Management (Introduction)
Research Data Management Services at UWA (July 2015)
Winter school in research data science research data management - final
Research Data Management
Managing and Sharing Research Data: Good practices for an ideal world...in th...
Data Management for the Digital Humanities
Practical Research Data Management: tools and approaches, pre- and post-award
DATA MANAGEMENT – WHAT DOES IT MEAN FOR RESEARCHERS?
Ands ttt2 perth_accelerate your data skills training_ top tips for topics and...
Data Literacy: Creating and Managing Reserach Data
UCT eResearch Emerging Researcher Series: RDM
Introduction to data management
20160414 23 Research Data Things
Research data management for masters and ph d students
Managing Your Research Data for Maximum Impact -Rob Daley 300616_Shared
Natasha intro to rdm c3 dis may 2018.pptx
Managing and sharing data
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Ad


Research Data Management for Econometrics

  • 1. Econometrics of Panel Data and Network Analysis: Research Data Management, Module 1. Dr. Peter Löwe, Berlin, 03.08.2017
  • 2. Agenda: 1. Why bother: A crisis, horror stories & a Panda-Oncologist 2. Size is relative: Doctor House, Big Data, and a long tail 3. Reality Check: Doing science in the 21st century 4. Research Data Management according to Gollum and XKCD 5. Persistent Identifiers: Digital dog tags for everything and everyone! 6. Research Data Repositories & good reads 7. Conclusion: Culture change & happy Pandas. Peter Löwe 2017-08-02 Research Data Management: Module 1
  • 3. 1 Today's menu: • Why Research Data Management matters and how it should work (perfect world) • How stuff currently works (state of the art) • How stuff will work soon (outlook) • How to get started (self help)
  • 4. 1 Drivers for Research Data Management https://www.kent.ac.uk/library/research/data-management/manage.html Why you should care (internal motivation): • Increase the efficiency of your research process • Avoid losing data • Enable data re-use and sharing. Why you are going to care (external motivation): • Meet the requirements of research funders and your institute • Comply with the policies of a growing number of journal publishers on making the data underlying publications available • Increase your visibility (citations)
  • 5. 1 Research Data includes: • Questionnaires/surveys • Raw experimental data • Analysed data • Databases • Simulations and research code (software) • Audio-visual materials • Laboratory and field notes • Clinical data, including clinical records • Images and photographs
  • 6. 1 The Research Data Spectrum (Physical → Digital): • Hand written letters → Scanned & OCR version • Images or photos → Scanned digital version • Soil samples → Analysed result of samples • Tissue samples → Analysed result of samples • Archeological dig sites → 3D models of the dig site • …
  • 7. 1 Issue: The Reproducibility Crisis. Nature 533, 452–454 (26 May 2016) doi:10.1038/533452a https://www.slideshare.net/AustralianNationalDataService/research-data-management-in-practice-ria-data-management-workshop-brisbane-2017 • A methodological crisis in science • The phrase was coined in the early 2010s as part of a growing awareness of the problem • 2016: poll of 1,500 scientists • 70% of them had failed to reproduce at least one other scientist's experiment • Results of many scientific studies are difficult or impossible to replicate on subsequent investigation https://en.wikipedia.org/wiki/Replication_crisis
  • 8. 1 Data Sharing and Management Snafu in 3 Short Acts [Snafu: “Situation normal, all f***ed up”]
  • 9. 1 Video
  • 10. 1 Discussion: Have you encountered something similar? How to deal with such a situation? Where do you store your data? How much data would you lose if your laptop was stolen?
  • 11. 1 Reproducibility decreases over time due to increasing data loss http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416 “In their parents' attic, in boxes in the garage, or stored on now-defunct floppy disks: these are just some of the inaccessible places in which scientists have admitted to keeping their old research data. Such practices mean that data are being lost to science at a rapid rate, a study has now found.”
  • 12. 1 Night of the Living Data http://www.eweek.com/database/5-data-management-horror-stories-to-avoid
  • 13. 1 Self-help Groups
  • 14. 1 Way Out: Keep Science FAIR (perfect world). Principles to ensure research data is FAIR: Findable, Accessible, Interoperable, Reusable. “The problem the FAIR Principles address is the lack of widely shared, clearly articulated, and broadly applicable best practices around the publication of scientific data” “FAIRness is a prerequisite for proper data management and data stewardship” Mark D. Wilkinson et al. The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data (2016). DOI: 10.1038/sdata.2016.18
  • 15. 2 Data Storage Evolution https://www.nimbushosting.co.uk/evolution-data-storage/ (timeline labels: “Ancient times”, “We are here”) https://villagevoice.freetls.fastly.net/wp-content/uploads/2014/08/beatleboys560.jpg
  • 16. 2 Life Expectancy of Digital Storage Media http://www.zeit.de/wissen/2013-10/s37-infografik-speichermedien.pdf https://homsum.files.wordpress.com/2014/04/dr_house_hugh_laurie_desktop_1152x864_wallpaper-83467.jpg
  • 17. 2 Life Expectancy of Digital Storage Media: Storage capacity grows, but not the lifespan. Average life-span: about 10-30 years
  • 18. 2 Big Data Buzzwords: The Four V's
  • 19. 2 Size is not everything: Big Data and the Long Tail of Science http://www.nature.com/neuro/journal/v17/n11/full/nn.3838.html Big data from small data: data-sharing in the 'long tail' of neuroscience. Long Tail of Science • {Astro|Nuclear}-physics, • Genome studies, • Remote Sensing. Overall amount continues to increase due to “Big Data” (Volume | Velocity)
  • 20. 3 Data-driven Science http://www.allthingsdistributed.com/2007/02/help_find_jim_gray.html Paradigms of Science: 1. empirical, 2. theoretical, 3. computational, 4. data-driven
  • 21. 3 The Fourth Paradigm: "It's the data, stupid". Dr Gray's call-to-arms was [..] “to have a world in which • all of the science literature is online, • all of the science data is online, and they • interoperate with each other.”
  • 22. 3 Innovation in Science travels at different velocities • Science in general is affected by digital innovation • Every field of science is different • but some are further ahead in embracing different aspects of change. • Exchange of lessons learned across disciplines is needed. http://i.quoteaddicts.com/media/q1/1487862.png
  • 23. 3 The Lifecycle of a Scientific Idea (Elegant High Level Perspective): Influenced by computer-driven Science and “Big Data”?
  • 24. 3 The Lifecycle of a Scientific Idea: Reality check. 1. Formulate a theory 2. Gather data 3. Learn about data storage 4. Learn about data movement protocols 5. Lose data 6. Check out of rehab 7. Learn about backup and replication 8. Gather data 9. Learn about versioning 10. Start preliminary analysis 11. Buy a newer laptop 12. Buy more memory 13. Buy a desktop with more memory 14. Buy a bigger monitor & GPUs “for work” 15. Google “250GB Excel Spreadsheet” 16. Learn about batch processing 17. Learn about batch schedulers 18. Learn about patience 19. Learn more about data storage 20. Learn about distributed systems 21. Go back through notes to remember the science question 22. Learn R & Python 23. Learn Linux admin 24. Finish preliminary analysis 25. Grow a ponytail 26. Write a paper 27. Learn about data publishing 28. Learn about reproducibility 29. Plot the death of your advisor/dept. head 30. Apply for grants & research allocations on public systems 31. Wait to apply next time 32. Finish analyzing data 33. Reformulate your theory 34. Goto 1. Source: John Fonner (2016) Jupyter Ascending, http://bit.ly/2vmTwCR Reality Check (colour legend): science is green, IT & the rest is blue, data-wrangling is red. Many data-wrangling challenges!
  • 25. 4 Data Wrangling: Research Data Management (RDM) http://www.oclc.org/content/dam/research/images/publications/rdm-framework-4-with-cc.png Figure annotations: Today's menu; YOU; Infrastructure (is there one yet?)
  • 26. 4 RDM Responsibilities before, during and after a research project data/assets/pdf_file/0009/394056/research-data-management-in-practice.pdf Figure annotation: YOU
  • 27. 4 Data Curation Continuum: Personal domain → (Transfer) → Group domain → (Transfer) → Persistent domain → (Publication) → Access domain. Phases: Pre Research, Research, Post Research. Structure of the Data Curation Continuum in four domains of responsibility. In the process of data transfer, the existing metadata are enriched with further elements. (After Klump, 2009)
  • 28. 4 Pre Research: Institutional Requirements. Institutional Data Management Framework: • Institutional Policy and Procedures • Support services: people and other means of providing advice and support • IT Infrastructure: the hardware, software and other facilities • Metadata management: so that data records can be meaningful and fit for purpose
  • 29. 4 Pre Research: Data Management Plan (perfect world) • data organisation and storage; • metadata standards and guidelines; • backups; • archiving for long-term preservation; • version control and derived data products; • data sharing or publishing intentions, including licensing; • ensuring security of confidential data; • data synchronisation; and • governance, roles and responsibilities.
  • 30. 4 Documentation 101 a) Document your data sets. b) Ask your data repository how to document correctly (Metadata!) c) If you do not document, you're wasting an opportunity to receive credit by citation and reuse d) Not to be missed: • Topic (keywords, controlled vocabulary, abstract) • Observation unit (counties, people, etc.) • Database (random sampling, complete survey, etc.) • Sampling method • Extent • Access: Limitations, embargo, POC
  • 31. 4 Metadata 101: Metadata (structured data about the data) • Who collected the data? • Who funded the research project? • When (and where) was it collected? • Instruments and setting for collecting the data? • Title of the dataset • Methods used to process the data • Etc.
  • 32. 4 Appropriate File Formats • Open and non-proprietary • Human-readable, non-binary • Patent-free • ISO standards • Textual data: XML, TXT, HTML, PDF/A (Archival PDF) • Tabular data (spreadsheets): CSV • Databases: XML, CSV • Images: TIFF, PNG, JPEG* • Audio: FLAC, WAV, MP3
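
The recommendation on slide 32 to keep tabular data as CSV can be illustrated with a minimal sketch using only the Python standard library. The column names and values are hypothetical and invented for illustration; they do not come from the slides.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dicts as CSV text (open, plain-text, non-proprietary)."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical panel-style observations, invented for illustration only.
example_rows = [
    {"person_id": 1, "year": 2015, "income": 32000},
    {"person_id": 1, "year": 2016, "income": 33500},
]
csv_text = rows_to_csv(example_rows, ["person_id", "year", "income"])
```

The resulting text can be written to a UTF-8 file and opened by any spreadsheet or statistics package, which is the point of choosing an open format.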
  • 33. 4 Include a Manifest / readme File!
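
Slide 33 only shows a screenshot, so the following is one possible sketch of a manifest, not the layout the author had in mind: a list of file names with SHA-256 checksums that lets a later user verify that nothing was lost or altered. The function names and the "checksum, two spaces, relative path" line format are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def manifest_lines(files):
    """Map {relative_name: bytes} to sorted '<sha256>  <name>' manifest lines."""
    return [
        f"{hashlib.sha256(content).hexdigest()}  {name}"
        for name, content in sorted(files.items())
    ]


def manifest_for_directory(data_dir):
    """Build manifest lines for every regular file below data_dir."""
    root = Path(data_dir)
    files = {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }
    return manifest_lines(files)


# Hypothetical file contents, invented for illustration only.
example = manifest_lines(
    {"data/survey.csv": b"id,answer\n1,yes\n", "README.txt": b"hello\n"}
)
```

Writing the returned lines into a text file next to the data, together with a short readme describing each file, keeps the documentation in the same open, human-readable form as the data itself.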
  • 34. 4 Data Life Cycle: Personal Domain Perspective http://cdn.ttgtmedia.com/informationsecurity/images/vol4iss7/ism_v4i7_f4_DataLifecycle.gif The most critical stage in the research data lifecycle is the completion of the research project. In most cases there is no follow-up funding to maintain the research data. Also, the scientist has to focus on the next project. !!!
  • 35. 4 Publishing and Sharing Data: Publishing and sharing data ≠ Open Access to data • “Open” and “Closed” are relative concepts. • “Closed” ≈ conditional access based on individual permission • “Closed” ≈ conditional access based on roles. Possible combinations (Metadata / Research Data): Open / Open; Open / Closed; Closed / Open; Closed / Closed
  • 36. 4 Continual data curation across domains
  • 37. 4 Data Curation Continuum: Visibility and Circulation. Personal domain → (Transfer) → Group domain → (Transfer) → Persistent domain → (Publication) → Access domain; from low visibility to high visibility
  • 38. 4 Data Delay Strategies? https://www.explainxkcd.com/wiki/index.php/1805:_Unpublished_Discoveries
  • 39. 4 The Grant Cycle according to XKCD (and Machiavelli?) http://phdcomics.com/comics/archive.php?comicid=1431
  • 40. 4 The Reputation Economy. Open Access to Data: • Science has become a reputation economy • The fundamental difference between disciplines is the trade-off between reputation and collaboration at points of the reputation economy where changes in the form of capital occur. • Sharing data as a form of collaboration must be balanced by a similar gain in reputation. • […] collaborative disciplines enforce data sharing as a social norm where non-compliance will result in some form of penalty […]
  • 41. 4 Research Parasites Paradigm: Open Access for Data is evil https://media.tenor.com/images/236ee382fdf16973567dc3bb44c21b51/tenor.gif Lego Gollum
  • 42. 4 Alternative Paradigm: Sharing the fire of the Open Data “torch”
  • 43. 4 A Solution for the Crisis: Open Science enables Reproducible Science https://en.wikipedia.org/wiki/Open_science#/media/File:Open_Science_-_Prinzipien.png Open Science is the movement to make scientific research and data accessible to all. Benefits: • Greater availability and accessibility of publicly funded scientific research outputs; • Possibility for rigorous peer-review processes; • Greater reproducibility and transparency of scientific works; • Greater impact of scientific research.
  • 44. 4 Reality check: Gollum (still) beats Prometheus by 10:1 https://s-media-cache-ak0.pinimg.com/originals/21/94/ed/2194ed6879d5bfd93679326508d382cd.jpg • Gift culture still prevails • It's not the technology • It's not the generational change • How to trigger cultural change? Science Technology Medicine (STM), 2006-2016: ~30 million papers published, ~3 million data publications (Klump 2017), a ratio of 10:1
  • 45. 4 Paradigm Change induced by Funding Agencies: Watering hole approach instead of stick & carrot http://i.dailymail.co.uk/i/pix/2016/01/14/17/3025C04C00000578-3398562-image-a-16_1452793763082.jpg Carrot & stick did not work. Control the watering hole: works (for now)
  • 46. 4 FAIR principles: As guidelines https://commons.wikimedia.org/wiki/File:FAIR_data_principles.jpg http://www.macs.hw.ac.uk/~ajg33/wp-content/uploads/2016/03/FAIR-Article-Poster.jpg “The problem the FAIR Principles address is the lack of widely shared, clearly articulated, and broadly applicable best practices around the publication of scientific data”
  • 47. 5 Technical Requirement for FAIR • Easy and permanent access to research data via the internet • Enhanced discovery, retrieval and management of data to enable data reuse and verification of research results
  • 48. 5 Benefits of Citation • Including citable data in related publications increases the citation rate of those publications • Only cited data can be counted and tracked (in a similar manner to journal articles) to measure impact • Routine citation of data will assist in gaining acknowledgement of data as a first class research output • Citations for published data can be included in CVs along with journal articles, reports and conference papers
  • 49. 5 Technical Challenge: Unbreakable internet-based Citation. Stable linking needed • Data will move, URL links to webpages will break. • Unbreakable alternative needed!
  • 50. 5 Digital Object Identifiers (DOI) • The International DOI Foundation was founded in 1998. • The DOI system offers long-term persistence and accessibility of data. • Based on the Handle system. • In May 2012 the DOI System ISO Standard 26324 was published. • Part of the quality control is mandatory metadata for each object registered with a DOI.
  • 51. 5 What is a DOI? DOI: Acronym for “digital object identifier”. A DOI name is an identifier (not a location) of an entity on digital networks. What you see: alphanumeric string (never changes). Associated with: location (such as URL). Accompanied with: who, what, when… (metadata)
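
The distinction on slide 51 between the identifier (which never changes) and the location (which may) can be sketched as follows. The function name `doi_to_url` is hypothetical; the sketch assumes the public resolver at doi.org that the slides themselves link to, and it only builds the link without contacting the resolver. DOI names containing characters that need URL encoding are not handled.

```python
def doi_to_url(doi):
    """Turn a DOI name (optionally prefixed with 'doi:' or a resolver URL) into a doi.org link."""
    name = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "http://dx.doi.org/", "doi:"):
        if name.lower().startswith(prefix):
            name = name[len(prefix):]
            break
    # Every DOI name starts with the directory indicator "10." and has a "/" before the suffix.
    if not name.startswith("10.") or "/" not in name:
        raise ValueError(f"not a DOI name: {doi!r}")
    return "https://doi.org/" + name


# DOIs taken from the slides (GLES 2009 data set and the TIB AV-Portal film).
gles_link = doi_to_url("doi:10.4232/1.11131")
film_link = doi_to_url("https://doi.org/10.5446/31036")
```

The citation stays valid because only the resolver's record of the current location has to be updated when the data moves; the string after `https://doi.org/` stays the same.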
  • 52. 5 DataCite Metadata Schema: Mandatory properties. Part of the quality control is mandatory metadata for each object registered with a DOI: • Identifier (with type attribute) • Creator (with type and nameIdentifier attributes) • Title (with optional type attribute) • Publisher • PublicationYear
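
A minimal sketch of how the mandatory properties listed on slide 52 could be checked before a data set is submitted. The flat dictionary layout and the function name are assumptions made for illustration; this is not DataCite's actual XML or JSON format, and only the five property names come from the slide. The example values are taken from the SOEP citation on slide 55, with the publisher deliberately left out.

```python
# Property names as listed on the slide; the dict layout below is hypothetical.
MANDATORY_PROPERTIES = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")


def missing_properties(record):
    """Return the mandatory properties that are absent or empty in record."""
    return [name for name in MANDATORY_PROPERTIES if not record.get(name)]


record = {
    "Identifier": "10.5684/soep.v29",
    "Creator": ["Schupp, Jürgen", "Kroh, Martin"],
    "Title": "Sozio-oekonomisches Panel (SOEP), Daten der Jahre 1984-2012",
    "PublicationYear": 2013,
}
gaps = missing_properties(record)
```

A check like this is cheap to run locally; the repository will still apply its own validation against the full schema.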
  • 53. 5 DOI is a quality label for data. Datasets with a DOI have to be: • Stable (i.e. not going to be modified) • Complete (i.e. not going to be updated) • Permanent: by assigning a DOI we're committing to make the dataset available for posterity • Good quality: by assigning a DOI it's receiving the data centre's stamp of approval, saying that it's complete and all the metadata is available. DOI: Seal of Approval
  • 54. 5 DOI for Research Data https://support.datacite.org/docs/doi-basics
  • 55. 5 DOI Citation Examples • Fahrenberg, Jochen (2010): Freiburger Beschwerdenliste FBL. Primärdaten der Normierungsstichprobe 1993. Version 1.0.0. ZPID - Leibniz-Zentrum für Psychologische Information und Dokumentation. Dataset. http://doi.org/10.5160/psychdata.fgjn05an08 • Rattinger, Hans; Roßteutscher, Sigrid; Schmitt-Beck, Rüdiger; Weßels, Bernhard (2012): Wahlkampf-Panel (GLES 2009). Version: 3.0.0. GESIS Datenarchiv. Dataset. doi:10.4232/1.11131. • Schupp, Jürgen; Kroh, Martin; Goebel, Jan; Bartsch, Simone; Giesselmann, Marco et al. (2013): Sozio-oekonomisches Panel (SOEP), Daten der Jahre 1984-2012. Version: 29. SOEP - Sozio-oekonomisches Panel. Dataset. doi:10.5684/soep.v29.
  • 56. 5 DOI System Architecture
  • 57. 5 DataCite Services: search.datacite.org
  • 58. 5 Upcoming: Search DOI-registered datasets by ORCID. Find any DOI-registered publication by ORCID http://dashboard.project-thor.eu Example: Löwe / Loewe / Lowe? Which of the four Peter Löwe?
  • 59. 6 Data Curation Continuum: Research Data Repositories. Personal domain → (Transfer) → Group domain → (Transfer) → Persistent domain → (Publication) → Access domain; from low visibility to high visibility
  • 60. 6 re3data: Registry of Research Data Repositories. 1,500 research data repositories described by tags:
  • 61. 6 re3data: Search options
  • 62. 6 Research Data Repository (RDR) Development and Services. Currently, DFG funds two RDR-related projects: 1. SowiDataNet: addressing the social sciences 2. RADAR: addressing the long tail of science. Technology and metadata are compatible. RADAR is a service offering by FIZ Karlsruhe (testing phase). Near future: • SowiDataNet will become a service offering (GESIS) • Datorium will merge with SowiDataNet
  • 63. 6 RADAR: Research Data Repository Services. Van den Broel K, Furtado F, Engel T (2015): RADAR – A Research Data Repository for the “Long-Tail of Science”
  • 64. 6 RADAR: Research Data Repositories Roles & Responsibilities
  • 65. 6 Datorium.gesis.org: Repository for Social Science and Economic Science
  • 66. 6 Datorium: Data Set Description
  • 67. 6 Datorium: Terms of Access
  • 68. 4 Where NOT to “publish” your Data. Required: Professional repositories which enable • long term access, • search, • retrieval, • thorough metadata
  • 69. 6 Alternative (Self help): All-purpose Repositories. Rueda, Laura. (2017, May). Introduction to DataCite. Zenodo. http://doi.org/10.5281/zenodo.571808
  • 70. 6 OPENAIRE: RDM on the European Level https://www.openaire.eu/ https://www.slideshare.net/OpenAIRE_eu/enabling-better-science-results-and-vision-of-the-openaire-infrastructure-and-rda-data-publishing-working-group-55075375
  • 71. 6 Adoption of Open Science in Europe https://www.fosteropenscience.eu/
  • 72. 6 Forschungsdaten in den Sozial- und Wirtschaftswissenschaften http://dx.doi.org/10.4232/10.fisuzida2014.1 http://auffinden-zitieren-dokumentieren.de
  • 73. 6 Handbuch Forschungsdatenmanagement ISBN 978-3-88347-283-6 PDF: http://bit.ly/2uPJdaf
  • 74. 6 Rat für Sozial- und Wirtschaftsdaten / DFG http://www.dfg.de/download/pdf/foerderung/antragstellung/forschungsdaten/basisinformationen_forschungsdatenmanagement.pdf
  • 75. 6 WIKI: FORSCHUNGSDATEN.ORG http://www.forschungsdaten.org
  • 76. 6 RESEARCH DATA ALLIANCE https://www.rd-alliance.org/
  • 77. 6 Data Carpentry Workshops http://www.datacarpentry.org/
  • 78. 7 AUSTRALIAN NATIONAL DATA SERVICE (ANDS)
  • 79. 7 Wise Advice https://nicolahemmings.wordpress.com/2016/04/05/mistakes-ive-made-as-an-early-career-researcher/ Mistakes I've made as an early career researcher, April 5, 2016, Nicola Hemmings (post-doc, University of Sheffield). Failing to organise my data adequately (circa 2007). “Prepare your datasets like you would if you were giving them to a stranger who knew nothing about them. Label, annotate and meticulously file your R scripts. Incorporate read-me files into everything and write them for the monkey that will be you in five years, when you return to your data and/or analyses for some unforeseen but vitally important reason. Don't get this wrong. You will regret it.”
  • 80. 7 Back to the start: Snafu? Things are getting better • This film is scientific nontextual information • It is available on the AV-Portal of TIB Hannover, a data portal for scientific audiovisual content. • DOI link: https://doi.org/10.5446/31036
  • 81. Thank you very much for your attention. DIW Berlin, Deutsches Institut für Wirtschaftsforschung e.V., Mohrenstraße 58, 10117 Berlin, www.diw.de Editor: Peter Löwe (ploewe@diw.de) http://dilbert.com/strip/2010-08-24 Based on the works of • Paul Wong (2017) ANDS, Research Integrity Advisor Data Management Workshop • 3TU.Datacentre (2014): Data citation and DOIs • and others

Editor's Notes

  • #2: https://www.ucl.ac.uk/reward/reward-events-publication/nov_workshop.png
  • #7: https://www.slideshare.net/AustralianNationalDataService/research-data-management-in-practice-ria-data-management-workshop-brisbane-2017
  • #9: https://doi.org/10.5446/31036 Let's look at it the other way around: Post Science
  • #12: In their parents' attic, in boxes in the garage, or stored on now-defunct floppy disks: these are just some of the inaccessible places in which scientists have admitted to keeping their old research data. Such practices mean that data are being lost to science at a rapid rate, a study has now found. The authors of the study, which is published today in Current Biology, looked for the data behind 516 ecology papers published between 1991 and 2011. The researchers selected studies that involved measuring characteristics associated with the size and form of plants and animals, something that has been done in the same way for decades. By contacting the authors of the papers, they found that, whereas data for almost all studies published just two years ago were still accessible, the chance of them being so fell by 17% per year. Availability dropped to as little as 20% for research from the early 1990s. “Most of the time, researchers said ‘it's probably in this or that location’, such as their parents' attic, or on a zip drive for which they haven't seen the hardware in 15 years,” says Timothy Vines, the lead author on the study and an evolutionary ecologist at the University of British Columbia in Vancouver. “In theory, the data still exist, but the time and effort required by the researcher to get them to you is prohibitive.”
  • #13: Apparently it is an issue.
  • #14: From a personal perspective: icky.
  • #15: Best practices for data handling
  • #17: Should I store my data at home?
  • #22: The basic idea is that our capacity for collecting scientific data has far outstripped our present capacity to analyze it, and so our focus should be on developing technologies that will make sense of this "Deluge of Data" 
  • #24: Replicable: results can be reproduced from an independent analysis (different lab, model system, software…) Reproducible: Results can be reproduced using your code and data
  • #26: OCLC, currently incorporated as OCLC Online Computer Library Center, Incorporated, is an American nonprofit cooperative organization "dedicated to the public purposes of furthering access to the world's information and reducing information costs". It was founded in 1967 as the Ohio College Library Center. OCLC and its member libraries cooperatively produce and maintain WorldCat, the largest online public access catalog (OPAC) in the world. OCLC is funded mainly by the fees that libraries have to pay for its services (around $200 million annually as of 2016).
  • #28: This is done with the support of information professionals and with information technology tools. (Figure: Klump, 2009)
  • #31: Documenting datasets. Documenting one's own dataset in a meaningful way should be close to every data producer's heart, both as a matter of good scientific practice and for the sake of reproducibility and transparency towards third parties. Questions of research data documentation are still too weakly anchored in academic teaching in the social sciences and economics. Data producers should nevertheless be clear about one thing: good documentation makes it easier for external data users to re-analyse the data and to honour the data producer's work with a reference, that is, a citation. Without documentation, the data producer forgoes possible recognition of their work (“credit”) by third parties. The main goal of documentation is to make the creation of the dataset traceable and to describe it so that third parties can work with it. The effort this requires depends, for one thing, on the size of the dataset itself. In addition, there is some overarching information about a dataset that should be provided as a matter of course. This information helps potential re-users decide whether the data may be relevant. It covers the following points:
1. Content. Potential re-users of a dataset will generally look for details and information about its content. Keyword-style descriptions (e.g. “labour market”, “partnerships”, “elections”, “xenophobia”, “capital goods”) are helpful here, as are standardised content-related codes such as JEL codes (a classification scheme for research topics from the American Economic Association), which allow the data to be assigned to particular research fields. The drawback of these specific codes, however, is that a data producer sometimes cannot judge in which unknown or unfamiliar research fields the data might be useful to others. It is therefore advisable to write an abstract that specifies the data content more precisely than a single keyword can. A good example of a dataset abstract can be found here.
2. Unit of observation. The unit of observation is the smallest level present in the dataset. It must be clearly named and described in the documentation. In the social sciences and economics, this can be countries, persons or goods.
3. Data basis. Next, the potential user must be told whether the data are a full census or a sample drawn from a population. Ideally, this tells the user directly which statements the data can support at all. For samples, a definition of the population and a description of how the sample was derived from it are essential. With a sample, the question therefore always arises of how it was drawn: is it a random sample, a quota sample, or a draw without …? The type of sample in turn affects the explanatory power of the data, and thus also the range of research questions for which re-use of the data makes sense. To assess the validity of the data, details of the collection process are essential. For example, the documentation should state how many units (such as persons or establishments) were originally meant to be surveyed (“gross sample”) and how many ultimately took part (“net sample”).
4. Collection method. Data can be obtained in very different ways and exist in different forms. Setting this out precisely is important for interpreting the data correctly and for assessing their reliability (measurement accuracy) and validity (explanatory power). For example, newspaper clippings on a topic can be recorded as data, and interviews with persons (which can be quantitative or qualitative) or search queries on websites can also form a data basis. The ongoing digitalisation of everyday life in particular opens up more and more ways of obtaining data and using them for scientific purposes. This makes documenting the collection method all the more important (for standard collection methods in personal interviews, see e.g. Schnell, 2012), so that additional information can also be drawn from questionnaires, scale manuals, test descriptions, coding rules, translation aids or cover letters: in short, everything that makes the process of data creation concrete for the user.
5. Scope. The scope of the data is essential when deciding on further use. This concerns, for one thing, the number of observations. Far more important, however, is how the content stated in point 1 is captured, that is, how many variables the dataset contains and what exactly they measure. Documentation published as part of an article, which is meant to give readers a first overview, usually cannot go into much detail here. More extensive documentation is then intended for the actual users who want to learn more about the data collection. For this purpose it makes sense to create a so-called codebook or data manual. An example of a very detailed codebook can be found at the SOEP: “Codebook: Household level questionnaires”.
6. Access. Last but not least, it is important to state whether and how a re-user can obtain the data. First, a contact person or institution responsible for access to and distribution of the data must be named (if distribution is intended). Most datasets cannot simply be made publicly available, because data protection regulations must be observed even for self-generated data. The first point of contact for questions in this context are the data protection officers of the respective institution, who in case of doubt should always be contacted before a study with self-collected data. It is increasingly possible to provide data by download and to issue special certificates for this purpose (usually on the basis of a usage agreement). The distinction between commercial and scientific users, for whom different conditions usually apply, must be taken into account. Costs of re-use, which can arise from shipping alone even for data that are in principle free of charge, must also be stated. Especially in a university setting, it matters whether there is a teaching version of the data that is less sensitive in terms of data protection and that may be provided to students at a reduced price or entirely free of charge (e.g. via download).
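The six documentation points can also be captured in a small machine-readable record. The following is only a minimal sketch in Python: the class `DatasetDocumentation`, its field names, and the example values are hypothetical and not part of any standard or of the original slides. It also shows how the gross and net sample sizes yield a response rate, a derived figure that is an illustrative addition here.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetDocumentation:
    """Hypothetical record mirroring the six documentation points."""
    keywords: list[str]        # 1. content: keyword-style description
    abstract: str              # 1. content: free-text abstract
    observation_unit: str      # 2. e.g. "person", "country", "good"
    sampling: str              # 3. e.g. "full census", "random sample"
    gross_sample: int          # 3. units that were meant to be surveyed
    net_sample: int            # 3. units that actually took part
    collection_method: str     # 4. e.g. "personal interview"
    n_observations: int        # 5. scope: number of observations
    n_variables: int           # 5. scope: number of variables
    access_contact: str        # 6. who is responsible for access

    def response_rate(self) -> float:
        """Share of the gross sample that ended up in the net sample."""
        if self.gross_sample <= 0:
            raise ValueError("gross_sample must be positive")
        return self.net_sample / self.gross_sample

    def missing_fields(self) -> list[str]:
        """Names of fields that were left empty."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == "" or value == []:
                missing.append(f.name)
        return missing


# Made-up example values, for illustration only.
example = DatasetDocumentation(
    keywords=["labour market"],
    abstract="",
    observation_unit="person",
    sampling="random sample",
    gross_sample=2000,
    net_sample=1200,
    collection_method="personal interview",
    n_observations=1200,
    n_variables=85,
    access_contact="data service of the producing institute",
)
print(example.response_rate())
print(example.missing_fields())
```

A record like this does not replace a codebook. It only makes it easy to check, before publication, that none of the overarching points has been forgotten.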
  • #38: Technical aspects! Do not cover everything! Social aspects!
  • #41: Conversion of social capital (credibility) into other forms of capital: funding, access to equipment, data, new arguments, publication, resulting in a reputation gain through reception and recognition by peers. Success is measured by the efficiency of conversion of one form of capital into another. (Modified after Latour and Woolgar, 1982.) The value of making research data available is broadly accepted. Policies concerning open access to research data try to implement new norms calling for researchers to make their data more openly available. These policies either appeal to the common good or focus on publication and citation as an incentive to bring about a cultural change in how researchers share their data with their peers. But when we compare the total number of publications in the fields of science, technology and medicine with the number of data publications from the same time period, the number of openly available datasets is rather small. This indicates that current policies on data sharing are not effective in changing behaviours and bringing about the desired cultural change. By looking at research communities that are more open to data sharing, we can study the social patterns that influence data sharing and identify possible points for intervention and change.
  • #44: Open Science is the movement to make scientific research and data accessible to all. It includes practices such as publishing open scientific research, campaigning for open access and generally making it easier to publish and communicate scientific knowledge. Additionally, it includes other ways to make science more transparent and accessible during the research process. This includes open notebook science, citizen science, and aspects of open source software and crowdfunded research projects. The many advantages of this movement include: Greater availability and accessibility of publicly funded scientific research outputs; Possibility for rigorous peer-review processes; Greater reproducibility and transparency of scientific works; Greater impact of scientific research. (http://www.unesco.org/new/en/communication-and-information/portals-and-platforms/goap/open-science-movement/)
  • #47: Today, March 15 2016, the FAIR Guiding Principles for scientific data management and stewardship were formally published in the Nature Publishing Group journal Scientific Data. The problem the FAIR Principles address is the lack of widely shared, clearly articulated, and broadly applicable best practices around the publication of scientific data. While the history of scholarly publication in journals is long and well established, the same cannot be said of formal data publication. Yet, data could be considered the primary output of scientific research, and its publication and reuse is necessary to ensure validity, reproducibility, and to drive further discoveries. The FAIR Principles address these needs by providing a precise and measurable set of qualities a good data publication should exhibit – qualities that ensure that the data is Findable, Accessible, Interoperable, and Reusable (FAIR). The principles were formulated after a Lorentz Center workshop in January, 2014 where a diverse group of stakeholders, sharing an interest in scientific data publication and reuse, met to discuss the features required of contemporary scientific data publishing environments. The first-draft FAIR Principles were published on the Force11 website for evaluation and comment by the wider community – a process that lasted almost two years. This resulted in the clear, concise, broadly-supported principles that were published today. The principles support a wide range of new international initiatives, such as the European Open Science Cloud and the NIH Big Data to Knowledge (BD2K), by providing clear guidelines that help ensure all data and associated services in the emergent ‘Internet of Data’ will be Findable, Accessible, Interoperable and Reusable, not only by people, but notably also by machines. The recognition that computers must be capable of accessing a data publication autonomously, unaided by their human operators, is core to the FAIR Principles. 
Computers are now an inseparable companion in every research endeavour. Contemporary scientific datasets are large, complex, and globally-distributed, making it almost impossible for humans to manually discover, integrate, inspect and interpret them. This (re)usability barrier has, until now, prevented us from maximizing the return-on-investment from the massive global financial support of big data research and development projects, especially in the life and health sciences. This wasteful barrier has not gone unnoticed by key agencies and regulatory bodies. As a result, rigorous data management stewardship – applicable to both human and computational “users” – will soon become a funded, core activity within modern research projects. In fact, FAIR-oriented data management activities will increasingly be made mandatory by public funding bodies. The high level of abstraction of the FAIR Principles, sidestepping controversial issues such as the technology or approach used in the implementation, has already made them acceptable to a variety of research funding bodies and policymakers. Examples include FAIR Data workshops from EU-ELIXIR, inclusion of FAIR in the future plans of Horizon 2020, and advocacy from the American National Institutes of Health. As such, it seems assured that these principles will rapidly become a key basis for innovation in the global move towards Open Science environments. Therefore, the timing of the Principles publication is aligned with the Open Science Conference in April 2016. With respect to Open Science, the FAIR Principles advocate being “intelligently open”, rather than “religiously open”. The Principles do not propose that all data should be freely available – in particular with respect to privacy-sensitive data. 
Rather, they propose that all data should be made available for reuse under clearly-defined conditions and licenses, available through a well-defined process, and with proper and complete acknowledgement and citation. This will allow much wider participation of players from, for instance, the biomedical domain and industry, where rigorous and transparent data usage conditions are a core requirement for data reuse. “I am very proud that, just over two years after the meeting where we came up with the early FAIR Principles, they play such an important role in many forward-looking policy documents around the world, and the authors on this paper are also in positions that allow them to follow these Principles. I sincerely hope that FAIR data will become a ‘given’ in the future of Open Science, in the Netherlands and globally”, says Barend Mons, Professor in Biosemantics at the Leiden University Medical Center.
  • #52: DOI is an acronym for "digital object identifier", meaning a "digital identifier of an object". A DOI name is an identifier (not a location) of an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks. A DOI name can be assigned to any entity — physical, digital or abstract — primarily for sharing with an interested user community or managing as intellectual property. The DOI system is designed for interoperability; that is to use, or work with, existing identifier and metadata schemes.
  • #55: A Digital Object Identifier (DOI) is an alphanumeric string assigned to uniquely identify an object. It is tied to a metadata description of the object as well as to a digital location, such as a URL, where all the details about the object are accessible. In order to create new DOIs and assign them to your content, it is necessary to become a DataCite member or work with one of the current members.
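Because a DOI name is an identifier rather than a location, it has to be resolved to reach the object. The sketch below shows one way to turn a DOI name into a link via the public resolver at https://doi.org/. The helper names `normalize_doi` and `doi_to_url` are made up for this illustration, and the regular expression is only a rough plausibility heuristic, not the formal DOI syntax. The example uses the DOI of the FAIR Guiding Principles article in Scientific Data.

```python
import re
from urllib.parse import quote

# Common ways a DOI is written in practice (assumed list, not exhaustive).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Heuristic only: "10.", a numeric registrant code, "/", then a suffix.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Strip a resolver or 'doi:' prefix and return the bare DOI name."""
    text = raw.strip()
    lowered = text.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    if not _DOI_PATTERN.match(text):
        raise ValueError(f"not a plausible DOI name: {raw!r}")
    return text


def doi_to_url(raw: str) -> str:
    """Build a resolver link; special characters in the suffix are escaped."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


print(doi_to_url("doi:10.1038/sdata.2016.18"))
```

The code only builds the link and makes no network request. Following the link is what hands control to the resolver, which redirects to the current landing page of the object.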
  • #60: Technical aspects! Do not cover everything! Social aspects!
  • #64: https://www.radar-projekt.org/download/attachments/753675/GCC_2015_RADAR.pdf?version=1&modificationDate=1448461974000&api=v2
  • #71: A network of Open Access repositories, archives and journals that support Open Access policies. The OpenAIRE Consortium is a Horizon 2020 (FP8) project, aimed at supporting the implementation of the EC and ERC Open Access policies. Its successor OpenAIREplus is aimed at linking the aggregated research publications to the accompanying research and project information, datasets and author information. Open access to scientific peer reviewed publications has evolved from a pilot project with limited scope in FP7 to an underlying principle in the Horizon 2020 funding scheme, obligatory for all H2020 funded projects. The goal is to make as much European funded research output as possible available to all, via the OpenAIRE portal. (Source: openaire.eu FAQ) The Zenodo research data repository is a product of OpenAIRE. The OpenAIRE portal is online