Is your code worth a full citation? 
Dominik Reusser, RD2 
Martin Hammitsch, GFZ-CeGIT 
27 November 2014, PIK Modelling Strategy Seminar
Reproducibility as a scientific principle
Publication
Data
Software
DFG: "Primary data as the basis for publications shall [be kept] on durable …
For example, in December 2006, Geoffrey Chang from the 
Department of Molecular Biology at the Scripps Research Institute, 
California, US, was horrified when a hand-written program flipped two 
columns of data, inverting an electron-density map. As a result, a 
number of papers had to be retracted from the journal Science. 
The missing link
“Establish the missing link between papers and data publications.”
• Findings, papers, data … and software?
- Data are already published professionally, either with papers or as self-contained publications
- This is not yet standard practice for the related software
- Findings are based not only on raw data but also on data obtained from analyses, which are most likely supported by software
• Software is the link between the findings presented in papers and the data the findings are based on.
- Software used to obtain findings plays a crucial role in scientific work
- However, software is rarely seen as publishable in the sense of a scientific publication
- Without the software, researchers may be unable to reproduce the findings, which conflicts with the principle of reproducibility in the natural sciences
• Current ways of providing software lack solutions that serve researchers’ needs.
- Software publications would close the missing link between data and the papers presenting findings
- Software publications would foster the interplay between the two
Nature 2011 
• The code was not written for others to use. 
• Scientists and reviewers wanted a longer conversation, with a
chance to go back and forth
Best practices
“Establish standard software engineering rules, best practices and processes in science.”
• The treatment of source code involves additional work that is not covered by the primary research task.
- This includes code design, version control, documentation, and testing …
- To safeguard traceability and reusability, this scientific work has to be planned and supported
- This includes adopting processes that follow the software development life cycle
• Adopting software engineering rules and best practices has to be recognized and accepted as part of scientific performance.
- Most scientists have little incentive to improve their code
- They do not publish code, either with their papers or as self-contained publications
- Software engineering habits are rarely practised by faculty, research staff, postdocs, and doctoral and graduate students, and consequently not by undergraduate students either
- Unlike paper-writing skills, software engineering skills are not passed on to the next generation
• It is often felt that the software or code produced is not publishable.
- Yet the quality of software and its source code has a decisive influence on the quality of research results
• Best practices from software engineering must be not only adopted but also adapted to scientific needs; this is crucial for the success of software publications.
Scientific achievement
“Make software recognized as scientific achievement.”
• Disciplinary journals require articles to discuss scientific problems.
- Software is often seen only as a contribution to the solution of a question or problem
- Software is not perceived as an independent contribution to science
- Authors of software must first find a scientific question to motivate publication in their desired journal
• Releasing software directly as a scientific publication is not possible.
- The scientific achievements of software and its contributions to science are poorly recognized and hard to measure
• The resulting gap in interdisciplinary communication about scientific software might be closed by software publications.
- This requires a common understanding of how to handle scientific software through defined processes
- It requires commonly accepted and adopted metrics
- Software could then be valued and assessed as a contribution to science
The paper must be accompanied by the code, or means of 
accessing the code, for the purpose of peer-review. If the code 
is normally distributed in a way which could compromise the 
anonymity of the referees, then the code must be made available 
to the editor. The referee/editor is not required to review the 
code in any way, but they may do so if they so wish. 
All papers must include a section at the end of the paper 
entitled “Code availability”. In this section, instructions for 
obtaining the code (e.g. from a supplement, or from a website) 
should be included; alternatively, contact information should be 
given where the code can be obtained on request, or the reasons 
why the code is not available should be clearly stated. 
We strongly encourage authors to upload any user manuals 
associated with the code. 
For models where this is practicable, we strongly encourage 
referees to compile the code, and run test cases supplied by the 
authors where appropriate. 
Unlike internal reports, GMD papers undergo a peer-review 
process, which establishes the criteria for full publication of 
model developments. Submitted code has not itself been peer-reviewed 
by GMD, but, as one of our new initiatives, we will 
require that referees and editors obtain the model code, and 
encourage them to execute test cases where practical. 
example #1
example #2 
Both Figshare and Zenodo integrate with GitHub.
Neither repository offers long-term storage of executable code 
(e.g. storing all software dependencies or virtual machines)
example #3 
Persistent identifiers for software are not (yet) common practice.
example #4
example #5
example #6
Open science
“Leverage open access and open science.”
• In scientific software development, code is often not written for others to use.
- Code is kept and maintained on researchers’ own computers and servers
- When the code grows or groups work together, code repositories and version control systems are set up
- In many cases these systems are for internal use only and are usually not reachable from outside
• Reuse mainly happens informally or anonymously, even in science.
- Scientists use existing software and code from open source software repositories
- Only a few contribute their code back to the repositories
• For cooperation and reuse of software, a number of software platforms already exist.
- SourceForge and GitHub are already used by scientists
- These platforms partly meet the scientific need to serve software and code as part of the scientific tradition
- It is unclear whether these platforms can be augmented for scientific purposes or whether dedicated repositories must be created
• Subsequent users have to be able to run the code.
- This requires sufficient documentation, sample data sets, tests and comments, which in turn can be verified by adequate, qualified review
- This assumes that scientists learn to write and release code and software as they learn to write and publish papers
Publication and Citation of Scientific 
Software with Persistent Identifiers 
Why?
Software development in general is not perceived as a scientific achievement, but:
Software occupies an increasingly prominent place in research and has become an indispensable commodity.
Software has become an integral part of science, yet it is not properly integrated into the scientific discourse.
Where is it going?
Find and implement solutions serving researchers’ needs so that software development can become part of the academic tradition and be regarded as a scientific achievement of its authors.
- Make software recognized as scientific achievement.
- Leverage open access and open science.
- Establish the missing link between papers and data publications.
- Establish standard software engineering rules, best practices and processes in science.
Recognize, create, and act upon opportunities for the development of concepts establishing defined processes.


SciForge Workshop@Potsdam Institute for Climate Impact Research; Nov 2014
