A decade's experiences in transparent and interactive
publication of FAIR data and software via an end-to-end XML
publishing platform
Scott Edmunds 0000-0001-6444-1436
https://www.telegraph.co.uk/technology/2020/05/16/neil-fergusons-imperial-model-could-devastating-software-mistake/
Scientists: need to convince public + politicians
The “Infodemic” Era
Imperial College: Report 9
GigaSolution: rewarding open data & code
http://gigasciencejournal.com/
Publishes “Data Notes” for CC0 data and “Tech Notes” for software under OSI-approved licenses.
Transparent: open peer review, linked to preprints. Mandates code in a repository.
Integrated GigaDB repository: DataCite DOIs, no size limits, code snapshots, APC covers curation
http://gigadb.org/
GigaSolution: rewarding open data & code
[Figure: bar chart of GigaScience software/workflow papers (Technical Notes), 2012-2021. Series: Galaxy, Snakemake, Nextflow, CWL, Other. Vertical axis: 0 to 70.]
Changes in how research is shared: workflows
Experience publishing Galaxy workflows: 2013
gigagalaxy.net
Experience publishing VMs: 2014
https://doi.org/10.1186/2047-217X-3-23
• Downloadable as a virtual hard disk or available as an Amazon Machine Image
• Unclear how to describe licensing & security issues
Experience publishing containers: 2015
https://doi.org/10.1186/s13742-015-0087-0
https://doi.org/10.1186/s13742-015-0073-6
• From 2015, increasing submissions leveraging containers
• Promoted experiments in standardization such as bioboxes
• Integrated with Code Ocean & tested with Gigantum
• Carried out reproducibility case studies (can be expensive)
Experience publishing CODECHECK: 2020
Independent execution of computations underlying research articles.
CODECHECK tackles one of the main challenges of computational research by supporting
codecheckers with a workflow, guidelines and tools to evaluate computer programs
underlying scientific papers. The independent time-stamped runs conducted by
codecheckers will award a “certificate of executable computation” and increase availability,
discovery and reproducibility of crucial artefacts for computational sciences.
https://codecheck.org.uk/
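The idea of an independent, time-stamped run can be illustrated with a minimal sketch. This is not CODECHECK's own tooling: the function names `sha256_of` and `check_outputs` and the manifest layout (file name mapped to an expected SHA-256 digest) are assumptions made for illustration. Byte-identical comparison is also a simplification, since a real check usually relies on a human codechecker judging whether regenerated figures and tables match the paper.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_outputs(output_dir: Path, manifest: dict[str, str]) -> dict:
    """Compare regenerated output files with the digests listed in a manifest.

    `manifest` is a hypothetical structure: {file name: expected SHA-256 digest}.
    """
    results = {}
    for name, expected in manifest.items():
        target = output_dir / name
        if not target.exists():
            results[name] = "missing"
        elif sha256_of(target) == expected:
            results[name] = "reproduced"
        else:
            results[name] = "differs"
    return {
        # UTC timestamp of this independent run
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "files": results,
        "all_reproduced": all(v == "reproduced" for v in results.values()),
    }
```

A checker would rerun the authors' code into `output_dir`, call `check_outputs`, and archive the returned record alongside the article.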
Experience publishing CODECHECK: 2020
http://gigasciencejournal.com/blog/codecheck-certificate/
https://doi.org/10.1093/gigascience/giaa026
Experience publishing CODECHECK: 2020
https://www.nature.com/articles/d41586-020-01685-y
http://doi.org/10.5281/zenodo.3865491
Lessons learned in a decade of data & software publishing:
• Tech really the bottleneck
• Process much too slow & expensive
• Still too focused on narrative and static “version of record”
• Still not very FAIR
A new approach
DATA, CODE, ENTITIES, FACTS, STABILITY
Follow the Software Paradigm?
CODE, RELEASE, FORK, UPDATE, REPEAT
Deconstruct the “Version of Record”?
Move to new XML end-to-end pipeline
Custom end-to-end workflow makes integrations simpler with one integration point
Features of new journal:
https://gigabytejournal.com/
• Main advantage of the workflow is XML from start to end
• Several modules acting as one platform: no import/export of files, so fast and accurate
• Cutting out production allows huge time & cost savings (currently as little as 3.5 hrs per paper)
• Any number of versions can be published instantly, including typographic-quality PDF or updates/forks
• Allows instantaneous switching of views
• Leverages embeddable dynamic content/widgets
• Initial focus on forkable open source products: data + software + update papers
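The point about one XML source feeding several views can be sketched as follows. The XML below is a hypothetical, heavily simplified article and not the journal's actual schema; the function names `to_html` and `to_outline` are likewise illustrative. The sketch only shows that different renderings can be derived from the same parsed tree without any import/export step in between.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified article XML (an assumption, not the real schema).
ARTICLE_XML = """
<article>
  <title>Example data note</title>
  <version>2</version>
  <sec><heading>Background</heading><p>Why the data were collected.</p></sec>
  <sec><heading>Data description</heading><p>What the dataset contains.</p></sec>
</article>
""".strip()


def to_html(root: ET.Element) -> str:
    """Render a reading view as simple HTML."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    for sec in root.findall("sec"):
        parts.append(f"<h2>{escape(sec.findtext('heading'))}</h2>")
        parts.extend(f"<p>{escape(p.text or '')}</p>" for p in sec.findall("p"))
    return "\n".join(parts)


def to_outline(root: ET.Element) -> str:
    """Render a compact outline view with the version number."""
    lines = [f"{root.findtext('title')} (version {root.findtext('version')})"]
    lines += [f"  - {sec.findtext('heading')}" for sec in root.findall("sec")]
    return "\n".join(lines)


root = ET.fromstring(ARTICLE_XML)
print(to_outline(root))
print(to_html(root))
```

Publishing an update or fork then amounts to changing the XML and regenerating every view from it.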
Focusing beyond VoR allows different views…
What does focusing on Data + software + XML allow us to do?
https://doi.org/10.46471/gigabyte.1
Thinking about users: machines
Helping to make research “AI-ready”
https://doi.org/10.46471/gigabyte.6
• High quality rich XML
• CC-BY open licensed, open citations, open corpus
• Structured schema.org metadata
• No hiding of material in supplemental files
• Maximise use of persistent identifiers (PIDs):
Who: ORCID IDs; CASRAI contributorship; Funder (FundRef); Institution (ROR)
What: Species (NCBI, FishBase); Cell/strain (RRID)
How: Equipment (RRID); Software (RRID, bio.tools)
Output: Data (accessions, DOIs); Results (DOIs)
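A minimal sketch of what machine-readable, PID-rich metadata might look like is given below. It builds a schema.org JSON-LD record and validates ORCID iDs with the ISO 7064 MOD 11-2 check digit that ORCID uses. The function names (`orcid_check_digit`, `is_valid_orcid`, `article_jsonld`), the title, the DOI `10.1234/example.1` and the CC BY 4.0 license URL are assumptions for illustration; the ORCID iD is the one shown on the title slide. This is not the journal's actual metadata output.

```python
import json


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits and checksum of an ORCID iD such as 0000-0001-6444-1436."""
    digits = orcid.replace("-", "")
    return (
        len(digits) == 16
        and digits[:15].isdigit()
        and orcid_check_digit(digits[:15]) == digits[15]
    )


def article_jsonld(title: str, doi: str, authors: list[tuple[str, str]]) -> dict:
    """Build a schema.org ScholarlyArticle record with DOI and ORCID identifiers."""
    for _, orcid in authors:
        if not is_valid_orcid(orcid):
            raise ValueError(f"invalid ORCID iD: {orcid}")
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": title,
        "identifier": f"https://doi.org/{doi}",
        # License version is an assumption; the slides only say "CC-BY".
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "author": [
            {"@type": "Person", "name": name, "identifier": f"https://orcid.org/{orcid}"}
            for name, orcid in authors
        ],
    }


record = article_jsonld(
    "Example data note",       # hypothetical title
    "10.1234/example.1",       # hypothetical DOI
    [("Scott Edmunds", "0000-0001-6444-1436")],
)
print(json.dumps(record, indent=2))
```

Rejecting malformed identifiers before publication is one way automation can keep the metadata trustworthy for machine readers.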
Interaction: increasing understanding & trust
https://doi.org/10.46471/gigabyte.13
Do you trust an immunoinformatics tool to predict whether memory T cells generated from
previous exposure to common cold coronaviruses are cross-reactive against SARS-CoV-2?
Interaction: software and code via Stencila and Code Ocean
http://gigasciencejournal.com/blog/gigabyte-executable-research-articles/
Code Ocean “Compute Capsule”: readers can directly interact with software via an embedded version in the article, or deploy and run it in their own cloud computing environment.
Popout Stencila “Executable Research Article”, where figures are accompanied by code blocks that can be edited and re-executed to immediately see the changes.
Interact with Stenci.la “code chunks” & Code Ocean “compute
capsules” of COVID-19 immunoinformatics papers
https://doi.org/10.46471/gigabyte.13
A new way of publishing FAIR research with new tech
• Share & get credit for updatable data & software papers
• Follow the software paradigm, bring your research to life
• XML makes it much easier to embed interactive content
• Use automation & interaction to increase scrutiny & trust
• XML-only workflow cuts time and cost to publish
• Rethink “Version of Record”: focus on facts/data/code & discard the packaging
Help us change scientific publishing, contact: editorial@gigabytejournal.com
https://gigabytejournal.com/
Thanks to:
Laurie Goodman, Publisher
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Hongling Zhao, Assistant Editor
Peter Li, Head of IT
Chris Hunter, Lead BioCurator
Chris Armit, Data Scientist
Mary Ann Tuli, Data Editor
Rija Ménagé, Senior Software Engineer
Ken Cho, Systems Programmer Analyst
Chen Qi, Shenzhen Office
Follow us:
@GigaByteJournal
facebook.com/GigaScience
http://gigasciencejournal.com/blog/
+ Weibo & WeChat
https://gigabytejournal.com/
editorial@gigabytejournal.com
Questions?


IDW2022: A decade's experiences in transparent and interactive publication of FAIR data and software via an end-to-end XML publishing platform
