The best practices proposed here aim to promote findable and accessible data analysis pipelines through web-based resources. This
process makes it possible to package pipelines, re-execute them in the long run, and adapt them to continuously evolving environments.
Future work follows two main directions: i) handling data resources as part of the pipeline distribution process (e.g. BioMaj), and
ii) studying how to promote interoperability between multiple systems and infrastructures.
Achieved pipelines:
☑ SingleCell RNASeq
☑ Exome variant calling
In progress pipelines:
☐ DGESeq
☐ Gene fusion detection
☐ RNASeq variant calling
☐ RNASeq Differential gene expression
Benefits:
F : Indexed and searchable packages on https://anaconda.org
A : Web-based package management and installation
Controlled deployment on Linux containers / systems
I : Virtual environments to handle incompatible libraries
Multi-platform, multi-language: Snakemake + Conda
R : Versioned software environments to foster reproducibility
Limitations:
○ Heavy data resources required (reference genomes, etc.)
○ Some tools / libraries need to be packaged beforehand
Y. Lelièvre (1), A. Bihouée (2), E. Charpentier (2), A. Gaignard (2,4), S. Souchet (3) and D. Vintache (1)
(1) LS2N, UMR CNRS 6004, IMT Atlantique, ECN, Université de Nantes, Nantes, France
(2) l’institut du thorax, INSERM, CNRS, Université de Nantes, Nantes, France
(3) Angers Academic Hospital, CHU d’Angers, France
(4) Nantes Academic Hospital, CHU de Nantes, France
Contacts:
Yohann.Lelievre@univ-nantes.fr Audrey.Bihouee@univ-nantes.fr  Damien.Vintache@univ-nantes.fr
Web portal:
http://bird_pipeline_registry.univ-nantes.io/PipelinesPortal
Process:
○ virtual environment creation with the pipeline package
○ virtual environment activation
○ pipeline configuration
○ pipeline execution
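The four end-user steps above can be sketched as a short command sequence. This is only an illustration under stated assumptions: the environment name, package name, channel, Snakefile path and configuration file are hypothetical placeholders, and the sketch assumes the pipeline is launched directly with the `snakemake` command, which the poster does not specify. The function only builds the command strings and does not run them.

```python
import shlex


def end_user_commands(env_name, package, channel, snakefile, config_file, cores=1):
    """Return the shell commands for the end-user steps, in order.

    All argument values are placeholders chosen by the caller; nothing here
    is taken from the actual pipeline registry.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")

    # Step 1: create a virtual environment that contains the pipeline package.
    create = ["conda", "create", "--yes", "--name", env_name,
              "--channel", channel, package]
    # Step 2: activate that environment.
    activate = ["conda", "activate", env_name]
    # Step 3 (configuration) is a manual edit of config_file by the user.
    # Step 4: execute the pipeline with the edited configuration.
    run = ["snakemake", "--snakefile", snakefile,
           "--configfile", config_file, "--cores", str(cores)]

    return [shlex.join(cmd) for cmd in (create, activate, run)]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for line in end_user_commands("scrnaseq", "scrnaseq-pipeline", "my-channel",
                                  "Snakefile", "config.yaml", cores=4):
        print(line)
```

Keeping the configuration in a separate file passed with `--configfile` means the same packaged pipeline can be re-run on new data without touching its source code.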
[Diagram: end-user workflow. Elements: packages repositories, end-user, conda env, packages.]
Developer
Iterative development:
1. virtual environment setup
2. source code publication
3. continuous integration
Deployment:
4. pipeline packaging
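For step 1, one way to obtain a versioned environment is to pin every dependency in a Conda environment file that is published with the source code. The sketch below renders such a file from a dictionary; it is an assumption about how the setup could be done, not the authors' actual tooling, and the environment name, channels, tool names and version numbers are purely illustrative.

```python
def render_environment_yaml(name, channels, pinned):
    """Render a Conda environment file with exact version pins.

    `pinned` maps a package name to the exact version to install.
    Packages are sorted so the output is stable across runs, which keeps
    diffs in the source code repository small.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    for package, version in sorted(pinned.items()):
        if not version:
            # An unpinned dependency would defeat the purpose of the file.
            raise ValueError(f"no version given for {package!r}")
        lines.append(f"  - {package}={version}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the poster.
    print(render_environment_yaml(
        "exome-dev",
        ["conda-forge", "bioconda"],
        {"snakemake": "5.4.0", "samtools": "1.9"},
    ))
```

The resulting file can be committed alongside the pipeline (step 2), so that continuous integration (step 3) and packaging (step 4) run against the same pinned versions.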
[Diagram: Developers, build reproducible pipelines. Elements: source code, source code repository, conda env, packages, docker, packages repositories; arrows labelled with steps 1 to 4.]
Life-science research is now conducted through multi-disciplinary, multi-centric studies. In this context, the same software components must
be deployed in multiple environments to ensure reproducibility and scalability. In addition, data analysis pipelines are usually composed of
multiple, continuously evolving components, which raises maintenance and long-term support challenges. To promote the FAIR (Findable –
Accessible – Interoperable – Reusable) principles, providing controlled software environments becomes mandatory. We propose a set of
best practices that take advantage of proven or promising tools: Git, Conda, Snakemake, Jenkins and Docker.
Poster section headings: Introduction; End-users (deploy and launch pipelines); Results; Discussion; Conclusion
Developing and sharing reproducible bioinformatics pipelines: best practices
