Standardising Swedish
genomics analyses
using nextflow
Phil Ewels
@ewels
@tallphil
Nextflow Meeting
2017-09-14
CRG, Barcelona
2 x MiSeq, 5 x HiSeq 2500, 5 x HiSeq X10, NovaSeq
[Bar chart: Number of Samples in 2016, by application (RNA-Seq, WG Re-Seq, Targeted Re-Seq, Metagenomics, Others); bar values 1,265, 2,580, 3,214, 8,934 and 12,017; axis from 0 to 14,000]
1141 Gbp/day
1X Human Genome
every 4 minutes
NGI Stockholm
SciLifeLab NGI
NGI Stockholm - sequencing output
[Figure: cumulative sequencing output (Mbp) from January 2012 to May 2017; y-axis from 0 to 1,000,000,000]
NGI bioinformatics
• Initial data analysis for major protocols
• Internal QC and standardised starting
point for users
• Team of 10 bioinformaticians
• Accredited facility
analysis requirements
Automated
Reliable
Easy for others to run
Reproducible results
icons: the noun project
NGI pipelines
NouGAT (de-novo)
what have we learnt?
sharing is caring
• Open-source on GitHub from day one
• Easier help and feedback from others
• Other people may help to develop your code
• https://github.com/nextflow-io/awesome-nextflow
use containers
• Create a docker image, even if you don’t think you
need to
• Makes local and automated testing possible
• Future proof for cloud / singularity / other people
test, test and test again
• Find a small test dataset
• Make a bash script to fetch data and launch pipeline
• Different flags to launch with different parameters
• Use Travis build matrix to launch parallel test runs
use versioned releases
minimal configs
• Build config files around blocks of function
• Hardware / software deps / genome references
• Use nextflow profiles
• Even if only using ‘standard’ default
• Don’t be afraid to use multiple configs per profile
• Build on a base profile and be clever with
limits
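// Cap a resource request at the configured maximum
def check_max(obj, type) {
  if (type == 'memory') {
    // compareTo returns a negative, zero or positive number,
    // so test explicitly for "greater than the maximum"
    if (obj.compareTo(params.max_memory) > 0)
      return params.max_memory
    else
      return obj
  } else if (type == 'time') {
    if (obj.compareTo(params.max_time) > 0)
      return params.max_time
    else
      return obj
  } else if (type == 'cpus') {
    return Math.min(obj, params.max_cpus)
  }
}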
nextflow.config
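A note on the comparisons in check_max: compareTo returns a negative number, zero or a positive number, and Groovy treats any non-zero number as true, so the result needs an explicit comparison. A minimal standalone Groovy sketch, using plain integers in place of Nextflow memory and time values (the names cap_truthy and cap_explicit are hypothetical and not part of the pipeline):

```groovy
// Illustrative only: integers stand in for memory/time objects.
def cap_truthy = { Integer obj, Integer max ->
    // Pitfall: -1 (obj smaller than max) also counts as "true" in Groovy
    obj.compareTo(max) ? max : obj
}
def cap_explicit = { Integer obj, Integer max ->
    // Only cap when obj is strictly greater than max
    obj.compareTo(max) > 0 ? max : obj
}

assert cap_truthy(8, 16) == 16      // small request wrongly raised to the maximum
assert cap_explicit(8, 16) == 8     // small request kept as-is
assert cap_explicit(128, 16) == 16  // large request capped
assert cap_explicit(16, 16) == 16   // equal request unchanged
```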
process {
  cpus = { check_max(16, 'cpus') }
  memory = { check_max(128.GB, 'memory') }
  time = { check_max(10.h, 'time') }
}
conf/base.config
profiles {
  standard {
    includeConfig 'conf/base.config'
    includeConfig 'conf/igenomes.config'
    includeConfig 'conf/uppmax.config'
  }
  devel {
    includeConfig 'conf/base.config'
    includeConfig 'conf/igenomes.config'
    includeConfig 'conf/uppmax.config'
    includeConfig 'conf/uppmax-dev.config'
  }
}
nextflow.config
params {
  max_cpus = 1
  max_memory = 16.GB
  max_time = 1.h
}
conf/uppmax-dev.config
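Why the limits in the devel profile can take effect: the resource requests in conf/base.config are wrapped in closures, so check_max is only evaluated when the value is actually needed, by which point a later config file may have lowered the maximum. A standalone Groovy sketch of that idea, assuming a plain map in place of Nextflow's params object (all names here are illustrative):

```groovy
// Illustrative only: a plain map stands in for Nextflow's params object.
def params = [max_cpus: 16]
def check_max_cpus = { int requested -> Math.min(requested, params.max_cpus) }

// "base" config: the request is wrapped in a closure, so nothing is computed yet
def cpus = { check_max_cpus(16) }

// "dev" config, loaded later, lowers the limit
params.max_cpus = 1

// The closure is evaluated only now and sees the lowered limit
assert cpus() == 1
```

Without the closure the request would be fixed at 16 as soon as the base config was read.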
reference genomes
params {
  genomes {
    'GRCh37' {
      fasta = '/refs/human/genome.fasta'
      gtf = '/refs/human/genes.gtf'
    }
    'GRCm38' {
      fasta = '/refs/mouse/genome.fasta'
      gtf = '/refs/mouse/genes.gtf'
    }
  }
}
conf/references.conf
params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false
params.gtf = params.genome ? params.genomes[ params.genome ].gtf ?: false : false
main.nf
$ nextflow run main.nf --genome GRCh37
$ nextflow run main.nf --fasta /path/to/my/genome.fa
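The lookup in main.nf can be tried outside Nextflow. A minimal Groovy sketch, assuming plain maps in place of params.genomes and params.genome; the closure name lookup is hypothetical, and the safe-navigation operator (?.) is an addition here so that an unknown genome key yields false instead of an error:

```groovy
// Illustrative only: plain maps stand in for params.genomes and params.genome.
def genomes = [
    GRCh37: [fasta: '/refs/human/genome.fasta', gtf: '/refs/human/genes.gtf'],
    GRCm38: [fasta: '/refs/mouse/genome.fasta', gtf: '/refs/mouse/genes.gtf']
]

// Return the reference path for a genome key, or false when nothing is configured.
def lookup = { String genome, String field ->
    genome ? (genomes[genome]?.get(field) ?: false) : false
}

assert lookup('GRCh37', 'fasta') == '/refs/human/genome.fasta'
assert lookup('GRCm38', 'gtf') == '/refs/mouse/genes.gtf'
assert lookup(null, 'fasta') == false       // no --genome given
assert lookup('unknown', 'fasta') == false  // key missing from the config
```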
• Illumina iGenomes is a great resource for this
• Standard organisation allows easy use of multiple
genomes
• Use AWS iGenomes for free on AWS S3
• See https://ewels.github.io/AWS-iGenomes/
problems we’ve hit
dodgy file patterns
Channel
    .fromFilePairs(
        params.reads,
        size: -1
    )
If glob pattern doesn’t use {1,2}
then all PE files are run in SE mode
Channel
    .fromFilePairs(
        params.reads,
        size: params.singleEnd ? 1 : 2
    )
If glob pattern doesn’t use {1,2}
then pipeline exits with no matching files
overwriting params
// Custom trimming options
params.clip_r1 = 0
params.clip_r2 = 0
params.three_prime_clip_r1 = 0
params.three_prime_clip_r2 = 0
// Preset trimming options
params.pico = false
if (params.pico) {
  params.clip_r1 = 3
  params.clip_r2 = 0
  params.three_prime_clip_r1 = 0
  params.three_prime_clip_r2 = 3
}
// Custom trimming options
params.clip_r1 = 0
params.clip_r2 = 0
params.three_prime_clip_r1 = 0
params.three_prime_clip_r2 = 0
// Define regular variables
clip_r1 = params.clip_r1
clip_r2 = params.clip_r2
tp_clip_r1 = params.three_prime_clip_r1
tp_clip_r2 = params.three_prime_clip_r2
// Preset trimming options
params.pico = false
if (params.pico) {
  clip_r1 = 3
  clip_r2 = 0
  tp_clip_r1 = 0
  tp_clip_r2 = 3
}
regular variables
(this now triggers a
warning message)
quick-fire round
MultiQC in workflows
params.multiqc_config = "$baseDir/conf/multiqc_config.yaml"
multiqc_config = file(params.multiqc_config)
process run_multiqc {
    input:
    file multiqc_config
    file ('fastqc/*') from fastqc_results.collect()
    file ('alignment/*') from alignment_logs.collect()
    output:
    file "*multiqc_report.html" into multiqc_report
    file "*_data"
    script:
    """
    multiqc -f --config $multiqc_config .
    """
}
extra_fn_clean_exts:
    - _R1
    - _R2
report_comment: >
    This report has been generated by the NGI-RNAseq analysis pipeline.
    For information about how to interpret these results, please see the docs.
conf/multiqc_config.yaml
software versions
process get_software_versions {
    output:
    file 'software_versions_mqc.yaml' into software_versions_yaml
    script:
    """
    echo $pipeline_version > v_ngi_methylseq.txt
    echo $workflow.nextflow.version > v_nextflow.txt
    fastqc --version > v_fastqc.txt
    samtools --version > v_samtools.txt
    scrape_software_versions.py > software_versions_mqc.yaml
    """
}
main.nf
email notifications
workflow.onComplete {
    def subject = 'My pipeline execution'
    def recipient = 'me@gmail.com'
    ['mail', '-s', subject, recipient].execute() << """
    Pipeline execution summary
    ---------------------------
    Completed at: ${workflow.complete}
    Duration    : ${workflow.duration}
    Success     : ${workflow.success}
    workDir     : ${workflow.workDir}
    exit status : ${workflow.exitStatus}
    Error report: ${workflow.errorReport ?: '-'}
    """
}
Nextflow documentation
workflow.onComplete {
    // Render the HTML template
    // (email_fields is a map of template values, assumed to be built elsewhere in main.nf)
    def engine = new groovy.text.GStringTemplateEngine()
    def hf = new File("$baseDir/assets/email_template.html")
    def html_template = engine.createTemplate(hf).make(email_fields)
    def email_html = html_template.toString()
    // Send the HTML e-mail
    // (sendmail_html is the full message including headers; its rendering is not shown on this slide)
    if (params.email) {
        [ 'sendmail', '-t' ].execute() << sendmail_html
        log.info "[NGI-MethylSeq] Sent summary e-mail to $params.email (sendmail)"
    }
    // Write summary e-mail HTML to a file
    def output_d = new File( "${params.outdir}/pipeline_info/" )
    if( !output_d.exists() ) {
        output_d.mkdirs()
    }
    def output_hf = new File( output_d, "pipeline_report.html" )
    output_hf.withWriter { w -> w << email_html }
}
main.nf
<html>
<head><title>NGI-MethylSeq Pipeline Report</title></head>
<body>
<h1>NGI-MethylSeq: Bisulfite-Seq Best Practice v${version}</h1>
<h2>Run Name: $runName</h2>
<% if (success){
    out << """
    <div style="color: green;">NGI-MethylSeq execution completed successfully!</div>
    """
} else {
    out << """
    <div style="color: red;">
        <h4>NGI-MethylSeq execution completed unsuccessfully!</h4>
        <p>The exit status of the failed task was: <code>$exitStatus</code>.</p>
        <p>The full error message was:</p>
        <pre>${errorReport}</pre>
    </div>
    """
}
%>
</body>
</html>
assets/email_template.html
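The template rendering can be tried on its own with Groovy's GStringTemplateEngine. A minimal sketch, assuming a much shorter inline template than the pipeline's and a hypothetical fields map (the real field names and values are built in main.nf and are not shown on the slides):

```groovy
import groovy.text.GStringTemplateEngine

// Illustrative only: a cut-down template held in a single-quoted string,
// so the $placeholders are left for the template engine to fill in.
def template = '''<h2>Run Name: $runName</h2>
<% if (success) {
    out << "<div>completed successfully</div>"
} else {
    out << "<div>failed with exit status $exitStatus</div>"
} %>'''

// Hypothetical values standing in for the pipeline's email_fields map
def fields = [runName: 'Test RNA Run', success: false, exitStatus: 1]

def html = new GStringTemplateEngine().createTemplate(template).make(fields).toString()

assert html.contains('Run Name: Test RNA Run')
assert html.contains('failed with exit status 1')
assert !html.contains('completed successfully')
```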
[NGI-RNAseq] Successful: Test RNA Run
[NGI-RNAseq] FAILED: Test RNA Run
Groovy syntax highlighting
#!/usr/bin/env nextflow
/*
vim: syntax=groovy
-*- mode: groovy;-*-
*/
main.nf
without highlighting:
run-STAR = params.runstar
with highlighting:
run-STAR = params.runstar
saving intermediates
publishDir "${params.outdir}/trim_galore",
    mode: 'copy',
    saveAs: {fn ->
        if (fn.indexOf("_fastqc") > 0) "FastQC/$fn"
        else if (fn.indexOf("trimming_report") > 0) "logs/$fn"
        else params.saveTrimmed ? fn : null
    }
publishDir "${params.outdir}/STAR",
    mode: 'copy',
    saveAs: {
        fn -> params.saveAlignedIntermediates ? fn : null
    }
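How the saveAs closures decide what gets published: returning a path publishes the file under that name, while returning null skips it. A standalone Groovy sketch of the trim_galore closure, assuming a plain map in place of params and hypothetical file names:

```groovy
// Illustrative only: a plain map stands in for Nextflow's params object.
def params = [saveTrimmed: false]

def save_as = { String fn ->
    if (fn.indexOf('_fastqc') > 0) "FastQC/$fn".toString()
    else if (fn.indexOf('trimming_report') > 0) "logs/$fn".toString()
    else params.saveTrimmed ? fn : null
}

// Reports are always published, sorted into subdirectories
assert save_as('sample_R1_fastqc.html') == 'FastQC/sample_R1_fastqc.html'
assert save_as('sample_R1.fastq.gz_trimming_report.txt') == 'logs/sample_R1.fastq.gz_trimming_report.txt'

// Trimmed reads are skipped unless saveTrimmed is switched on
assert save_as('sample_R1_trimmed.fq.gz') == null
params.saveTrimmed = true
assert save_as('sample_R1_trimmed.fq.gz') == 'sample_R1_trimmed.fq.gz'
```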
future plans
• Use singularity for everything
• Benchmark AWS run pricing for future
planning
• Refine pipelines
• Improve resource requests
• Automate launch and run management
Phil Ewels
phil.ewels@scilifelab.se
ewels
tallphil
Acknowledgements
http://github.com/SciLifeLab
http://opensource.scilifelab.se
NGI stockholm
Max Käller
Rickard Hammarén
Denis Moreno
Francesco Vezzi
NGI Stockholm Genomics
Applications Development Group
Paolo Di Tommaso
The nextflow community
