NGI stockholm
NGI-ChIPseq
Processing ChIP-seq data at the
National Genomics Infrastructure
Phil Ewels
phil.ewels@scilifelab.se
NBIS ChIP-seq tutorial
2017-11-29
SciLifeLab NGI
Our mission is to offer a state-of-the-art infrastructure for massively parallel DNA sequencing and SNP genotyping, available to researchers all over Sweden.
SciLifeLab NGI
National resource
State-of-the-art infrastructure
Guidelines and support
We provide guidelines and support for sample collection, study design, protocol selection and bioinformatics analysis.
NGI Organisation
NGI Stockholm, NGI Uppsala
Funding
Staff salaries
Premises and service contracts
Capital equipment
Host universities
SciLifeLab
VR
KAW
User fees
Reagent costs
Project timeline
Sample QC
Library preparation, sequencing, genotyping
Data processing and primary analysis
Scientific support and project consultation
Data delivery
Methods offered at NGI
Exome sequencing
Nanopore sequencing
ATAC-seq
Metagenomics
ChIP-seq
Bisulphite sequencing
RAD-seq
RNA-seq
de novo
Whole genome seq
Data analysis included for free
Just sequencing
Accredited methods
ChIP-seq: NGI Stockholm
• You do the ChIP, we do the seq
• Rubicon ThruPlex DNA (NGI Production)
• Min 1 ng input
• Min 10 μl
• 0.2-10 ng/μl
• Insert size 200-800 bp
• 963 kr / prep
• Typically run SE 50bp
• Illumina HiSeq High Output mode v4, SR 1x50bp
• 1226 kr / sample (40M reads)
• Start by organising a planning meeting
https://ngisweden.scilifelab.se
ChIP-seq Pipeline
• Takes raw FastQ sequencing data as input

• Provides range of results

• Alignments (BAM)
• Peaks (optionally filtered)
• Quality Control
• Pipeline in use since early 2017 (on request)
ChIP-seq Pipeline
Tool                    Step
FastQC                  Sequence QC
TrimGalore!             Read trimming
BWA                     Alignment
Samtools, Picard        Sort, index, mark duplicates
Phantompeakqualtools    Strand cross-correlation QC
deepTools               Fingerprint, sample correlation
NGSPlot                 TSS / Gene profile plots
MACS2                   Peak calling
Bedtools                Filtering blacklisted regions
MultiQC                 Reporting
File formats: FastQ, BAM, BED, HTML
Nextflow
• Tool to manage computational pipelines

• Handles interaction with compute infrastructure

• Easy to learn how to run, minimal oversight required
https://www.nextflow.io/
#!/usr/bin/env nextflow
// DSL1 syntax, matching the Nextflow 0.26 release shown in the run log later in this deck

// fromFilePairs emits one [sample_id, [files]] tuple per matched set of reads
reads_ch = Channel.fromFilePairs( params.reads )

process fastqc {
    input:
    set val(name), file(reads) from reads_ch

    output:
    file "*_fastqc.{zip,html}" into results

    script:
    """
    fastqc -q $reads
    """
}
Default: run locally, assume software is installed.
To submit jobs to the SLURM queue and use environment modules, add a config:
process {
    executor = 'slurm'
    clusterOptions = { "-A b2017123" }
    cpus = 1
    memory = 8.GB
    time = 2.h
    $fastqc {
        module = ['bioinfo-tools', 'FastQC']
    }
}
To run locally, using a Docker container for all software dependencies:
docker {
    enabled = true
}
process {
    container = 'biocontainers/fastqc'
    cpus = 1
    memory = 8.GB
    time = 2.h
}
NGI-ChIPseq
https://github.com/SciLifeLab/NGI-ChIPseq
Running NGI-ChIPseq
Step 1: Install Nextflow
• Uppmax - load the Nextflow module
module load nextflow
• Anywhere (including Uppmax) - install Nextflow
curl -s https://get.nextflow.io | bash
Step 2: Try running NGI-ChIPseq pipeline
nextflow run SciLifeLab/NGI-ChIPseq --help
Step 3: Choose your reference

• Common organism - use iGenomes
--genome GRCh37
• MACS peak calling config file
--macsconfig config.csv
Step 4: Organise your data

• One (if single-end) or two (if paired-end) FastQ per sample
• Everything in one directory, simple filenames help!
Step 5: Run the pipeline on your data

• Remember to run detached from your terminal
screen / tmux / nohup
Step 6: Check your results

• Read the Nextflow log and check the MultiQC report
Step 7: Delete temporary files

• Delete the ./work directory, which holds all intermediates
Using UPPMAX
nextflow run SciLifeLab/NGI-ChIPseq \
    --project b2017123 \
    --genome GRCh37 --macsconfig p.txt \
    --reads "data/*_R{1,2}.fastq.gz"
• Default config is for UPPMAX

• Knows about central iGenomes references
• Uses centrally installed software
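The UPPMAX command above depends on the --reads glob reaching Nextflow unexpanded, which is why it is quoted. A minimal Python sketch, assuming the pipeline is launched from a script: the flags are the ones shown on the slide, while build_command is a hypothetical helper and not part of the pipeline.

```python
import shlex

def build_command(project, genome, macsconfig, reads):
    """Assemble the launch command as an argument list (hypothetical helper)."""
    return [
        "nextflow", "run", "SciLifeLab/NGI-ChIPseq",
        "--project", project,
        "--genome", genome,
        "--macsconfig", macsconfig,
        "--reads", reads,
    ]

args = build_command("b2017123", "GRCh37", "p.txt", "data/*_R{1,2}.fastq.gz")

# shlex.join quotes any argument containing shell metacharacters (here the glob),
# so the printed line can be pasted into a terminal safely.
command_line = shlex.join(args)
print(command_line)
```

If the list is handed straight to subprocess.run(args), no shell is involved and the glob needs no quoting at all.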
Using other clusters
nextflow run SciLifeLab/NGI-ChIPseq \
    -profile hebbe \
    --bwaindex ./ref --macsconfig p.txt \
    --reads "data/*_R{1,2}.fastq.gz"
• Can run just about anywhere
• Supports local, SGE, LSF, SLURM, PBS/Torque, HTCondor, DRMAA, DNAnexus, Ignite, Kubernetes
Using Docker
nextflow run SciLifeLab/NGI-ChIPseq \
    -profile docker \
    --fasta genome.fa --macsconfig p.txt \
    --reads "data/*_R{1,2}.fastq.gz"
• Can run anywhere with Docker

• Downloads required software and runs in a container
• Portable and reproducible.
Using AWS
nextflow run SciLifeLab/NGI-ChIPseq \
    -profile aws \
    --genome GRCh37 --macsconfig p.txt \
    --reads "s3://my-bucket/*_{1,2}.fq.gz" \
    --outdir "s3://my-bucket/results/"
• Runs on the AWS cloud with Docker
• Pay-as-you-go, flexible computing
• Can launch from anywhere with minimal configuration
Input data
ERROR ~ Cannot find any reads matching: XXXX
NB: Path needs to be enclosed in quotes!
NB: Path requires at least one * wildcard!
If this is single-end data, please specify --singleEnd on the command line.
Correct:
--reads '*_R{1,2}.fastq.gz'
--reads '*.fastq.gz' --singleEnd
Incorrect:
--reads *_R{1,2}.fastq.gz (not quoted)
--reads '*.fastq.gz' (single-end pattern without --singleEnd)
--reads sample.fastq.gz (not quoted, no * wildcard)
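To make the paired-end pattern concrete, here is a rough Python emulation of how a pattern such as '*_R{1,2}.fastq.gz' groups files into one pair per sample. It only illustrates the idea and is not Nextflow's implementation; group_read_pairs and the file names are made up for the example.

```python
import re
from collections import defaultdict

def group_read_pairs(filenames, suffix=".fastq.gz"):
    """Group FastQ names by the sample prefix before _R1/_R2 (hypothetical helper)."""
    pattern = re.compile(r"^(?P<sample>.+)_R[12]" + re.escape(suffix) + r"$")
    pairs = defaultdict(list)
    for name in sorted(filenames):
        match = pattern.match(name)
        if match:  # files that do not fit the pattern are ignored
            pairs[match.group("sample")].append(name)
    return dict(pairs)

files = [
    "chip_R2.fastq.gz", "chip_R1.fastq.gz",
    "input_R1.fastq.gz", "input_R2.fastq.gz",
    "notes.txt",
]
pairs = group_read_pairs(files)
for sample, reads in pairs.items():
    print(sample, reads)
```

Simple, consistent file names make this grouping unambiguous, which is the reason for the "simple filenames help" advice earlier in the deck.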
Read trimming
• Pipeline runs TrimGalore! to remove adapter contamination and low quality bases automatically

• Use --notrim to disable this
• Some library preps also include additional adapters

--clip_r1 [int]
--clip_r2 [int]
--three_prime_clip_r1 [int]
--three_prime_clip_r2 [int]
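A small Python sketch of what fixed-length clipping does to a single read, assuming these flags are passed through to the Trim Galore! options of the same name (which remove a set number of bases from the 5' end or the 3' end). The clip_read function and the sequence are illustrative only.

```python
def clip_read(seq, five_prime=0, three_prime=0):
    """Remove a fixed number of bases from each end of a read sequence."""
    end = len(seq) - three_prime
    # If the clips meet or cross, nothing of the read is left
    return seq[five_prime:end] if end > five_prime else ""

read = "ACGTACGTACGTACGTACGT"  # 20 bases, made-up sequence
clipped = clip_read(read, five_prime=3, three_prime=2)
print(len(read), len(clipped), clipped)
```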
Blacklist filtering
• Some parts of the reference genome collect incorrectly mapped reads

• Good practice to remove these peaks
• Pipeline has ENCODE regions for Human & Mouse

• Can pass own BED file of custom regions

--blacklist_filtering
--blacklist regions.bed
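The filtering idea can be sketched in a few lines of Python. The pipeline itself uses Bedtools for this step; the version below is a plain illustration with made-up coordinates, treating each BED record as a (chrom, start, end) tuple with a half-open interval.

```python
def overlaps(a, b):
    """True if two BED intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def filter_blacklisted(peaks, blacklist):
    """Keep only the peaks that overlap no blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]

# Made-up coordinates for illustration only
peaks = [("chr1", 100, 200), ("chr1", 950, 1100), ("chr2", 500, 600)]
blacklist = [("chr1", 1000, 2000)]

kept = filter_blacklisted(peaks, blacklist)
print(kept)
```

This all-pairs comparison is fine for a demonstration; real peak sets are better handled by a dedicated interval tool.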
Broad Peaks
• Some chromatin profiles don't have narrow, sharp peaks

• For example, H3K9me3 & H3K27me3
• MACS2 can call peaks in "broad peak" mode

• Pipeline uses default qvalue cutoff of 0.1
--broad
Extending Read Length
• When using single-end data, the sequenced read length is shorter than the sequenced fragment length
• For deepTools, need to "extend" the read length
• Set to 100 bp by default. Use this parameter to customise the value.
• Value = expected fragment length - sequenced read length
--extendReadsLen [int]
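A worked example of the subtraction, as a minimal Python sketch. The 50 bp read length is the typical run mentioned earlier in the deck; the 200 bp fragment length is an assumed value chosen for illustration.

```python
def extend_reads_len(fragment_length, read_length):
    """Number of bases to add so a read spans the whole fragment."""
    if read_length > fragment_length:
        raise ValueError("read length cannot exceed fragment length")
    return fragment_length - read_length

# Assumed 200 bp fragments sequenced as 50 bp single-end reads
extension = extend_reads_len(fragment_length=200, read_length=50)
print(f"--extendReadsLen {extension}")
```

By the same arithmetic, the 100 bp default corresponds to 150 bp fragments when reads are 50 bp long.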
Saving intermediates
• By default, the pipeline doesn't save some intermediate files to your final results directory
• Reference genome indices that have been built
• FastQ files from TrimGalore!
• BAM files from BWA (we have BAMs from Picard)
--saveReference
--saveTrimmed
--saveAlignedIntermediates
Resuming pipelines
• If something goes wrong, you can resume a stopped pipeline

• Will use cached versions of completed processes
• NB: Only one hyphen!
-resume
• Can resume specific past runs

• Use nextflow log to find job names
-resume job_name
Customising output
-name Give a name to your run. Used in logs and reports
--outdir Specify the directory for saved results
--saturation Run saturation analysis, subsampling reads from 10% - 100%
--email Get e-mailed a summary report when the pipeline finishes
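The subsampling behind a saturation analysis can be sketched as follows. The slide gives only the 10% - 100% range, so the 10% step size, the fixed seed and the saturation_subsets function are all assumptions for illustration; how the pipeline itself draws its subsets is not stated here.

```python
import random

def saturation_subsets(reads, n_steps=10, seed=1):
    """Yield (fraction, subsample) pairs in equal steps up to the full read set."""
    rng = random.Random(seed)  # fixed seed keeps the subsets reproducible
    for i in range(1, n_steps + 1):
        size = len(reads) * i // n_steps
        yield i / n_steps, rng.sample(reads, size)

reads = [f"read_{i}" for i in range(1000)]  # stand-in for aligned reads
sizes = [(fraction, len(subset)) for fraction, subset in saturation_subsets(reads)]
for fraction, size in sizes:
    print(f"{fraction:.0%}: {size} reads")
```

Each subset would then go through peak calling; if the number of peaks stops rising before 100%, the library is likely sequenced deeply enough.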
Nextflow config files
• Can save a config file with defaults
• Anything with two hyphens is a params value
params {
    email = 'phil.ewels@scilifelab.se'
    project = "b2017123"
}
process.$multiqc.module = []
Config file locations:
./nextflow.config
~/.nextflow/config
-c /path/to/my.config
NGI-ChIPseq config
N E X T F L O W ~ version 0.26.1
Launching `SciLifeLab/NGI-ChIPseq` [deadly_bose] - revision: 28e24c2a2a
=========================================
NGI-ChIPseq: ChIP-Seq Best Practice v1.4
=========================================
Run Name : deadly_bose
Reads : data/*fastq.gz
Data Type : Single-End
Genome : GRCh37
BWA Index : /sw/data/uppnex/igenomes//Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/
MACS Config : data/macsconfig.txt
Saturation analysis : false
MACS broad peaks : false
Blacklist filtering : false
Extend Reads : 100 bp
Current home : /home/phil
Current user : phil
Current path : /home/phil/demo_data/ChIP/Human/test
Working dir : /home/phil/demo_data/ChIP/Human/test/work
Output dir : ./results
R libraries : /home/phil/R/nxtflow_libs/
Script dir : /home/phil/GitHub/NGI-ChIPseq
Save Reference : false
Save Trimmed : false
Save Intermeds : false
Trim R1 : 0
Trim R2 : 0
Trim 3' R1 : 0
Trim 3' R2 : 0
Config Profile : UPPMAX
UPPMAX Project : b2017001
E-mail Address : phil.ewels@scilifelab.se
====================================
Version control
$fastqc.module = ['FastQC/0.7.2']
$trim_galore.module = ['TrimGalore/0.4.1', 'FastQC/0.7.2']
$bw.module = ['bwa/0.7.8', 'samtools/1.5']
$samtools.module = ['samtools/1.5', 'BEDTools/2.26.0']
$picard.module = ['picard/2.10.3', 'samtools/1.5']
$phantompeakqualtools.module = ['phantompeakqualtools/1.1']
$deepTools.module = ['deepTools/2.5.1']
$ngsplot.module = ['samtools/1.5', 'R/3.2.3', 'ngsplot/2.61']
$macs.module = ['MACS/2.1.0']
$saturation.module = ['MACS/2.1.0', 'samtools/1.5']
$saturation_r.module = ['R/3.2.3']
• Pipeline is always released under a stable version tag
• Software versions and code reproducible
• For full reproducibility, specify the pipeline revision when running the pipeline
nextflow run SciLifeLab/NGI-ChIPseq -r v1.3
Conclusion
• Use NGI-ChIPseq to prepare your data if you want:

• To not have to remember every parameter for every tool
• Extreme reproducibility
• Ability to run on virtually any environment
• Now running for all ChIPseq projects at NGI-Stockholm
https://github.com/
SciLifeLab/NGI-RNAseq
SciLifeLab/NGI-MethylSeq
SciLifeLab/NGI-smRNAseq
SciLifeLab/NGI-ChIPseq
MIT Licence
Acknowledgements
Phil Ewels
Chuan Wang
Jakub Westholm
Rickard Hammarén
Max Käller
Denis Moreno
NGI Stockholm Genomics Applications Development Group
support@ngisweden.se
http://opensource.scilifelab.se
