Assembly – before and after
Lex Nederbragt
lex.nederbragt@ibv.uio.no
@lexnederbragt
A warning
The list is by no means complete
Nor do we have experience with all the programs mentioned
Sample → DNA isolation → DNA → Sequencing → Reads → Assembly → Genome assembly, with QC after each step
Reads → Assembly → Genome assembly
QC
FastQC
PRINSEQ
Many others…
www.nipgr.res.in/ngsqctoolkit.html
preqc (sga)
http://arxiv.org/abs/1307.8026
Grooming
Format conversion
http://en.wikipedia.org/wiki/FASTQ_format
Fastq format hell
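Much of the "format hell" comes from the different quality encodings (Phred+33 versus Phred+64). A minimal sketch of how one might guess the offset from the quality strings alone; the function name and the thresholds are assumptions for illustration, based on the character ranges in the Wikipedia article above, and are not taken from any tool in this talk:

```python
def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Returns None when the characters seen fit both encodings.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    # Characters below ';' (ASCII 59) do not occur in the +64 encodings.
    if lowest < 59:
        return 33
    # Characters above 'J' (ASCII 74) are not expected in Phred+33 Illumina data.
    if highest > 74:
        return 64
    return None  # ambiguous: look at more reads


print(guess_phred_offset(["IIII#", "IIHH#"]))
```

With only a few reads the result is often ambiguous, so in practice one would feed in the quality lines of many reads.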
Adapter/quality trimming
http://www.biostars.org/p/53528/
Celera assembler
Overlap based trimming
Fastx Toolkit
Seqtk
PRINSEQ
NGS QC Toolkit
Trimmomatic
BioPieces
Cutadapt
…
…
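To illustrate what quality trimming means, here is a deliberately simple sketch that removes low-quality bases from the 3' end of a read. It is not the algorithm of any tool listed above (those use more refined approaches such as sliding windows); the function name, the default threshold of 20 and the Phred+33 offset are assumptions:

```python
def trim_3prime(seq, qual, min_quality=20, offset=33):
    """Trim bases from the 3' end while their quality is below min_quality."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes quality 40 and '#' quality 2 in Phred+33
print(trim_3prime("ACGTAC", "IIII##"))
```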
Mate pair splitting and orientation
Illumina paired end reads: 150 – 600 bases
Illumina mate pair reads: 2 – 40 kilobases
454 mate pair reads: 2 – 40 kilobases, with linker
Mate pair splitting and orientation
[Figure: Illumina paired end reads, Illumina mate pair reads (junction) and 454 mate pair reads (linker); mate pair libraries also contain paired end reads as ‘contamination’]
Check what orientation your assembler expects for the reads!
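If the assembler expects inward-facing pairs and the mate pair reads face outward (or the other way around), one common fix is to reverse-complement both reads of each pair. A minimal sketch of that operation on one read; the function names are hypothetical, and whether your data need this step depends on the library protocol and the assembler:

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_read(seq, qual):
    """Reverse-complement a read and reverse its quality string to match."""
    return reverse_complement(seq), qual[::-1]


print(flip_read("AACG", "IIH#"))
```

Applying `flip_read` to both reads of every pair changes the orientation of the whole library.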
Preparing
Error-correction
Stand-alone or built into assembler
Merging pairs
List from Torsten Seemann’s blog
http://thegenomefactory.blogspot.no/2012/11/tools-to-merge-overlapping-paired-end.html
COPE http://sourceforge.net/projects/coperead/
SeqPrep https://github.com/jstjohn/SeqPrep
FLASH http://www.cbcb.umd.edu/software/flash
fastq-join http://code.google.com/p/ea-utils/wiki/FastqJoin
PANDAseq https://github.com/neufeld/pandaseq
mergePairs.py http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py
Recent addition
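The idea behind merging pairs can be sketched in a few lines: when the fragment is shorter than the two read lengths combined, the end of read 1 overlaps the reverse complement of read 2. The version below only accepts exact overlaps and ignores base qualities, which the tools above do not; the function names and the `min_overlap` default are assumptions:

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge an overlapping pair into one sequence, or return None.

    Tries the longest possible overlap first and requires an exact match.
    """
    read2_rc = reverse_complement(read2)
    for overlap in range(min(len(read1), len(read2_rc)), min_overlap - 1, -1):
        if read1[-overlap:] == read2_rc[:overlap]:
            return read1 + read2_rc[overlap:]
    return None


fragment = "ACGTTGCAAGGCTTAACCGGTTAGC"
read1 = fragment[:18]
read2 = reverse_complement(fragment[-18:])
print(merge_pair(read1, read2) == fragment)
```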
Extend reads
http://140.116.235.124/~tliu/arf-pe/
Digital normalisation
http://arxiv.org/abs/1203.4802
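The core of digital normalisation is short enough to sketch: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a coverage cutoff. The sketch below uses an exact in-memory counter instead of the memory-efficient probabilistic counting of the real implementation, and does not treat a k-mer and its reverse complement as the same; the function names and defaults are assumptions:

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalise(reads, k=20, cutoff=20):
    """Keep reads whose median k-mer count so far is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        if median(counts[kmer] for kmer in read_kmers) < cutoff:
            counts.update(read_kmers)
            kept.append(read)
    return kept


reads = ["ACGTTGCA"] * 5 + ["TTTTAAAACC"]
print(digital_normalise(reads, k=4, cutoff=2))
```

In the toy example, only two of the five identical reads survive, while the single read from a different region is kept.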
Estimate k-mer size to use
preqc (SGA)
http://arxiv.org/abs/1307.8026
What can the reads tell us about the genome?
k-mer based
preqc (SGA)
Kmerspectrumanalyzer
http://arxiv.org/abs/1307.8026
khmer from Titus Brown
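These tools all start from a k-mer spectrum: how many distinct k-mers occur once, twice, three times, and so on in the reads. A minimal sketch of that computation, assuming the reads fit in memory (the tools above use far more efficient counting); the function name is hypothetical:

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())


spectrum = kmer_spectrum(["ACGTA", "ACGTT"], k=4)
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

On real data, the peak at low multiplicity is usually dominated by sequencing errors, and the position of the main peak reflects the k-mer coverage of the genome.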
Reads → Assembly → Genome assembly
This talk
QC
Genome assembly
Comparing to each other
Metrics
Merging
Improvement
Visualization
Validation
Comparing to reference
Metrics
Assemblathon stats
http://korflab.ucdavis.edu/datasets/Assemblathon/Assemblathon2/Basic_metrics/assemblathon_stats.pl
OR
https://github.com/lexnederbragt/sequencetools/
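The best-known of these metrics is N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch, independent of the scripts linked above; the function name is an assumption:

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# total length 32; the two longest contigs (8 + 8) already reach half of it
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```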
Improvement
Gap closing
IMAGE2
Correcting bases
Quiver from Pacific Biosciences
Separate scaffolding
Merging
Assembly merging/reconciliation
Validation
Mapped genomic reads
FRCbam
Mapped transcriptomic reads
Gene finding
Binning
Bacteroides
Proteobacteria
Cyanobacteria
Per-contig read depth
Nederbragt et al., 2010
Visualization
Genome browser(s)
IGV
Comparing to each other
Comparative measures
Log Average Probability (LAP)
Assembly Likelihood Evaluation (ALE)
See also Howison, Zapata and Dunn (2013), Toward a statistically explicit understanding of de novo sequence assembly, doi:10.1093/bioinformatics/btt525
Comparing to reference
Reference comparison
Mauve assembly metrics
Review
Too many tools…
http://seqanswers.com/wiki/Software/list
Too many tools…
http://wwwdev.ebi.ac.uk/fg/hts_mappers
88 short-read mappers
Embargo!
Benchmarking, anyone?
All-in-one assembly pipeline
doi:10.1186/1471-2105-15-126
