Surya Saha
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, NY
suryasaha@cornell.edu // Twitter:@SahaSurya
BTI Plant Bioinformatics Course 2016
http://www.acgt.me/blog/2015/3/7/next-generation-sequencing-must-die
Sequencing over the Ages
3/29/2016 BTI Plant Bioinformatics Course 2016 2
1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2005-2007: 454, Solexa, SOLiD
2011: Ion Torrent, PacBio
2012-2013: Illumina, Illumina Hiseq X, 454
2014: Pinus taeda (24 Gb), Nanopore MinION
2015: 10X Genomics
Slide concept: Aureliano Bombarely
First generation sequencing
Sanger. Annu Rev Biochem. 1988;57:1-28.
Thanks to Nick Loman for the mention
Maxam-Gilbert method
http://en.wikipedia.org/wiki/File:Maxam-Gilbert_sequencing_en.svg
https://www.nationaldiagnostics.com/electrophoresis/article/maxam-gilbert-sequencing
Sanger method
Frederick Sanger
13 Aug 1918 – 19 Nov 2013
Won the Nobel Prize for Chemistry in 1958 and
1980. Published the dideoxy chain termination
method or “Sanger method” in 1977
http://dailym.ai/1f1XeTB
http://en.wikipedia.org/wiki/File:Sanger-sequencing.svg
http://en.wikipedia.org/wiki/File:Radioactive_Fluorescent_Seq.jpg
First generation sequencing
• Very high quality sequences (99.999% or Q50)
• Very very low throughput
Platform                          Run time  Read length  Reads / run  Total nucleotides sequenced  Cost / Mb
Capillary sequencing (ABI3730xl)  20m-3h    400-900 bp   96 or 384    1.9-84 Kb                    $2400
http://www.hindawi.com/journals/bmri/2012/251364/tab1/
Next generation sequencing
Name the specific technology used to generate the data
– Illumina Hiseq/Miseq/NextSeq
– Pacific Biosciences RS I/RS II
– Ion Torrent Proton/PGM
– SOLiD
– Oxford Nanopore
http://www.acgt.me/blog/2015/3/10/next-generation-sequencing-must-diepart-2
454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
GS FLX Titanium
http://www.genengnews.com/
https://mariamuir.com/wp-content/uploads/2013/04/rip.gif
Illumina
Output                       0.3-15 Gb   20-120 Gb        10-1500 Gb                 900-1800 Gb
Number of reads / flow cell  25 million  130-400 million  300 million - 2.5 billion  3 billion
Read length                  2x300 bp    2x150 bp         2x250 - 2x125 bp           2x150 bp
Cost                         $99K        $250K            $740K                      $10M (10 units)
Instrument labels on the slide: 500, 2500, 3000, 4000
Source: Illumina
$1000 human genome??
Mardis 2008. Annu. Rev. Genomics Hum. Genet. 2008. 9:387–402
Illumina: TruSeq Long Read
Voskoboynik eLife 2013;2:e00569
Pacific Biosciences SMRT sequencing
Single Molecule Real Time sequencing
http://smrt.med.cornell.edu/images/pacbio_library_prep-1.gif
RS II
Sequel
Pacific Biosciences SMRT sequencing
Error correction methods
Hierarchical genome-assembly process (HGAP)
English et al., PLOS One. 2012
PBJelly
PBcRPipeline
3/29/2016 Centre for Agricultural Bioinformatics, Pusa 20
Pacific Biosciences SMRT sequencing
Read Lengths
Oxford Nanopore
https://www.nanoporetech.com/
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
http://halegrafx.com/vector-art/free-vector-despicable-me-minions/
Next generation sequencing
Platform             Run time   Read length     Quality                      Total nucleotides sequenced  Cost / Mb
454 Pyrosequencing   24h        700 bp          Q20-Q30                      1 Gb                         $10
Illumina Miseq       27h        2x300 bp        >Q30                         15 Gb                        $0.15
Illumina Hiseq 2500  1-10 days  2x250 bp        >Q30                         3000 Gb                      $0.05
Ion torrent          2h         400 bp          >Q20                         50 Mb-1 Gb                   $1
Pacific Biosciences  30m-4h     10 kb - >40 kb  >Q50 consensus, >Q10 single  500-1000 Mb / SMRT cell      $0.13-$0.60
http://www.hindawi.com/journals/bmri/2012/251364/
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431227
Note: Some figures might be out of date
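The per-megabase costs in the table can be turned into a rough project estimate: cost = genome size (Mb) x coverage x cost per Mb. The sketch below is illustrative only. The 3.0 Gb human genome size comes from the timeline slide, the 30X coverage is an assumed target, and the prices are the table's figures, which may be out of date.

```python
# Rough sequencing cost from the per-megabase figures in the comparison table.
# Prices are copied from the slide and may be out of date.
COST_PER_MB_USD = {
    "454 Pyrosequencing": 10.0,
    "Illumina Miseq": 0.15,
    "Illumina Hiseq 2500": 0.05,
    "Ion torrent": 1.0,
}


def sequencing_cost_usd(genome_size_mb, coverage, cost_per_mb):
    """Return the cost of sequencing a genome to the given average depth."""
    return genome_size_mb * coverage * cost_per_mb


if __name__ == "__main__":
    genome_mb = 3000  # Homo sapiens, 3.0 Gb
    coverage = 30     # assumed target depth, not taken from the slides
    for platform, cost in COST_PER_MB_USD.items():
        total = sequencing_cost_usd(genome_mb, coverage, cost)
        print(f"{platform}: ${total:,.0f}")
```

This counts reagent-style cost per base only; library preparation, storage and analysis are not included.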
Long range scaffolding
Hi-C Crosslinking
GemCode
• Long read information from short reads using 14 bp barcodes
• Very low input DNA (ng) and 20-minute library preparation time
• 1 ng of DNA is split across 100,000 Gel Bead-in-Emulsions (GEMs)
• Chromium instrument for single-cell RNA-seq
http://mms.businesswire.com/media/20150225005296/en/454639/5/GemCodePlatform.jpg
http://www.nature.com/nbt/journal/v34/n3/full/nbt.3432.html
Human MHC map
• Sample prep requires very high molecular weight DNA
• Nicks at 10 sites / 100 kb
• Individual molecules are assembled into optical maps
• Optical maps and sequences are merged in a hybrid assembly
http://www.bionanogenomics.com/technology/why-genome-mapping/
Many Others..
• Ion Torrent Proton/PGM
• Supporting technologies
– Nabsys
– OpGen
– Fluidigm
http://nextgenseek.com/2012/11/did-you-know-there-are-at-least-14-next-gen-sequence-technology-companies/
Sequencing Trends
https://www.google.com/trends/
[Charts, by year: Number of Publications, 2008-2015 (Illumina, Pacific Biosciences, Roche 454, Ion Torrent); Increase in Number of Publications, 2009-2015 (Illumina, Pacific Biosciences, Roche 454, Ion Torrent); % Increase in Number of Publications, 2009-2015 (Pacific Biosciences, Roche 454, Ion Torrent)]
Real cost of Sequencing!!
Sboner, Genome Biology, 2011
https://genomebiology.biomedcentral.com/articles/10.1186/gb-2011-12-8-125
So What Sequencer Do I Use??
Microbial genome
• Draft genome
– Illumina Miseq (100-130X)
– Illumina Hiseq (<200X)
• Complete genome
– Pacific Biosciences (80-100X)
• Amplicons (16S, ITS)
– Illumina Miseq
Eukaryotic genome
• De novo assembly
– Pacific Biosciences (70-80X)
– Illumina Hiseq (100X+)
– 10X Genomics
– Bionano
• Genotyping (GBS)
– Illumina Hiseq
• BACs
– Pacific Biosciences
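The coverage targets above (for example 100-130X on a Miseq) translate into a read count through the usual relation coverage = reads x read length / genome size. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the example uses the 1.83 Mb Haemophilus influenzae genome from the timeline slide with 2x300 bp reads.

```python
def reads_needed(genome_size_bp, coverage, read_length_bp, paired=True):
    """Return the number of reads (or read pairs) needed for a target coverage.

    Assumes every base of every read is usable, which is optimistic:
    trimming and filtering will lower the effective coverage.
    """
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    total_bases = genome_size_bp * coverage
    # Integer ceiling division, so the target is met rather than just missed.
    return (total_bases + bases_per_fragment - 1) // bases_per_fragment


if __name__ == "__main__":
    # 1.83 Mb microbial genome, 100X, 2x300 bp paired-end reads
    print(reads_needed(1_830_000, 100, 300))
```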
$$$$ ????
The diploid reference genome
Cornell Sequencing Core
• Illumina Hiseq 2500 (Rapid run and High output)
• Illumina Miseq
• Illumina Nextseq 500
• 10X Genomics GemCode
http://www.biotech.cornell.edu/brc/genomics/services/price-list#overlay-context=brc/genomics-facility/next-generation-sequencing
Library Types
Single end
Paired end (PE, 150-300 bp, Fwd:/1, Rev:/2)
Mate pair (MP, 2 Kb to 20 Kb)
[Diagram: forward (F) and reverse (R) read orientation for each library type, labelled 454/Roche and Illumina]
Slide credit: Aureliano Bombarely
Implications of Choice of Library
Slide credit: Aureliano Bombarely
Reads -> Consensus sequence (Contig)
Contigs + pair read information -> Scaffold (or Supercontig), with gaps written as NNNNN
Scaffolds + genetic information (markers) or optical maps -> Pseudomolecule (or ultracontig)
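The contig-to-scaffold step on this slide can be sketched as a simple string operation: contigs that pair information places next to each other are joined, and each gap is filled with a run of N. This is only an illustration of the NNNNN notation, not how any particular scaffolder works. The function name and the gap sizes are hypothetical; in practice gap sizes are estimated from the library insert size.

```python
def build_scaffold(contigs, gap_sizes, gap_char="N"):
    """Join ordered contigs into one scaffold, filling each gap with N's.

    contigs   -- contig sequences, already ordered and oriented
    gap_sizes -- estimated gap length between each neighbouring pair
    """
    if not contigs:
        return ""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append(gap_char * gap)
        parts.append(contig)
    return "".join(parts)


if __name__ == "__main__":
    print(build_scaffold(["ACGT", "TTGA", "CC"], [5, 2]))
```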
Multiplexing Libraries
Use of different tags (4-6 nucleotides) to identify
different samples in the same lane/sector.
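Sorting pooled reads back to their samples (demultiplexing) can be sketched as a lookup on the tag. The example below assumes the tag sits at the very start of each read and must match exactly; real pipelines often read the tag separately and tolerate mismatches. The two tags are the ones shown on the slide, and the sample names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and strip the barcode off."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read, untouched, in its own bin.
            bins["unassigned"].append(read)
    return dict(bins)


if __name__ == "__main__":
    tags = {"AGTCGT": "sample_A", "TGAGCA": "sample_B"}  # sample names are made up
    pooled = ["AGTCGTACGT", "TGAGCATTTT", "CCCCCCAAAA"]
    print(demultiplex(pooled, tags))
```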
Slide credit: Aureliano Bombarely
[Diagram: libraries tagged AGTCGT and TGAGCA are pooled and sequenced together]
Data!!
File Formats
Fasta files:
It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
-Wikipedia
Slide credit: Aureliano Bombarely
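A small reader makes the Fasta layout concrete. The slide does not show it, but each record starts with a header line beginning with ">", followed by one or more sequence lines. The sketch below is a minimal illustration with a made-up function name; for real work a maintained parser such as Biopython's is a safer choice.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    demo = [">seq1 demo", "ACGT", "TTGA", ">seq2", "MKV"]
    for name, seq in read_fasta(demo):
        print(name, seq)
```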
File Formats
Fastq files:
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
-Wikipedia
• A single ID line with the at symbol (“@”) in the first column.
• The sequence can span multiple lines after the ID line.
• A single line with the plus symbol (“+”) in the first column marks the start of the quality values.
• The “+” line may repeat the ID.
• Quality values can span multiple lines after the “+” line, but their total length must equal the sequence length.
Slide credit: Aureliano Bombarely
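The rules above matter when parsing, because a quality line may itself start with "@". A reader therefore cannot split records on "@" alone; it has to collect quality characters until their count equals the sequence length. The sketch below follows that approach. The function name is hypothetical and the code is illustrative rather than a replacement for an established parser.

```python
def read_fastq(lines):
    """Yield (read_id, sequence, quality) from an iterable of FASTQ lines."""
    it = (line.rstrip("\n") for line in lines)
    for line in it:
        if not line:
            continue
        if not line.startswith("@"):
            raise ValueError(f"expected '@' header, got: {line!r}")
        read_id = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_chunks = []
        for line in it:
            if line.startswith("+"):
                break
            seq_chunks.append(line.strip())
        sequence = "".join(seq_chunks)

        # Quality lines run until as many characters as bases were read.
        quality = ""
        while len(quality) < len(sequence):
            try:
                quality += next(it).strip()
            except StopIteration:
                raise ValueError(f"truncated quality for {read_id}") from None
        if len(quality) != len(sequence):
            raise ValueError(f"quality/sequence length mismatch for {read_id}")
        yield read_id, sequence, quality


if __name__ == "__main__":
    demo = ["@r1\n", "ACGT\n", "TT\n", "+r1\n", "@@II\n", "II\n"]
    print(list(read_fastq(demo)))
```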
Quality control: Encoding
Fastq files:
!"#$%&'()*+,-./0123456789 Offset by 33 (Phred+33)
KLMNOPQRSTUVWXYZ[\]^_`abcdefgh Offset by 64 (Phred+64)
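Decoding is a subtraction: the Phred score of a base is the ASCII code of its quality character minus the offset. The sketch below shows that, plus a rough guess at the offset. The guess is a heuristic under a stated assumption: any character below "@" (ASCII 64) can only come from Phred+33, while a string with no such character is treated as Phred+64 even though high-quality Phred+33 data would look the same. Both function names are made up for illustration.

```python
def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    scores = [ord(ch) - offset for ch in quality_string]
    if scores and min(scores) < 0:
        raise ValueError("character below the chosen offset; wrong encoding?")
    return scores


def guess_offset(quality_string):
    """Heuristic guess of the encoding offset (33 or 64); see caveat above."""
    return 33 if any(ord(ch) < 64 for ch in quality_string) else 64


if __name__ == "__main__":
    qual = "!+5?I"
    print(guess_offset(qual), decode_quality(qual, guess_offset(qual)))
```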
Quality control: Encoding
Phred score of a base is:
Qphred = -10 log10 (e)
where e is the estimated error probability of a base
http://en.wikipedia.org/wiki/Phred_quality_score
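The formula can be checked in both directions with a few lines of code. For instance, Q50 corresponds to an error probability of 1 in 100,000, which is the 99.999% accuracy quoted for Sanger reads earlier in the deck. The function names below are made up for illustration.

```python
import math


def phred_to_error(q):
    """Error probability e for a Phred score Q, from Q = -10 * log10(e)."""
    return 10 ** (-q / 10)


def error_to_phred(e):
    """Phred score Q for an error probability e (0 < e <= 1)."""
    return -10 * math.log10(e)


if __name__ == "__main__":
    for q in (10, 20, 30, 50):
        e = phred_to_error(q)
        print(f"Q{q}: error probability {e:g}, accuracy {100 * (1 - e):.3f}%")
```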
Pre-processing: Tools
Trimming
• FastQC
• FASTX toolkit
• Trimmomatic
• Scythe
Joining paired-end reads
• fastq-join
• FLASH
• PANDAseq
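To show what quality trimming does, here is a deliberately simple sketch that removes bases from the 3' end of a read while their Phred score is below a threshold. It is not the algorithm used by Trimmomatic, the FASTX toolkit or any other tool listed above. The threshold of 20 and the Phred+33 offset are assumptions for the example.

```python
def trim_3prime(sequence, quality, min_q=20, offset=33):
    """Remove trailing bases whose Phred score is below min_q.

    Trimming stops at the first base (from the 3' end) that passes,
    so low-quality bases in the middle of a read are left in place.
    """
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


if __name__ == "__main__":
    print(trim_3prime("ACGTAC", "IIII#!"))
```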
Thank you!!
