Surya Saha
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, NY
ss2489@cornell.edu // Twitter:@SahaSurya
BTI Plant Bioinformatics Course 2014
http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
Sequencing over the Ages
1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2005-2013 (marks at 2005, 2007, 2011, 2012, 2013): 454, Solexa, SOLiD, Illumina, Ion Torrent, PacBio, Illumina HiSeq X; Pinus taeda (24 Gb)
Slide credit: Aureliano Bombarely
3/12/2014 BTI Plant Bioinformatics Course 2014 2
First generation sequencing
Sanger method
Frederick Sanger
13 Aug 1918 – 19 Nov 2013
Won the Nobel Prize for Chemistry in 1958 and 1980.
Published the dideoxy chain termination method or “Sanger method” in 1977.
http://dailym.ai/1f1XeTB
http://bit.ly/1g6Cudq
http://bit.ly/1lcQO4J
Maxam-Gilbert method
http://bit.ly/1noY0fu
http://bit.ly/1lGvJCA
First generation sequencing
• Very high quality sequences (99.999%)
• Very low throughput
Platform                           Run time    Read length  Reads / run  Total nucleotides sequenced  Cost / Mb
Capillary sequencing (ABI 3730xl)  20 min-3 h  400-900 bp   96 or 384    1.9-84 Kb                    $2400
http://bit.ly/1clLps3
http://1.usa.gov/1cLqIRd
Next generation sequencing
http://bit.ly/1keDtZQ
• Second generation
• Third generation
• Fourth generation
• Next-next-generation
• Next-next-next generation
http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
Name the specific technology used to generate the data
– Illumina HiSeq/MiSeq/NextSeq
– Pacific Biosciences RS/RS II
– Ion Torrent Proton/PGM
– SOLiD
454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
http://bit.ly/1ehwxWN
GS FLX Titanium
http://bit.ly/1ehAcEh
Illumina
Output           15 Gb       120 Gb       1000 Gb                           1800 Gb
Number of reads  25 million  400 million  4 billion                         6 billion
Read length      2x300 bp    2x150 bp     2x125 bp (2x250 update mid-2014)  2x150 bp
Cost             $99K        $250K        $740K                             $10M
Source: Illumina
$1000 human genome??
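The arithmetic behind a per-genome estimate can be sketched from the output figures on this slide. This is only an illustration: the 30x target coverage is an assumption not stated in the slides, the 3.0 Gb genome size is taken from the timeline slide, and the function names are hypothetical. Reagent, labor, and analysis costs are not modeled.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def genomes_per_run(total_bases: int, genome_size: int, target_coverage: int) -> int:
    """Whole genomes that fit in one run at the requested mean coverage."""
    return int(total_bases // (genome_size * target_coverage))


GB = 10**9
run_output = 1800 * GB   # largest output listed in the Illumina table
human_genome = 3 * GB    # Homo sapiens, 3.0 Gb

# Assumed target: 30x mean coverage per genome (not given in the slides).
print(mean_coverage(run_output, human_genome))        # 600.0
print(genomes_per_run(run_output, human_genome, 30))  # 20
```

Dividing the cost of a run by the number of genomes it holds gives a rough per-genome sequencing cost under these assumptions.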
Illumina
http://1.usa.gov/1fP9ybl
Illumina: Moleculo
http://bit.ly/1aEPOBn
Pacific Biosciences SMRT sequencing
Single Molecule Real Time sequencing
http://bit.ly/1naxgTe
Pacific Biosciences SMRT sequencing
Error correction methods
Hierarchical genome-assembly process (HGAP)
PBJelly (English et al., PLOS One, 2012)
Pacific Biosciences SMRT sequencing
Read Lengths
http://www.igs.umaryland.edu/labs/grc/
Mean Read Length: 8391 bp
Maximum Subread Length: 24585 bp
Others
• Ion Torrent Proton/PGM
• Oxford Nanopore
• Nabsys
• SOLiD
Comparison
Next generation sequencing
Platform             Run time  Read length  Quality                      Total nucleotides sequenced  Cost / Mb
454 Pyrosequencing   24 h      700 bp       Q20-Q30                      0.7 Gb                       $10
Illumina MiSeq       27 h      2x250 bp     >Q30                         15 Gb                        $0.15
Illumina HiSeq 2500  11 days   2x125 bp     >Q30                         1000 Gb                      $0.05
Ion Torrent          2 h       400 bp       >Q20                         50 Mb-1 Gb                   $1
Pacific Biosciences  2 h       5.5-8.5 kb   >Q30 consensus, >Q10 single  400-800 Mb / SMRT cell       $0.33-$1
http://bit.ly/1clLps3
http://1.usa.gov/1cLqIRd
Summary
• Microbial genomes
• Eukaryotic genomes
• Resequencing genomes
• RNAseq and other XXXseq methods
http://bit.ly/1ko9Kgh
Next Generation Genomics: World Map of High-throughput Sequencers
http://omicsmaps.com/
http://bit.ly/18pfUId
Real cost of Sequencing!!
Sboner et al., Genome Biology, 2011
Library Types
Single end
Paired end (PE, 150-800 bp, Fwd: /1, Rev: /2)
Mate pair (MP, 2 Kb to 20 Kb)
[Figure: forward (F) and reverse (R) read orientation for each library type on 454/Roche and Illumina]
Slide credit: Aureliano Bombarely
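The /1 and /2 suffixes mentioned for paired-end reads can be used to match mates. The sketch below assumes read IDs end in exactly "/1" (forward) or "/2" (reverse); the function name and the example IDs are hypothetical, and newer Illumina headers encode the mate number differently.

```python
from typing import Dict, Iterable, List, Tuple


def pair_reads(read_ids: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Match forward (/1) and reverse (/2) read IDs that share a base name."""
    forward: Dict[str, str] = {}
    reverse: Dict[str, str] = {}
    orphans: List[str] = []
    for read_id in read_ids:
        if read_id.endswith("/1"):
            forward[read_id[:-2]] = read_id
        elif read_id.endswith("/2"):
            reverse[read_id[:-2]] = read_id
        else:
            orphans.append(read_id)  # no recognizable mate suffix
    pairs = [(forward[b], reverse[b]) for b in forward if b in reverse]
    orphans += [forward[b] for b in forward if b not in reverse]
    orphans += [reverse[b] for b in reverse if b not in forward]
    return pairs, orphans


pairs, orphans = pair_reads(["r1/1", "r2/1", "r1/2", "r3/2"])
print(pairs)    # [('r1/1', 'r1/2')]
print(orphans)  # ['r2/1', 'r3/2']
```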
Implications of Choice of Library
Reads: assembled into a consensus sequence (contig)
Pair read information: joins contigs into a scaffold (or supercontig), with gaps shown as NNNNN
Genetic information (markers): orders scaffolds into a pseudomolecule (or ultracontig)
Slide credit: Aureliano Bombarely
Multiplexing Libraries
Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector.
Slide credit: Aureliano Bombarely
[Figure: reads carrying tag AGTCGT or TGAGCA pooled for sequencing]
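Sorting pooled reads back into samples by their tag can be sketched as below. This is a minimal illustration, assuming the tag sits at the very start of each read and must match exactly; the sample names and the `demultiplex` function are hypothetical, and real pipelines often read the tag from a separate index read and tolerate mismatches.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], tags: Dict[str, str]) -> Dict[str, List[str]]:
    """Bin reads by leading tag; the tag is trimmed from each assigned read."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for tag, sample in tags.items():
            if read.startswith(tag):
                bins[sample].append(read[len(tag):])
                break
        else:
            bins["unassigned"].append(read)  # no known tag at the read start
    return dict(bins)


# Tags from the slide; sample names are made up for the example.
tags = {"AGTCGT": "sample_A", "TGAGCA": "sample_B"}
reads = ["AGTCGTAAAA", "TGAGCACCCC", "GGGGGGTTTT"]
print(demultiplex(reads, tags))
```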
File Formats
Fasta files:
It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. -Wikipedia
Slide credit: Aureliano Bombarely
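A FASTA file can be read with a few lines of code. The sketch below relies on the usual convention that each record starts with a ">" header line followed by one or more sequence lines; the function name is hypothetical and no alphabet validation is attempted.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence lines are concatenated."""
    header: Optional[str] = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


example = [">seq1 test", "ACGT", "TTGA", ">seq2", "MKV"]
print(list(parse_fasta(example)))
```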
File Formats
Fastq files:
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. -Wikipedia
• Single ID line with the at symbol (“@”) in the first column.
• The sequence can span multiple lines after the ID line.
• Single line with the plus symbol (“+”) in the first column, marking the start of the quality values.
• The “+” line may repeat the ID.
• Quality values can span multiple lines after the “+” line, but their total length must be identical to the sequence length.
Slide credit: Aureliano Bombarely
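The FASTQ rules above can be turned into a small parser. This is a sketch, not a reference implementation: the names `FastqRecord` and `parse_fastq` are hypothetical. Because a quality line may itself begin with "@", the parser reads quality characters until their count reaches the sequence length rather than looking for the next "@".

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, allowing multi-line sequence and quality."""
    it = iter(lines)
    for raw in it:
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("@"):
            raise ValueError(f"expected '@' ID line, got {line!r}")
        name = line[1:]
        seq_parts: List[str] = []
        for raw_seq in it:
            seq_line = raw_seq.strip()
            if seq_line.startswith("+"):
                break  # the '+' line ends the sequence block
            seq_parts.append(seq_line)
        else:
            raise ValueError(f"record {name!r} has no '+' line")
        sequence = "".join(seq_parts)
        qual_parts: List[str] = []
        qual_len = 0
        while qual_len < len(sequence):
            try:
                qual_line = next(it).strip()
            except StopIteration:
                raise ValueError(f"record {name!r} has truncated quality") from None
            qual_parts.append(qual_line)
            qual_len += len(qual_line)
        quality = "".join(qual_parts)
        if len(quality) != len(sequence):
            raise ValueError(f"record {name!r}: quality and sequence lengths differ")
        yield FastqRecord(name, sequence, quality)
```

Most current files use exactly four lines per record, which this parser also accepts.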
Quality control: Encoding
Fastq files:
!"#$%&'()*+,-./0123456789 Offset by 33 (Phred+33)
KLMNOPQRSTUVWXYZ[\]^_`abcdefgh Offset by 64 (Phred+64)
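Which offset a file uses can often be guessed from the range of quality characters it contains. The sketch below is a heuristic under stated assumptions: characters below ASCII 59 are taken to rule out a +64 encoding, and characters above ASCII 74 are taken to suggest +64. Both thresholds are conventions, not part of the slides, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def guess_phred_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Return 33 or 64 when the character range is decisive, else None."""
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        return 33  # too low to be a +64 encoding
    if highest > 74:
        return 64  # higher than typical Phred+33 output
    return None    # ambiguous range; inspect more reads


print(guess_phred_offset(["!#$%5"]))           # 33
print(guess_phred_offset(["KLMNOPabcdefgh"]))  # 64
```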
http://bit.ly/N28yUd
Phred score of a base is:
Qphred = -10 log10(e)
where e is the estimated probability of a base being wrong
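The formula and the two offsets can be combined in a short sketch that converts between error probability, Phred score, and quality characters. The function names are hypothetical.

```python
import math
from typing import List


def phred_from_error(e: float) -> float:
    """Q = -10 * log10(e), where e is the probability that the base is wrong."""
    return -10.0 * math.log10(e)


def error_from_phred(q: float) -> float:
    """Inverse of the formula above: e = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Turn a FASTQ quality string into Phred scores for the given offset."""
    return [ord(ch) - offset for ch in quality]


print(round(phred_from_error(0.001)))  # 30
print(error_from_phred(20))            # 0.01
print(decode_quality("!5I"))           # [0, 20, 40]
print(decode_quality("h", offset=64))  # [40]
```

Read this way, Q20 corresponds to a 1 in 100 chance of a wrong base and Q30 to 1 in 1000.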
Quality control: Error correction
Thank you!!

