
Introduction to second generation sequencing

  • 1. [MIT] Introduction to 2GS data analysis. Drink faster! June 23, 2011
  • 2. Production Informatics and Bioinformatics (per one-flowcell project). Basic Production Informatics: produce raw sequence reads. Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs). Bioinformatics Research: analyze the data; uncover the biological meaning.
  • 3. Brief history of sequencing. First Generation: Sanger sequencing. Second Generation: amplified molecule sequencing. Third Generation: single molecule sequencing. *** Discussion about category
  • 4. What steps are involved in sequencing? Sequencing by synthesis (SBS) technology: Fragmentation, Library generation, Amplification, Sequencing, Analysis. Illumina marketing: “3h 10 minutes wet-lab, 30 minutes dry lab”
  • 5. Illumina sequencing: Library + Amplification. “Illumina Sequencing Technology” booklet
  • 6. Illumina sequencing: Synthesis + Imaging. “Illumina Sequencing Technology” booklet
  • 7. Output: 1.5 Terabyte of data. Inspired by anzska information booklet
  • 8. Sequencer Output Conversion: Production Informatics. 1.5 TB data: 6 billion clusters with 100 bp reads = 600 billion data points. Diagram labels: HiSeq, CASAVA, … × read length. For HiSeq: images are converted to flat files (*.bcl or *.cif). Credits: visualpharm.com, Maysoft
  • 9. Multiplexing. One run: 6 billion reads; 750 million reads per lane. Currently 12-plex (soon 96-plex). Credit: Oliver Twardowski
  • 10. Demultiplexing. Diagram labels: CASAVA, … × samples, … × read length. Credit: visualpharm.com
  • 11. CASAVA 1.8.0 program call:
        configureBclToFastq.pl \
            --input-dir Data/Intensities/BaseCalls/ \
            --output-dir Data/Unaligned \
            --sample-sheet SampleSheet.csv \
            --use-bases-mask y100,I6nn,Y100 > file.log 2>&1
        cd Data/Unaligned
        qsub -pe make 16 -j y -v $MYPATH -o qsub.out -cwd -N fastq -b y \
            make -j 16
    Runtime: ~6 h
  • 12. Fastq files. Example record:
        @HWI-ST301_0112:1:1:1169:2044#0/1
        CCATAAGGCCACGTATTTTGCAAGCTATTTAACTGGCGGCGAT
        +HWI-ST301_0112:1:1:1169:2044#0/1
        dddc\dd^dd`acacdacd`ecdedabdcdddcc\``\`bTa\
    Decoded quality scores: 36 36 36 35 28 …
    Encoding range: ASCII @ .. ~; DEC 64 .. 126; PHRED 0 .. 62.
    Phred scores are estimates only!
    Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71. PMID: 20015970
  • 13. Fastq – PHRED quality. Pathological
  • 14. Fastq: Quality control. Base-pair quality score; adapter contamination; uneven amplification.
  • 15. Three things to remember: Don’t be fooled by marketing. Fastq files are not directly usable. Basic run QC can be done from the fastq file. “All modern genomics projects are now bottlenecked at the stage of data analysis rather than data production” (Ewan Birney, European Bioinformatics Institute, Wellcome Trust). David S. Roos, Bioinformatics--Trying to Swim in a Sea of Data; Science 16 February 2001: Vol. 291 no. 5507 pp. 1260-1261. DOI: 10.1126/science.291.5507.1260
  • 16. Next Week: Abstract: This session will focus on identifying SNPs from whole genome, exome capture or targeted resequencing data. The approaches of mapping, local realignment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.
  • 17. Walk-in clinic
  • 18. Brief history of sequencing. First Generation: Sanger sequencing. Second Generation: amplified molecule sequencing. Third Generation: single molecule sequencing. *** Discussion about category
  • 19. Helicos: true Single Molecule Sequencing (tSMS)™ technology. Sequencing by synthesis, but much more sensitive, so no amplification is needed.
  • 20. Life Technologies - Ion Torrent. A hydrogen ion is released by the incorporation of a nucleotide, and this is measured by a semiconductor. The base is identified by which nucleotide wash cycle the signal coincides with.
  • 21. PacBio. Immobilized polymerase at the bottom of a well. Fluorescent nucleotides float around and, if they are incorporated, they are held still for tens of milliseconds, which is the signal that is recorded. No upper limit on the length. http://www.pacificbiosciences.com/smrt-biology/smrt-technology?page=4
  • 22. Nanopore. The molecule is drawn through a pore and the change in the membrane charge due to the different nucleotides is recorded. http://www.nanoporetech.com/sections/index/82
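
The quality line of the Fastq record on slide 12 can be decoded by hand, which is also the starting point for the basic run QC mentioned on slides 14 and 15. The following is a minimal Python sketch, assuming the offset of 64 stated on that slide (ASCII `@` maps to Phred 0) and plain four-line records. The function names are hypothetical and not part of CASAVA or any other tool.

```python
import io

PHRED_OFFSET = 64  # offset stated on slide 12: ASCII '@' (64) encodes Phred 0


def decode_quality(quality_string, offset=PHRED_OFFSET):
    """Convert a Fastq quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(phred):
    """Estimated probability that a base call is wrong: 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from four-line Fastq records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, optionally repeating the name
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_quality_per_position(records, offset=PHRED_OFFSET):
    """Average Phred score at each read position across all records."""
    totals, counts = [], []
    for _, _, quality in records:
        for i, score in enumerate(decode_quality(quality, offset)):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# The record from slide 12 (backslashes are escaped for Python).
EXAMPLE = (
    "@HWI-ST301_0112:1:1:1169:2044#0/1\n"
    "CCATAAGGCCACGTATTTTGCAAGCTATTTAACTGGCGGCGAT\n"
    "+HWI-ST301_0112:1:1:1169:2044#0/1\n"
    "dddc\\dd^dd`acacdacd`ecdedabdcdddcc\\``\\`bTa\\\n"
)

records = list(read_fastq(io.StringIO(EXAMPLE)))
name, sequence, quality = records[0]
scores = decode_quality(quality)
print(scores[:5])  # [36, 36, 36, 35, 28], matching the slide
```

With many records, `mean_quality_per_position` gives the per-cycle quality profile that a base-pair quality check would plot. Files written with a different offset (for example the Sanger variant described by Cock et al.) would need a different `offset` value.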

Editor's Notes

  • #2: http://2.bp.blogspot.com/_BPr6hpMG0tg/TSZdkYDcRvI/AAAAAAAAAjY/ReScIkWNySg/s1600/drink.jpg
  • #4: PCR in which a labeled nucleotide is incorporated at random, terminating the reaction. The resulting fragments of different lengths are then separated on a gel, and the sequence can be read manually from the labeled end nucleotides.
  • #5: Some of you have done some library prep already, so you have a feel for how realistic 3 h 10 min is for this. This seminar goes through the analysis steps that are required to answer the question the data was generated for. So by the end of this seminar series you’ll also have a feel for how realistic 30 minutes is for the data analysis.
  • #19: PCR in which a labeled nucleotide is incorporated at random, terminating the reaction. The resulting fragments of different lengths are then separated on a gel, and the sequence can be read manually from the labeled end nucleotides.
  • #20: http://www.helicosbio.com/Technology/TrueSingleMoleculeSequencing/tabid/64/Default.aspx
  • #23: http://www.nanoporetech.com/sections/index/82
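
The throughput figures quoted on slides 7 to 9 can be sanity-checked with a few lines of arithmetic. This is a rough sketch: the constants are the numbers from the slides, while the even split of reads across barcodes and the decimal terabyte (10^12 bytes) are assumptions made here for illustration.

```python
# Figures quoted on the slides.
CLUSTERS_PER_RUN = 6_000_000_000  # slide 8: 6 billion clusters per run
READ_LENGTH_BP = 100              # slide 8: 100 bp reads
READS_PER_LANE = 750_000_000      # slide 9: 750 million reads per lane
RAW_OUTPUT_BYTES = 1.5e12         # slide 7: 1.5 TB, assuming 1 TB = 10**12 bytes


def data_points(clusters, read_length):
    """One data point per base call: clusters times read length."""
    return clusters * read_length


def lanes_per_run(clusters, reads_per_lane):
    """Number of lanes implied by the per-run and per-lane read counts."""
    return clusters // reads_per_lane


def reads_per_sample(reads_per_lane, plex):
    """Reads per sample in one lane, assuming an even split across barcodes.

    Real runs rarely split evenly, so treat this as an upper-level estimate.
    """
    return reads_per_lane // plex


total_points = data_points(CLUSTERS_PER_RUN, READ_LENGTH_BP)
print(total_points)                                     # 600000000000
print(lanes_per_run(CLUSTERS_PER_RUN, READS_PER_LANE))  # 8
print(reads_per_sample(READS_PER_LANE, 12))             # 62500000
print(reads_per_sample(READS_PER_LANE, 96))             # 7812500
print(RAW_OUTPUT_BYTES / total_points)                  # 2.5 bytes per data point
```

Moving from 12-plex to 96-plex cuts the expected reads per sample from about 62.5 million to about 7.8 million per lane, which is the practical trade-off behind the "soon 96-plex" remark on slide 9.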