GENOME RE-SEQUENCING FOR
SNP DISCOVERY / GENOTYPING 
MARKER 
Presented by 
Monoj Sutradhar 
PALB 3243 
Jr. M.Sc. (Pl. Biotech)
UAS, GKVK, Bangalore
9/8/2014 1
What are SNPs?
ACGTTTGGATAC
TGCAAACCTATG
ACGTTTGTATAC
TGCAAACATATG
Single nucleotide polymorphisms (SNPs) consist of a single-base change in the DNA code.
SNPs occur at various allele frequencies. Those in the 20-40% range are useful for genetic mapping.
Those at frequencies between 1% and 20% may be used with candidate gene approaches. SNPs are usually bi-allelic.
Changes at <1% are called variants.
What are the effects of SNPs?
Where | Result | Type | Effect
In coding region | May be silent, e.g., UUG→CUG (Leu in both cases) | sSNP | Usually no change in phenotype
In coding region | May change the amino acid sequence, e.g., UUC→UUA (Phe to Leu). Some characterize these as the least common and most valuable SNPs; many are being patented | cSNP | Phenotype change (may be subtle, depending on the amino acid replacement and its position)
In coding region | May create a stop codon, e.g., UCA→UGA (Ser to stop) | cSNP | Phenotype change
In regulatory region | May affect the rate of transcription (up- or down-regulate) | rSNP | Possible phenotype change
Other regions | No effect on gene products (7). May act as genetic markers for multi-component diseases. These are sometimes called anonymous SNPs and are the most common | |
How many SNPs are there?
It is estimated that the human genome contains between 3 million and 6 million SNPs, spaced irregularly at intervals of 500 to 1,000 bases.
The SNP Consortium estimates that as many as 300,000 SNPs may be needed to support genetic studies.
100,000 or more SNPs may be required for complex disease gene discovery.
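The 3 to 6 million range is consistent with simple arithmetic: genome length divided by the average spacing between SNPs. The sketch below assumes a haploid human genome of roughly 3 billion bases, a figure that is not stated on the slide, and the function name is only illustrative.

```python
# Rough SNP-count estimate from average spacing.
# ASSUMPTION: ~3 billion bases for the haploid human genome (not from the slide).
GENOME_BP = 3_000_000_000


def expected_snp_count(genome_bp: int, spacing_bp: int) -> int:
    """Number of SNPs if one SNP occurs, on average, every `spacing_bp` bases."""
    return genome_bp // spacing_bp


low = expected_snp_count(GENOME_BP, 1_000)   # one SNP per 1,000 bases
high = expected_snp_count(GENOME_BP, 500)    # one SNP per 500 bases
print(f"{low:,} to {high:,} SNPs")
```

Under that assumption the two spacings quoted on the slide bracket the quoted SNP counts.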
SNP Discovery
SNP discovery refers to the initial identification of new SNPs.
The established method is DNA sequencing with subsequent data analysis. Some indirect discovery techniques (e.g., dHPLC, SSCP) only indicate that a SNP (or other mutation) exists.
DNA sequencing of multiple individuals is used to determine the position and type of polymorphism.
Low throughput; based on established DNA sequencing analyses or collected data (also based on electrophoretic data).
SNP Validation
SNP validation refers to genetic validation, the process of ensuring that the SNP is not due to sequencing error and that it is not extremely rare. This should not be confused with assay, target or regulatory validation.
Confirmation of SNPs found in discovery
Larger numbers of individual samples to get statistical data on occurrence in the population
SNP Screening
SNP screening refers to researchers running thousands of genotypes (many SNPs, many individuals, or both)
Thousands to hundreds of thousands of samples per day
Two different screening strategies
- Many SNPs in a few individuals
- A few SNPs in many individuals
Different strategies will require different tools
Important in determining markers for complex genetic states
Steps of SNP discovery 
Sequence clustering 
Cluster refinement 
Multiple alignment 
SNP detection
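As a rough illustration of the last step, once reads are clustered and aligned, SNP detection amounts to scanning alignment columns for positions where more than one base is observed. The sketch below is a toy version under stated assumptions: the function name `find_snps`, the `min_minor_freq` threshold, and the four example reads are hypothetical (the two distinct sequences are the ones shown on the "What are SNPs?" slide), and real callers also weigh base quality and depth.

```python
from collections import Counter


def find_snps(aligned_seqs, min_minor_freq=0.01):
    """Return (1-based position, allele counts) for polymorphic alignment columns.

    ASSUMPTION: sequences are already aligned to equal length, without gaps.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs if seq[pos] in "ACGT")
        if len(counts) < 2:
            continue  # monomorphic column
        total = sum(counts.values())
        minor = sorted(counts.values())[-2]  # count of the second most common allele
        if minor / total >= min_minor_freq:
            snps.append((pos + 1, dict(counts)))
    return snps


# Hypothetical reads: three carry G at position 8, one carries T.
reads = ["ACGTTTGGATAC", "ACGTTTGGATAC", "ACGTTTGTATAC", "ACGTTTGGATAC"]
snps = find_snps(reads)
print(snps)
```

Raising `min_minor_freq` is one simple way to separate likely SNPs from rare variants or sequencing errors, in the spirit of the frequency classes described earlier.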
Initial SNP Discovery and Mapping
SNP discovery using Sanger re-sequencing
- Mostly genic
- BAC-end and BAC subclones
SNP genotyping and mapping 
- Sequenom mass spectrometer 
- Luminex Flow cytometer 
- Illumina Inc. GoldenGate™ assay
Roche (454) Sequencing
Pyrosequencing was the first of the new highly parallel sequencing technologies to reach the market [24]. It is commonly referred to as 454 sequencing after the name of the company that first commercialized it.
It is a sequencing-by-synthesis (SBS) method in which single fragments of DNA are hybridized to a capture bead array and the beads are emulsified with the reagents necessary to PCR-amplify the individually bound template.
Each bead in the emulsion acts as an independent PCR in which millions of copies of the original template are produced and bound to the capture beads, which then serve as the templates for the subsequent sequencing reaction.
The individual beads are deposited into a picotiter plate along with DNA polymerase, primers, and the enzymes necessary to generate light through the consumption of the inorganic pyrophosphate produced during sequencing.
The instrument washes the picotiter plate with each of the DNA bases in turn. As template-specific incorporation of a base by DNA polymerase occurs, a pyrophosphate (PPi) is produced.
This pyrophosphate is detected by an enzymatic luminometric inorganic pyrophosphate detection assay (ELIDA) through the generation of a light signal following the conversion of PPi into ATP.
Shotgun sequencing by PGM/454
[Figure: genomic fragment with adapters]
[Figure: genomic fragment with barcode]
[Figure: bead/ISP carrying adapter complement sequences]
The idea is that each bead should be amplified all over with a SINGLE library fragment.
Problem: How do I do PCR to amplify the fragments without having to use 1 tube for each reaction?
[Figure: ~3.5 μm for Ion Torrent, ~30 μm for 454]
Shotgun sequencing by PGM/454
Only give polymerase one nucleotide at a time. If that nucleotide is incorporated, enzymes turn by-products into light.
[Figure sequence: a primer (5' TGCGCGGCCCA) annealed to the template (3' ACGCGCCGGGTCAGAACCCGATCGCG 5') is extended while nucleotides are flowed in the repeating order T, C, A, G. Successive panels build up the read G, T, C, TT, GGG, with taller signals for runs of identical bases.]
The real power of this method is that it can take place in millions of tiny wells in a single plate at once.
[Figure: Raw 454 data]
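The flow-by-flow logic of these slides can be sketched as a small simulation. This is an idealized illustration, not instrument software: the function names and the noise-free integer signals are assumptions, while the primer, template and T, C, A, G flow order are the ones shown in the figure.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flowgram(template_3to5, primer, flow_order="TCAG", cycles=3):
    """Idealized pyrosequencing signals as (nucleotide, bases incorporated) per flow.

    ASSUMPTION: the primer pairs with the first len(primer) template bases,
    and the light signal is exactly proportional to the homopolymer length.
    """
    # Bases the polymerase still has to add: complement of the unprimed template.
    to_add = [COMPLEMENT[base] for base in template_3to5[len(primer):]]
    pos = 0
    signals = []
    for _ in range(cycles):
        for nucleotide in flow_order:
            run = 0
            # A flow extends the strand through a whole run of matching bases.
            while pos < len(to_add) and to_add[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append((nucleotide, run))
    return signals


def call_bases(signals):
    """Turn per-flow signals back into a read."""
    return "".join(nucleotide * run for nucleotide, run in signals)


signals = flowgram("ACGCGCCGGGTCAGAACCCGATCGCG", "TGCGCGGCCCA")
print(signals)
print(call_bases(signals))
```

Because a whole homopolymer run is incorporated in one flow, the read length is recovered from signal height, which is why long runs of the same base are the characteristic weak point of this chemistry.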
The instrument repeats the sequential nucleotide wash cycle hundreds of times to lengthen the sequences.
The 454 GS FLX Titanium XL+ platform currently generates up to 700 Mb of raw 750 bp reads in a 23-hour run.
Illumina Sequencing
• Illumina technology, acquired by Illumina from Solexa, followed the release of 454 sequencing.
• With this sequencing approach, fragments of DNA are hybridized to a solid substrate called a flow cell.
• In a process called bridge amplification, the bound DNA template fragments are amplified in an isothermal reaction in which copies of the template are created in close proximity to the original.
• This results in clusters of DNA fragments on the flow cell, creating a "lawn" of bound single-stranded DNA molecules.
• The molecules are sequenced by flooding the flow cell with a new class of cleavable fluorescent nucleotides and the reagents necessary for DNA polymerization.
• A complementary strand of each template is synthesized one base at a time using fluorescently labeled nucleotides.
• The fluorescent molecule is excited by a laser and emits light, the colour of which is different for each of the four bases. The fluorescent label is then cleaved off and a new round of polymerization occurs.
• Unlike 454 sequencing, all four bases are present for the polymerization step and only a single nucleotide is incorporated per cycle.
• The flagship HiSeq2500 sequencing instrument from Illumina can generate up to 600 Gb per run with a read length of 100 nt and a 0.1% error rate.
• The Illumina technique can generate sequence from opposite ends of a DNA fragment, so-called paired-end (PE) reads.
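Error rates such as the 0.1% quoted above are commonly expressed as Phred quality scores, Q = -10 log10(p). The sketch below applies that standard definition; the helper names are illustrative, and treating the quoted rate as a uniform per-base error probability is a simplifying assumption.

```python
import math


def phred_quality(error_rate):
    """Phred score Q = -10 * log10(p) for a per-base error probability p."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


def expected_errors(read_length, error_rate):
    """Expected miscalled bases per read, ASSUMING a uniform per-base error rate."""
    return read_length * error_rate


q = phred_quality(0.001)                      # 0.1% error rate
errors_per_read = expected_errors(100, 0.001)  # 100 nt read
print(round(q), errors_per_read)
```

On this scale a 0.1% error rate corresponds to about Q30, and a 100 nt read would be expected to carry roughly one error in every ten reads.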
Applied Biosystems (SOLiD) Sequencing
• The SOLiD system was jointly developed by the Harvard Medical School and the Howard Hughes Medical Institute. The library preparation in SOLiD is very similar to that of Roche/454, in which clonal bead populations are prepared in microreactors containing DNA template, beads, primers, and PCR components.
• Beads that contain PCR products amplified by emulsion PCR are enriched by a proprietary process. The DNA templates on the beads are modified at their 3′ end to allow attachment to glass slides.
• A primer is annealed to an adapter on the DNA template and a mixture of fluorescently tagged oligonucleotides is pumped into the flow cell.
• When the oligonucleotide matches the template sequence, it is ligated onto the primer and the unincorporated oligonucleotides are washed away.
• A charge-coupled device (CCD) camera captures the different colours attached to the primer. Each fluorescence wavelength corresponds to a particular set of dinucleotide combinations.
• After image capture, the fluorescent tag is removed and a new set of oligonucleotides is injected into the flow cell to begin the next round of DNA ligation.
• On the SOLiD 5500xl platform, this sequencing-by-ligation method
generates up to 1,410 million reads of nt each with an error rate of 
0.01% 
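The link between colours and dinucleotides can be illustrated with the commonly described two-base (colour-space) encoding, in which four colours cover the sixteen possible dinucleotides. The mapping below (A=0, C=1, G=2, T=3, colour = XOR of adjacent bases) is an assumption about that scheme and is not taken from the slides; the function names are illustrative.

```python
# ASSUMPTION: standard two-base encoding, colour = XOR of 2-bit base codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """One colour (0-3) per overlapping dinucleotide of `seq`."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the bases from a known first base and the colour calls."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ACGTTTGGATAC")
print(colors)
print(decode_colors("A", colors))
```

Since each base is covered by two overlapping colour calls, a true SNP changes two adjacent colours while an isolated measurement error changes only one, which is one reason often given for the low error rate of this approach.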
Comparison
Software for Sequence Analysis
• Both commercial and noncommercial sequence analysis software is available for Windows, Macintosh, and Linux operating systems.
• NGS companies offer proprietary software such as Consensus Assessment of Sequence and Variation (CASAVA) for Illumina data and Newbler for 454 data.
• Such software tends to be optimized for its respective platform but has limited cross-applicability to the others.
• Commercially available software packages such as CLC-Bio (http://www.clcbio.com/) and SeqMan NGen (http://www.dnastar.com/t-sub-products-genomics-seqman-ngen.aspx) provide a friendly user interface, are compatible with different operating systems, require minimal computing knowledge, and are capable of performing multiple downstream analyses.
• However, they tend to be relatively expensive, have narrow customizability, and require locally available high computing power.
• Linux-based software such as Bowtie [59], BWA [60], and SOAP2/3 [61] has been used widely for the analysis of NGS data.
Software and Pipelines for SNP Discovery
• Broadly used SNP calling software includes Samtools [103], SNVer [104], and SOAPsnp [74]. Samtools is popular because of its various modules for file conversion (SAM to BAM and vice versa), mapping statistics, variant calling, and assembly visualization.
• Recently, SOAPsnp has gained popularity because of its tight integration with the SOAP aligner and other SOAP modules, which are constantly upgraded and provide a one-stop shop for the sequencing analysis continuum.
• Variant calling algorithms such as Samtools and SNVer can be used as stand-alone programs or incorporated into pipelines for SNP calling.
• A wide array of commonly used file formats, such as SAM, BAM, SOAP, ACE, FASTQ, and FASTA, is generated by different read assemblers such as Bowtie, BWA, SOAP, MAQ, and SeqMan NGen.
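Of the formats listed, FASTQ is the usual starting point, since it carries both the read and its per-base qualities. The sketch below reads four-line FASTQ records and converts them to FASTA. It assumes unwrapped records and Phred+33 quality encoding; the function names and the example read are hypothetical and not part of any of the tools named above.

```python
import io


def read_fastq(handle, offset=33):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream.

    ASSUMPTIONS: records are not line-wrapped and qualities use Phred+33.
    """
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of input
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, [ord(ch) - offset for ch in qual]


def to_fasta(records):
    """Drop the qualities and emit FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)


example = "@read1\nACGTTTGGATAC\n+\nIIIIIIIIII5I\n"
records = list(read_fastq(io.StringIO(example)))
print(records[0][2])
print(to_fasta(records), end="")
```

Converting FASTQ to FASTA discards the quality values, so variant callers that weigh base quality generally need the FASTQ-derived alignments rather than FASTA.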
SNP Discovery
• NGS-derived SNPs have been reported in humans, Drosophila, wheat, eggplant, rice, Arabidopsis, barley, sorghum, cotton, common beans, soybean, potato, flax, Aegilops tauschii, alfalfa, oat, and maize, to name a few.
• SNP discovery using NGS is readily accomplished in small plant genomes for which good reference genomes are available, such as rice and Arabidopsis. Although SNP discovery in complex genomes without a reference genome, such as wheat, barley, oat, and beans, can be achieved through NGS, several challenges remain in other nonmodel but economically important crops.
SNP Validation
• The two major factors affecting the SNP validation rate are sequencing errors and read mapping errors.
• NGS platforms have different levels of sequencing accuracy, and this may be the most important factor determining the variation in validation rates: 88.2% for SOLiD, followed by Illumina at 85.4% and Roche 454 at 71%.
• SNP validation rates can be improved by using reduced representation libraries (RRL) for SNP discovery and by choosing SNPs within nonrepetitive sequences, including predicted single-copy genes and single-copy repeat junctions, which have been shown to have high validation rates.
Genome-Wide Association Mapping
• Association mapping (AM) panels provide better resolution, consider numerous alleles, and may provide faster marker-trait association than biparental populations.
• AM, often referred to as linkage disequilibrium (LD) mapping, relies on the nonrandom association between markers and traits.
• In the past few years, NGS technologies have led to the discovery of thousands, even millions, of SNPs, and novel application platforms have made it possible to produce genome-wide haplotypes of large numbers of genotypes, making SNPs the ideal marker for GWASs.
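The nonrandom association that LD mapping relies on is usually quantified between pairs of loci with the statistics D and r². The sketch below uses the standard definitions, D = p_AB - p_A * p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), which are not given on the slides; the function name, the 0/1 allele coding and the example haplotypes are hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic markers.

    `haplotypes` is a list of (allele_at_marker_1, allele_at_marker_2) pairs,
    with alleles coded 0 or 1. ASSUMPTION: phased haplotypes are available.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                  # freq of allele 1 at marker 1
    p_b = sum(b for _, b in haplotypes) / n                  # freq of allele 1 at marker 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")        # undefined if a marker is fixed
    return d, r2


linked = [(1, 1)] * 4 + [(0, 0)] * 4                 # alleles always travel together
independent = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2   # all combinations equally common
print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(independent))
```

An r² near 1 means one marker predicts the other almost perfectly, so a genotyped SNP can stand in for a nearby causal variant; an r² near 0 means the marker carries no information about it.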
• A GWAS performed in rice using ~3.6 million SNPs identified genomic regions associated with 14 agronomic traits.
• The genetic structure of northern leaf blight, southern leaf blight, and leaf architecture was studied using ~1.6 million SNPs in maize.
• SNP-based GWAS was also performed on species such as barley, for which a reference genome sequence is not available.
• So far, 951 GWASs have been reported in humans.
Future Perspectives
• SNP discovery incontestably made a quantum leap forward with the advent of NGS technologies, and large numbers of SNPs are now available from several genomes, including large and complex ones.
• Unlike model systems such as humans and Arabidopsis, SNPs from crop plants remain limited for the time being, but broad access to reasonably priced NGS promises to rapidly increase the production of reference genome sequences as well as SNP discovery.
• NGS technologies have made SNP discovery affordable even in complex genomes, and the technologies themselves have improved tremendously in the past decade.
Thanks for Attention 
