Genome Research 19:1124-1132, 2009



Speaker: Eric C.Y., LEE
Aim

• Develop a SNP-calling method for the
  Illumina platform.
• Account for the data quality, alignment, and
  experimental errors common to this
  platform.
Applications of NGS

• Use whole-genome sequences to identify
  genetic variation between individuals.
 • Disease
 • Drug
 • Environment
Workflow
• Sequencing reads
• Map reads onto the reference genome
• Recalibrate sequencing quality scores
• Calculate the likelihood of each genotype
• Infer the genotype via Bayes' theorem, combining
  the likelihoods with the prior probability of
  each genotype
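
The final inference step can be written as a formula. This is the standard statement of Bayes' theorem applied to genotype calling; the symbols $G$ (a candidate genotype) and $D$ (the bases observed at a site) are notation chosen here for illustration and do not come from the slides.

```latex
P(G \mid D) = \frac{P(G)\, P(D \mid G)}{\sum_{G'} P(G')\, P(D \mid G')}
```

Here $P(G)$ is the prior probability of the genotype, $P(D \mid G)$ is its likelihood, and the sum in the denominator runs over all candidate genotypes at the site.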
Traditional Method

• Phred score is a universal standard.
• Compare the sample sequence with the
  reference genome and filter out low-score
  mismatches.
• A method to detect heterozygous
  polymorphisms.
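
As a rough illustration of the traditional approach, the sketch below converts Phred scores to error probabilities using the standard definition Q = -10 log10(p) and drops low-score mismatches. The function names, the tuple layout `(position, base, quality)`, and the threshold of 20 are assumptions made for this example, not values from the slides.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p to a Phred score Q = -10*log10(p)."""
    return -10 * math.log10(p)


def filter_mismatches(mismatches, min_q=20):
    """Keep only mismatches whose base quality is at least min_q.

    mismatches: list of (position, base, quality) tuples (hypothetical layout).
    min_q: illustrative threshold, chosen here as an assumption.
    """
    return [m for m in mismatches if m[2] >= min_q]
```

For example, `filter_mismatches([(100, "A", 35), (200, "T", 8)])` keeps only the mismatch at position 100.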
Prior Probability
• According to existing research:
 • The estimated SNP rate between two
    human haploid chromosomes is about
    0.001 (Sachidanandam et al. 2001).
 • The human reference genome sequence has
    an error rate of 0.00001 (Collins et al.
    2004).

  The prior rate is set to 0.0005 for homozygous
  SNPs and 0.001 for heterozygous SNPs.
Prior Probability
• According to a previous study on dbSNP,
  transitions are four times more frequent
  than transversions among the substitution
  mutations. (Zhao and Boerwinkle 2002)
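
One way to turn these rates into per-genotype priors is sketched below. It assumes that "four times more frequent" means the transition allele is weighted 4:1:1 against the two transversion alleles, and it leaves out heterozygous genotypes that contain no reference allele. Both choices are simplifications made for this example; the function and constant names are hypothetical.

```python
BASES = "ACGT"
# Transition partner of each base (purine <-> purine, pyrimidine <-> pyrimidine).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}

HOM_SNP_RATE = 0.0005  # homozygous SNP prior from the slide
HET_SNP_RATE = 0.001   # heterozygous SNP prior from the slide
TS_WEIGHT = 4.0        # assumed weight of a transition relative to one transversion


def genotype_priors(ref: str) -> dict:
    """Return prior probabilities for genotypes at a site with reference base ref.

    Genotypes are two-letter strings with alleles in alphabetical order.
    Non-reference heterozygotes (e.g. "AC" when ref is "G") are omitted
    in this simplified sketch.
    """
    weights = {
        alt: (TS_WEIGHT if TRANSITION[ref] == alt else 1.0)
        for alt in BASES
        if alt != ref
    }
    total = sum(weights.values())
    # Homozygous reference takes the remaining probability mass.
    priors = {ref + ref: 1.0 - HOM_SNP_RATE - HET_SNP_RATE}
    for alt, w in weights.items():
        share = w / total
        priors[alt + alt] = HOM_SNP_RATE * share
        priors["".join(sorted(ref + alt))] = HET_SNP_RATE * share
    return priors
```

Calling `genotype_priors("G")` returns seven genotypes whose probabilities sum to 1, with the transition genotypes "AA" and "AG" receiving four times the prior of their transversion counterparts.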
Alignment


• Indels are a source of alignment error.
• SOAP is used for alignment.
Recalibration

• The 3' ends of reads have a much higher error
  rate than earlier cycles.
• The original quality scores do not reflect the
  true error rate.
• Check mismatches against dbSNP.
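
A minimal sketch of empirical recalibration follows. It assumes that mismatches against the reference are counted per (reported quality, sequencing cycle) bin and that positions listed in dbSNP are skipped so that true polymorphisms are not counted as errors. That reading of the dbSNP step, the input layout, and the add-one smoothing are all assumptions made for this example.

```python
import math
from collections import defaultdict


def build_recalibration_table(observations, known_snp_positions=frozenset()):
    """Map (reported_q, cycle) to an empirically recalibrated Phred score.

    observations: iterable of (position, reported_q, cycle, is_mismatch)
        tuples, one per aligned base (hypothetical layout).
    known_snp_positions: positions to skip, e.g. sites listed in dbSNP.
    """
    counts = defaultdict(lambda: [0, 0])  # (q, cycle) -> [mismatches, total]
    for pos, q, cycle, is_mismatch in observations:
        if pos in known_snp_positions:
            continue  # a mismatch here may be a real SNP, not an error
        entry = counts[(q, cycle)]
        entry[0] += int(is_mismatch)
        entry[1] += 1

    table = {}
    for key, (mismatches, total) in counts.items():
        # Add-one smoothing keeps the rate strictly between 0 and 1.
        rate = (mismatches + 1) / (total + 2)
        table[key] = -10 * math.log10(rate)
    return table
```

A bin whose reported score is 30 but whose observed mismatch rate is close to 10% would be recalibrated to roughly 10, which is the kind of gap between reported and true error rate the slide describes for late cycles.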
Recalibration
• Illumina uses two lasers.
 • A and C share one laser; G and T share
    the other.
 • A-C and G-T substitutions were overestimated
    by 58%-72%.
• Duplicate reads
 • A penalty is applied to these reads.
Likelihood Calculation

• Observed allele type
• Quality score
• Sequencing cycle
• Observation of the same allele from
  reads with the same mapping location.
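
The sketch below shows how a genotype likelihood can be combined with priors to make a call. It is deliberately simplified: the per-base error comes from the quality score alone, errors are spread evenly over the three other bases, and the dependence on sequencing cycle and on duplicate reads listed above is left out. The function names and the pileup layout are hypothetical.

```python
import math


def base_likelihood(observed: str, allele: str, q: float) -> float:
    """P(observed base | true allele), using only the Phred score q."""
    e = 10 ** (-q / 10)
    return 1 - e if observed == allele else e / 3


def genotype_log_likelihood(genotype: str, pileup) -> float:
    """Log-likelihood of a diploid genotype given a pileup.

    genotype: two-letter string such as "AG".
    pileup: list of (base, quality) pairs observed at the site.
    Each read is assumed to come from either allele with probability 0.5.
    """
    a1, a2 = genotype
    return sum(
        math.log(0.5 * base_likelihood(b, a1, q) + 0.5 * base_likelihood(b, a2, q))
        for b, q in pileup
    )


def call_genotype(pileup, priors: dict) -> str:
    """Return the genotype with the highest posterior (Bayes' theorem in log space)."""
    log_post = {
        g: math.log(p) + genotype_log_likelihood(g, pileup)
        for g, p in priors.items()
    }
    return max(log_post, key=log_post.get)
```

With illustrative priors `{"GG": 0.9985, "AG": 0.001, "AA": 0.0005}`, a pileup of five A and five G bases at quality 30 is called heterozygous, while ten G bases are called homozygous reference.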
Evaluation

• Comparing the consensus sequence with
  alleles genotyped on the Illumina Human 1M
  BeadChip from the same DNA sample showed
  99.97% consistency for covered alleles on the
  X chromosome and 99.84% for covered alleles
  on the autosomes.

SNP Detection for Massively Parallel Whole-genome Sequencing
