Potato SNPs


Dan Bolser and David Martin

  Next Gen Bug, Dundee
       01/18/2010



Aims of the work
1) Learn about handling RNASeq
     • Create a SNP calling pipeline

2) Select SNPs for genetic mapping
     • Using Illumina's GoldenGate SNP chip (OPA)

Creating a SNP calling pipeline

Align (using BWA)
1) Index the potato genome assembly
bwa index [-a bwtsw|div|is] [-c] <in.fasta>
2) Perform the alignment
bwa aln [options] <in.fasta> <in.fq> > <in.sai>
3) Output results in SAM format (single end)
bwa samse <in.fasta> <in.sai> <in.fq> > <out.sam>
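The three BWA steps above could be chained in a small shell function. This is only a sketch: the names align_bwa, run, run_to and the DRY_RUN switch are made up for illustration, as are the file names; only the bwa index, bwa aln and bwa samse subcommands come from the slide.

```shell
#!/bin/sh
# Sketch only: align_bwa, run, run_to and DRY_RUN are hypothetical names.

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# run_to FILE CMD...: like run, but send the command's stdout to FILE.
run_to() {
    out=$1; shift
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$* > $out"; else "$@" > "$out"; fi
}

# align_bwa REF.fasta READS.fq: single-end alignment, writes READS.sam.
align_bwa() {
    ref=$1; fq=$2
    base=${fq%.fq}                                          # reads.fq -> reads
    run bwa index -a bwtsw "$ref" &&                        # 1) index the assembly
    run_to "$base.sai" bwa aln "$ref" "$fq" &&              # 2) align the reads
    run_to "$base.sam" bwa samse "$ref" "$base.sai" "$fq"   # 3) write SAM
}
```

With DRY_RUN=1 the function only prints the three commands, which is a cheap way to check file names before starting a long run.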
Align (using Bowtie)
1) Index the potato genome assembly
bowtie-build [options] <in.fasta> <ebwt>
2) Perform the alignment and output results
bowtie [options] <ebwt> <in.fq>
Convert (using SAMtools)
1) Convert SAM to BAM for sorting
samtools view -S -b <in.sam> > <in.bam>
2) Sort BAM for SNP calling
samtools sort <in.bam> <out.bam.s>

  Alignments are both compressed for long-term
storage and sorted for variant discovery.

Coverage profiles / Depth vectors

SAMtools...

    Dump a coverage profile
samtools mpileup -f <in.fasta> <my.bam.s>
    P1   244526   A   10   ...,.,,,..      BBQa`aaaa[
    P1   244527   A   10   ...,.,,,..      BBZ_`^a_a[
    P1   244528   C   10   .$.$.,.,,,..    >>RaZ`aaaa
    P1   244529   C    8   .,.,,,..        NaXaaaa`
    P1   244530   T    8   .,.,,,..        Xa_aaa`
    P1   244531   C    8   .,.,,,..        Rbabbaa
    P1   244532   T    9   .,.,,,..^~.     EE^^^^^^A
    P1   244533   T    9   .,.,,,...       BBB
    P1   244534   T    9   .$,$.,,,...     @@^^^^^^E

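In the pileup text above, column 4 is the read depth at each position, so a depth vector can be pulled out with a short awk filter. The sketch below assumes whitespace-separated pileup lines on standard input; the function names depth_profile and mean_depth are invented for this example.

```shell
# depth_profile: read pileup text on stdin, print "chrom pos depth".
depth_profile() {
    awk '{ print $1, $2, $4 }'
}

# mean_depth: average of the depth column over all reported positions.
mean_depth() {
    awk '{ sum += $4; n++ } END { if (n) printf "%.2f\n", sum / n }'
}

# Possible use (file names as on the slide):
#   samtools mpileup -f in.fasta my.bam.s | depth_profile
#   samtools mpileup -f in.fasta my.bam.s | mean_depth
```

Note that positions missing from the pileup output are not counted here, so the mean is over reported positions only.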
SAMtools Bio::DB::Sam (BioPerl)
Dump a coverage profile 2

SAMtools Bio::DB::Sam (BioPerl)
P41630
Matches : 9
0233333333333345555555555
 666778888888899999999999
 999999999999999999999999
 999976666666666665444444
 44443332211111111000

mpileup

    samtools mpileup collects summary
    information in the input BAMs, computes the
    likelihood of data given each possible
    genotype and stores the likelihoods in the
    BCF format.

    bcftools view applies the prior and does the
    actual calling.

    Finally, we filter.
SNP call
1) Index the potato genome assembly (again!)
samtools faidx in.fasta
2) Run 'mpileup' to generate BCF (binary VCF) format
samtools mpileup -ug -f in.fasta my1.bam.s my2.bam.s > my.raw.bcf

    Actually, all we did (I think) is perform a
    format conversion (BAM to BCF); no SNPs have been called yet.
VCF format
A standard format for sequence variation:
  SNPs, indels and structural variants.
Compressed and indexed.
Developed for the 1000 Genomes Project.
VCFtools is to VCF what SAMtools is to SAM.
Specification and tools available from
 http://vcftools.sourceforge.net
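Because uncompressed VCF is tab-separated text with fixed leading columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), it can be inspected with standard text tools. The sketch below is one way to list sites from a VCF stream; the function name vcf_sites is invented for this example, and the record in the usage comment is made up, not taken from the potato data.

```shell
# vcf_sites: read VCF text on stdin, skip header lines (starting with '#'),
# and print one compact line per record: "CHROM:POS REF>ALT QUAL=...".
vcf_sites() {
    awk -F '\t' '!/^#/ { print $1 ":" $2, $4 ">" $5, "QUAL=" $6 }'
}

# Possible use with a hypothetical record:
#   printf 'P1\t244530\t.\tT\tC\t50\t.\tDP=8\n' | vcf_sites
```

For anything beyond a quick look, VCFtools is the safer choice, since it understands the header and the INFO and genotype fields.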
SNP call and filter
1) Call SNPs
bcftools view -bvcg my.raw.bcf > my.var.bcf
2) Filter SNPs
bcftools view my.var.bcf | vcfutils.pl varFilter > my.var.bcf.filt

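Put together, the indexing, mpileup, calling and filtering steps might be wrapped as below. This is a sketch under assumptions: call_snps and DRY_RUN are hypothetical names, the output file names follow the slides, and the flags belong to the SAMtools version used in the talk (in current releases the equivalent steps live in bcftools mpileup and bcftools call). varFilter is fed from the pipe here rather than given a file name.

```shell
# call_snps REF.fasta SORTED.bam...: hypothetical wrapper around the
# legacy (SAMtools 0.1.x era) commands shown on the slides.
call_snps() {
    ref=$1; shift                       # remaining arguments are BAM files
    if [ "${DRY_RUN:-0}" = 1 ]; then    # print the plan instead of running it
        echo "samtools faidx $ref"
        echo "samtools mpileup -ug -f $ref $* > my.raw.bcf"
        echo "bcftools view -bvcg my.raw.bcf > my.var.bcf"
        echo "bcftools view my.var.bcf | vcfutils.pl varFilter > my.var.bcf.filt"
        return 0
    fi
    samtools faidx "$ref" &&
    samtools mpileup -ug -f "$ref" "$@" > my.raw.bcf &&   # genotype likelihoods
    bcftools view -bvcg my.raw.bcf > my.var.bcf &&        # call variants
    bcftools view my.var.bcf | vcfutils.pl varFilter > my.var.bcf.filt   # filter (VCF text)
}
```

Running it with DRY_RUN=1 prints the four commands without touching any files.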
Aims of the work
1) Learn about handling RNASeq
     • Create a SNP calling pipeline

2) Select SNPs for genetic mapping
     • Using Illumina's GoldenGate SNP chip (OPA)

Select SNPs for genetic mapping
 Using Illumina's GoldenGate SNP chip (OPA)




SNP chip (OPA) construction

    A set of DM SNP positions was provided by
    the SolCAP project (RNASeq derived).

    A subset was selected for developing OPAs
    (Illumina’s SNP chip technology).

    OPAs were run, and results have now been
    compared to RNASeq.
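
A comparison like this can be tallied with a few lines of awk once both sets of calls are in a common table. The sketch below assumes two tab-separated files, each holding a SNP identifier and a genotype string per line; that layout, the file names and the function name concordance are all assumptions made for illustration, not the format used in the actual study.

```shell
# concordance CHIP.tsv RNASEQ.tsv
# Each file: "<snp_id><TAB><genotype>" (hypothetical layout). Genotypes are
# compared as plain strings, so "AG" and "GA" would count as different.
concordance() {
    awk -F '\t' '
        NR == FNR { chip[$1] = $2; next }      # first file: chip (OPA) calls
        $1 in chip {                           # second file: RNASeq calls
            shared++
            if (chip[$1] == $2) same++
        }
        END { printf "shared=%d concordant=%d\n", shared, same }
    ' "$1" "$2"
}
```

Only SNPs present in both files are counted; normalising allele order before the comparison would be a sensible next step.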


Comparison (using an early SAMtools)
Comparison (using new SAMtools)
Looking into the RNASeq data…




[Diagram: potato genome assembly and two RNASeq read libraries]

A lot more questions to answer…

    Track down more ‘strange’ SNPs based on
    the expected allele frequency spectrum (AFS) of the two samples.

    Go beyond biallelic SNPs.

    Check the OPA base...
    −   Was the right base probed by the chip?

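As a starting point for going beyond biallelic SNPs, multi-allelic sites can be picked out of VCF text by looking for a comma-separated ALT column. This is a rough sketch: the function name multiallelic_snps is invented, and it only recognises single-base REF and ALT alleles, so indels and other variant types are ignored.

```shell
# multiallelic_snps: read VCF text on stdin and print CHROM, POS, REF, ALT
# for SNP records with two or more single-base alternative alleles.
multiallelic_snps() {
    awk -F '\t' '
        /^#/ { next }                                   # skip header lines
        length($4) == 1 && $5 ~ /^[ACGT](,[ACGT])+$/ { print $1, $2, $4, $5 }
    '
}

# Possible use (file name as on the slides):
#   bcftools view my.var.bcf | multiallelic_snps
```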
Thank you for your patience!




OPAs in 5 steps...
         The DNA sample is activated for binding to paramagnetic particles.
OPAs in 5 steps...
         Three oligos are designed for each SNP locus: two allele-specific oligos (ASOs), one for each allele of the SNP site, and a locus-specific oligo (LSO).
OPAs in 5 steps...
         Several wash steps remove excess and mis-hybridized oligos.
         Extension of the appropriate ASO and ligation to the LSO joins information about the genotype to the address sequence on the LSO.
OPAs in 5 steps...
         The single-stranded, dye-labeled DNAs are hybridized to their complement bead type through their unique address sequences.
OPAs in 5 steps...
         Key to the assay:
         Scalable, multiplexing sample preparation (one tube reaction).
         Highly parallel array-based read-out.
         High-quality data: Average call rates above 99% accuracy.

Editor's Notes

  • #47: All three oligo sequences contain regions of genomic complementarity and universal PCR primer sites; the LSO also contains a unique address sequence that targets a particular bead type. Up to 1,536 SNPs may be interrogated simultaneously in this manner. During the primer hybridization process, the assay oligos hybridize to the genomic DNA sample bound to paramagnetic particles. Because hybridization occurs prior to any amplification steps, no amplification bias can be introduced into the assay.
  • #48: Extension of the appropriate ASO and ligation of the extended product to the LSO joins information about the genotype present at the SNP site to the address sequence on the LSO. This allele-specific primer extension (ASPE) step preferentially extends the correctly matched ASO (at the 3' end) up to the 5' end of the LSO primer.
  • #49: One-to-one mapping between an address sequence on the array and the locus being scored. As a result of this labeling scheme, the PCR product consists of double-stranded DNA in which one strand, containing the complement to the Illumicode, is labeled with either Cy3 or Cy5 in an allele-specific manner, and the complementary strand is labeled with biotin. The biotinylated strand is removed and the single, fluorescently labeled strand is hybridized to the BeadArray.