Harrison Leong and Stephan Berosik
ThermoFisher Scientific, Genetic Sciences Division, 200 Oyster Point Blvd., South San Francisco, CA, 94080
BACKGROUND
Inserted or deleted genetic material has been associated with many forms
of cancer [5,6,7]. For disease detection and treatment monitoring, it is of
interest to detect low levels of these mutations. Doing this with Sanger
sequencing is complicated by the fact that signals from the normal DNA
sequence and the mutated DNA are mixed together, and the two sequences
are shifted relative to one another downstream of the indel. Figure 1 shows
an example of the resulting electropherogram, caused by the insertion of
CGCC in half of the DNA molecules in the sample.
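To make the mixing and shifting concrete, the following is a minimal sketch of an idealized mixed signal. It assumes every molecule contributes equally at each read position and ignores noise, dye, and mobility effects, so it is not a model of real trace data. The function name `mixed_signal` and its parameters are hypothetical; the two sequences are the sub-sequences given with Figure 2.

```python
from collections import defaultdict


def mixed_signal(normal, mutant, vaf):
    """Return, per read position, the expected fraction of each base.

    Idealized assumption: no noise, equal signal per molecule.
    `vaf` is the fraction of mutant molecules, between 0 and 1.
    """
    length = min(len(normal), len(mutant))
    profile = []
    for i in range(length):
        fractions = defaultdict(float)
        fractions[normal[i]] += 1.0 - vaf   # contribution of normal DNA
        fractions[mutant[i]] += vaf         # contribution of mutated DNA
        profile.append(dict(fractions))
    return profile


normal = "AGAGAAGCCCGCCGGCTCTT"
mutant = "AGAGAAGCCCGCCCGCCGGCTCTT"
profile = mixed_signal(normal, mutant, vaf=0.5)

# First position (0-based) at which two different bases overlap.
first_mixed = next(i for i, f in enumerate(profile) if len(f) > 1)
print(first_mixed, profile[first_mixed])
```

Upstream of the insertion each position carries a single base; from the insertion point onward the two sequences are out of register, so most positions carry two overlapping bases except where the shifted sequences happen to agree.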
CONCLUSIONS
In addition to the conclusions expressed in the abstract, a closer look at the results
suggests that the VAF limit for detecting indels may be below 2.5%, while the VAF limit
for accurately characterizing indels may be around 10%. Additional work is needed
to confirm these results with independent validation data that include the 1% and
10% VAF data points. No a priori information about the indels was used by the
algorithm, so it is possible that results can be improved for panels of targets where
a priori information is available.
REFERENCES
1. Lin, M.T. et al. (2014), American Journal of Clinical Pathology, June 2014; 141:856-866.
2. Jancik, S. et al. (2012), Journal of Experimental & Clinical Cancer Research 2012; 31:79:1-13.
3. Tsiatis, A.C. et al. (2010), Journal of Molecular Diagnostics, July 2010; 12:4:425-432.
4. Leong, H. et al. (2016), AMP conference poster TT27.
5. Yang, H. et al. (2010), BMC Medical Genetics, 11:128.
6. Lengar, P. (2012), Nucleic Acids Research, Vol. 40, No. 14, 6401–6413.
7. Ye, K. et al. (2016), Nat Med., January 2016; 22(1):97–104. doi:10.1038/nm.4002.
High Sensitivity Sanger Sequencing for Minor Indel Detection and Characterization
(Automated Detection and Characterization of Minor Insertion and Deletion Variants Using Sanger Sequencing)
TT79
MATERIALS AND METHODS
DATA FOR DEVELOPING THE METHOD
The samples used and the methods to process those samples are covered in
the abstract. The algorithm requires a quartet of .ab1 files to perform
its analysis: sequencing results for a test sample and a control sample,
each sequenced in the forward and reverse orientations. The following
describes the quartets used to develop the algorithm:
1130 quartets had no indels present and covered VAFs of 0%, 0.6125%, 1%,
1.25%, 1.4%, 2%, 2.5%, 5%, 5.28%, 7.5%, 10%, 15%, 20%, 25%, 30%,
and 50%. There were 0 to 16 SNPs in these quartets.
Sequence lengths ranged from 125 bp to 466 bp.
203 quartets had indels and covered VAFs of 2.5% (33), 5% (33), 6.25%
(30), 7.5% (1), 12.5% (30), 15% (1), 25% (33), and 50% (33); the
number of quartets for each VAF is given in parentheses. VAF was unknown
for 9 of the quartets. There were 122 deletions of 1 bp to 48 bp and
81 insertions of 1 bp to 6 bp. Sequence lengths ranged
from 114 bp to 590 bp.
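The indel quartet counts above can be cross-checked with a short tally. The sketch below only re-adds the numbers stated in this section; the variable names are illustrative. Note that the abstract reports 121 deletions and 82 insertions, which gives the same total.

```python
# Number of indel quartets per known VAF (in percent), as listed above.
indel_quartets_by_vaf = {
    2.5: 33, 5.0: 33, 6.25: 30, 7.5: 1,
    12.5: 30, 15.0: 1, 25.0: 33, 50.0: 33,
}
unknown_vaf_quartets = 9
deletions, insertions = 122, 81

known_vaf_quartets = sum(indel_quartets_by_vaf.values())
total_by_vaf = known_vaf_quartets + unknown_vaf_quartets
total_by_type = deletions + insertions

print(known_vaf_quartets, total_by_vaf, total_by_type)
```

Both ways of counting arrive at the 203 indel quartets stated above.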
DATA PROCESSING
Data from control and test samples for both forward and reverse
sequencing reactions were processed through the basecaller embedded
within the data collection software that comes with the Applied Biosystems™
3500xl Genetic Analyzer. The resulting analyzed data, an example of
which is shown in Figure 3, were processed by the indel algorithm to
determine whether an indel is present and, if so, to determine its type,
length, location, sequence, and VAF.
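The input and output described here can be pictured as two small records. This is a sketch of one possible representation, not the data structures of the actual algorithm: the class names `Quartet` and `IndelCall`, their field names, and the file names in the usage lines are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Quartet:
    """Four .ab1 files: test and control, each forward and reverse."""
    test_forward: Path
    test_reverse: Path
    control_forward: Path
    control_reverse: Path

    def validate(self):
        """Raise ValueError if any file lacks the .ab1 extension."""
        paths = (self.test_forward, self.test_reverse,
                 self.control_forward, self.control_reverse)
        for path in paths:
            if path.suffix.lower() != ".ab1":
                raise ValueError(f"not an .ab1 file: {path}")


@dataclass(frozen=True)
class IndelCall:
    """The five characteristics reported for a detected indel."""
    indel_type: str     # "insertion" or "deletion"
    length: int         # number of inserted or deleted bases
    location: int       # base after which the indel occurs
    sequence: str       # inserted or deleted bases
    vaf_percent: float  # variant allele frequency, in percent


# Hypothetical file names for one quartet.
quartet = Quartet(Path("test_F.ab1"), Path("test_R.ab1"),
                  Path("control_F.ab1"), Path("control_R.ab1"))
quartet.validate()

# The expected answer for the sample shown in Figure 1.
figure1_call = IndelCall("insertion", 4, 137, "CGCC", 50.0)
print(figure1_call)
```

A quartet with no indel could be represented by returning `None` in place of an `IndelCall`.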
ABSTRACT
(This abstract differs from that submitted to AMP.)
Introduction: Detecting minor genetic variants has become
essential to cancer and infectious disease management. Many
have turned to next-generation sequencing to fill this need, given
the common misperception that the limit of detection (LOD) for
Sanger sequencing lies at variant allele
frequencies (VAFs) of 15% to 25% [1,2,3]. Recent developments
have produced algorithmic methods that reduce this limit to 5% for
single nucleotide polymorphisms (SNPs) [4]. We have invented
algorithms that extend this work to detect and characterize
insertions and deletions (indels). It appears we can detect indels
down to 2.5% VAF. Standard Sanger sequencing protocols can
be used. The method can generate the familiar
electropherogram data display with noise substantially reduced.
Methods: The algorithm uses forward and reverse sequencing
reactions of both a control sample and the test sample to detect
whether an indel is present and, if so, to characterize the
indel. The algorithm was developed using DNA with indels in 33
different amplicons from 21 different genes and DNA with SNP
variants in 22 different amplicons from eight different genes.
These samples were from DNA reference standards, genomic
DNA, or DNA extracted from formalin-fixed, paraffin-embedded
tissues. DNA was quantified using the RNase-P quantitative
polymerase chain reaction assay and serially diluted. Allelic
proportions spanned 0.6125% to 50% for DNAs with no indels
and 2.5% to 50% for DNAs with indels. Samples were amplified
with AmpliTaq Gold™ 360 Master Mix, cycle sequenced with
BigDye™ Terminator v3.1, and pre-processed using POP7™
polymer on the 3500xl Series Genetic Analyzer from Applied
Biosystems™. For comparison, the indel data were also
processed using commercially available third-party software that
surveys Sanger sequencing data for mutations.
Results: 121 samples had deletions with lengths of 1 to 48
base pairs (bp) and 82 samples had insertions with lengths of
1 to 6 bp. 1130 samples contained 0 to 16 SNP variants. For
indel detection, there were zero false positives and zero false
negatives. For indel characterization, accuracy was 100% for type,
100% for length, 93% for location, 93% for sequence, and 92% for
VAF (within 3% of the expected value). By contrast, the third-party
software detected 58% of the indels, and its characterization accuracy
was 56% for type, 53% for length, 8% for location, and 37% for sequence.
Conclusions: The results suggest the possibility of detecting
the presence of indels down to at least 2.5% VAF using typical
Sanger sequencing protocols. A benefit of the approach is that
existing protocols for visually reviewing the results can continue
to be used and are enhanced because the algorithm generates
results in a familiar form for which the noise has been
substantially diminished. The algorithm may give Sanger
sequencing performance and/or economic advantages in some
molecular diagnostic applications that require finding minor
genetic variants.
Figure 1. Sanger sequencing result of a sample with a CGCC
insertion mutation after base 137 in half of the DNA molecules.
The goal is to disentangle these data to determine the type of indel
(deletion or insertion), its length, its location, its sequence, and the
proportion of DNA molecules having the indel (the variant allele
frequency, VAF). For the case shown in Figure 1 this means concluding
that the two molecules involved have the two sub-sequences shown in
Figure 2 and the one with the insertion has a VAF of 50%:
...AGAGAAGCCCGCC----GGCTCTT... (sub-sequence of normal DNA)
...AGAGAAGCCCGCCCGCCGGCTCTT... (sub-sequence of mutated DNA)
Figure 2. Correct answer for disentangling the data of Figure 1 into
normal and mutant molecules; the inserted bases (CGCC) align with the
gap in the normal sequence.
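Once the two molecules have been separated, reading off the indel type, length, location, and sequence is a simple comparison. The sketch below illustrates only that final step, using a longest-common-prefix and longest-common-suffix comparison of two already-known sequences. It is not the poster's algorithm, which has to work from mixed, noisy trace data; the function name `characterize_indel` and the returned keys are hypothetical.

```python
def characterize_indel(normal, mutant):
    """Describe the single indel that turns `normal` into `mutant`.

    Returns None if the sequences are identical. Raises ValueError if
    they differ by anything other than one contiguous indel.
    """
    if normal == mutant:
        return None
    shorter = min(len(normal), len(mutant))

    # Longest run of matching bases from the left.
    prefix = 0
    while prefix < shorter and normal[prefix] == mutant[prefix]:
        prefix += 1

    # Longest run of matching bases from the right, not overlapping the prefix.
    suffix = 0
    while (suffix < shorter - prefix
           and normal[-1 - suffix] == mutant[-1 - suffix]):
        suffix += 1

    normal_mid = normal[prefix:len(normal) - suffix]
    mutant_mid = mutant[prefix:len(mutant) - suffix]
    if normal_mid and mutant_mid:
        raise ValueError("sequences differ by more than a single indel")

    kind = "insertion" if mutant_mid else "deletion"
    bases = mutant_mid or normal_mid
    return {"type": kind, "length": len(bases),
            "after_base": prefix, "sequence": bases}


result = characterize_indel("AGAGAAGCCCGCCGGCTCTT",
                            "AGAGAAGCCCGCCCGCCGGCTCTT")
print(result)
```

Here `after_base` counts bases within the sub-sequence passed in, not within the full read. In repetitive sequence the placement of an indel can be ambiguous; this sketch reports the placement with the longest shared prefix.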
Another complication is that the signals associated with the mutated DNA
may be so small that they are hidden within the noise underlying Sanger
sequencing data (see Figure 3).
Figure 3. A minor indel variant can be hidden in the noise underlying
Sanger sequencing; upper panel shows a 25% VAF variant, lower
panel shows the same variant except at a VAF of 2.5%.
Table 1 reports the accuracy of indel detection and characterization achieved by the
methods described in this communication. Table 2 reports the corresponding values for
the third-party software. The false positive rate for the third-party software was not
measured (n/m) because the software was set up with parameter values that maximize its
ability to detect low-level variants without regard for false positive detections. For
the 50% VAF results, the calculated VAF is assigned a value of 50% if it falls within
40% to 60%; calculated VAF values are used as is for all other VAFs. The VAF tolerance
reported is the tolerance required for at least 90% of the samples to fall within it.
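The tolerance definition above can be written out as a short calculation. This is a sketch of one reading of that definition: the smallest absolute deviation from the expected VAF that covers at least 90% of the samples, with the 40% to 60% rule applied for 50% VAF samples. The function names and the sample values are hypothetical and are not data from this study.

```python
import math


def snap_vaf(calculated, expected):
    """Apply the 50% rule: a calculated VAF within 40-60% counts as 50%."""
    if expected == 50.0 and 40.0 <= calculated <= 60.0:
        return 50.0
    return calculated


def vaf_tolerance(calculated_vafs, expected, coverage=0.90):
    """Smallest tolerance (percentage points) covering `coverage` of samples."""
    errors = sorted(abs(snap_vaf(c, expected) - expected)
                    for c in calculated_vafs)
    # Number of samples that must fall within the tolerance; the small
    # offset guards against floating-point round-up.
    needed = math.ceil(coverage * len(errors) - 1e-9)
    return errors[needed - 1]


# Hypothetical calculated VAFs for ten samples with an expected VAF of 2.5%.
low_vaf_calls = [2.4, 2.6, 2.5, 3.0, 2.0, 2.5, 2.7, 2.3, 2.5, 6.0]
print(vaf_tolerance(low_vaf_calls, expected=2.5))

# Hypothetical calculated VAFs for five samples with an expected VAF of 50%.
half_vaf_calls = [44.0, 58.0, 61.0, 50.0, 39.0]
print(vaf_tolerance(half_vaf_calls, expected=50.0))
```

In the first example nine of the ten samples lie within 0.5 percentage points of the expected value, so the single outlier at 6.0% does not affect the reported tolerance.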
RESULTS
Figure 4 shows electropherograms generated by the algorithm.
Figure 4. The result of processing the data shown in Figure 3.
Note that in the lower panel the peaks of the 2.5% VAF indel
variant are revealed.
Table 1: Indel detection and characterization accuracy, overall and split by VAF
                      overall    50% VAF    25% VAF    12.5% VAF  6.25% VAF  5% VAF     2.5% VAF
Detection rate:       100%       100%       100%       100%       100%       100%       100%
False positive rate:  0%         0%         0%         0%         0%         0%         0%
Indel type:           100%       100%       100%       100%       100%       100%       100%
Length:               100%       100%       100%       100%       100%       100%       100%
Location, exact:      93%        100%       100%       97%        90%        91%        79%
Within 6 bases:       98%        100%       100%       100%       97%        100%       91%
Sequence:             93%        100%       100%       97%        90%        91%        79%
VAF:                  92%        94%        90%        90%        93%        94%        97%
Tolerance:            3%         1%         6%         3%         3%         3%         2%
Table 2: Third-party software indel detection and characterization accuracy
                      overall    50% VAF    25% VAF    12.5% VAF  6.25% VAF  5% VAF     2.5% VAF
Detection rate:       58%        100%       100%       93%        20%        15%        12%
False positive rate:  n/m        n/m        n/m        n/m        n/m        n/m        n/m
Indel type:           56%        100%       100%       93%        13%        9%         9%
Length:               53%        97%        100%       90%        10%        3%         6%
Location, exact:      8%         24%        24%        3%         0%         0%         0%
Within 6 bases:       38%        94%        100%       20%        0%         0%         0%
Sequence:             37%        94%        91%        17%        3%         3%         0%
VAF:                  n/m        n/m        n/m        n/m        n/m        n/m        n/m
