The IMPACT of INDEL realignment: Detecting insertions and deletions longer than 30 base pairs with ABRA 
Kirk Thaker¹, Ronak Shah², Michael Berger²
¹Riverdale Country School, Bronx, NY; ²Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY
ABSTRACT 
Background 
Cancer is a disease of the genome: most of its forms result from an accumulation of genetic alterations that, directly or indirectly, allow the patient’s cells to proliferate without restraint. For decades, identifying and targeting cancer mutations for treatment was impractical because of the limitations of sequencing technology. However, high-throughput next-generation sequencing (NGS) now allows researchers to sequence large, targeted regions of DNA rapidly and cheaply. MSK-IMPACT (Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets) is a sequencing platform with an associated computational pipeline. It takes advantage of these improvements to analyze tumor specimens for clinically actionable variants in 341 cancer-associated genes. Critical to IMPACT’s efficacy is the detection of somatic DNA alterations such as INDELs, which are insertions or deletions of nucleotides. Current sequence aligners have difficulty accurately mapping reads (short, overlapping DNA sequences) that contain more than a single base change, let alone reads that contain INDELs. This limitation necessitates INDEL realigners, which realign reads in regions where INDELs might exist so that the INDELs can be identified more easily. The INDEL realignment software currently used in MSK-IMPACT’s computational pipeline is the Genome Analysis Toolkit’s IndelRealigner (GATK). It can only efficiently resolve INDELs shorter than 30 base pairs, which limits the platform’s reliability for INDEL detection. We therefore tested a new INDEL realigner, ABRA (Assembly Based Re-Aligner), and compared its performance with that of GATK’s IndelRealigner.
Objectives 
1. To resolve genomic regions that are poorly aligned because of INDELs and repeat sequences.
2. To improve INDEL detection, with emphasis on both finding INDELs longer than 30 bp and improving the accuracy of each INDEL’s variant frequency.
CONCLUSIONS
• ABRA increased our confidence in existing variant calls by raising the variant frequency of the alternate allele.
• ABRA detected INDELs longer than 30 base pairs, especially in regions that previously showed “messy” or unclear read alignments.
• INDELs whose length is not a multiple of 3 shift the reading frame and often disrupt the proteins they encode. Failing to identify INDELs is therefore an obstacle to personalized cancer medicine.

ACKNOWLEDGEMENTS
I would like to thank Ronak Shah and Dr. Michael Berger for all of their instruction, support, and advice throughout this project.

RESULTS
Processing BAMs:
# of samples | SNVs gained | SNVs dropped | INDELs gained | INDELs dropped | Total gained | Total dropped
151 | 1 | 1 | 12 | 2 | 13 | 3

ABRA increases supporting evidence for already-existing INDEL calls
ABRA detects INDELs longer than 30 base pairs where GATK cannot

METHODS
Targeted sequencing:
Prepare 24-48 libraries → Hybridize & select (NimbleGen SeqCap: MSK-IMPACT Assay; probes for 340 cancer genes) → Sequence to 500-1000X (HiSeq 2500) → Align to genome & analyze

Computational pipeline:
Raw data → Mapping → Processing BAMs → Metrics calculations → Variant calling, filtering, and genotyping → Variant annotation and filtering → Manual analysis

Processing BAMs:
Mark duplicates (Picard tools) → INDEL realignment (GATK’s IndelRealigner or ABRA) → Base quality recalibration
Figure 1: GATK’s IndelRealigner chooses the local realignment that minimizes the number of mismatches. It prefers to explain a cluster of mismatches as a single insertion or deletion rather than as several individual SNPs.
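The selection rule in Figure 1 can be illustrated with a deliberately simplified sketch. Each read is laid against each candidate consensus (the reference, or a haplotype carrying a candidate INDEL) at its best gap-free offset, and the consensus with the fewest total mismatches wins. This is not GATK’s code. The real IndelRealigner is considerably more involved (for example, it takes base qualities into account). The sequences and function names below are hypothetical.

```python
def mismatches(read: str, consensus: str, offset: int) -> int:
    """Count mismatching bases when `read` is laid on `consensus` at `offset` without gaps."""
    window = consensus[offset:offset + len(read)]
    return sum(1 for a, b in zip(read, window) if a != b)


def best_placement(read: str, consensus: str) -> int:
    """Fewest mismatches over every gap-free offset where the read fits entirely."""
    return min(
        mismatches(read, consensus, off)
        for off in range(len(consensus) - len(read) + 1)
    )


def choose_consensus(reads: list[str], candidates: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Return the candidate with the lowest summed mismatch count, plus every candidate's score."""
    scores = {
        name: sum(best_placement(r, seq) for r in reads)
        for name, seq in candidates.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACGATCCGATTACAGG"
    # Hypothetical haplotype carrying a 4 bp deletion ("CTTA" removed).
    deletion_haplotype = reference.replace("CTTA", "", 1)
    # Hypothetical reads drawn from the deletion haplotype, each spanning the junction.
    reads = [deletion_haplotype[2:14], deletion_haplotype[5:17], deletion_haplotype[8:20]]
    winner, scores = choose_consensus(
        reads, {"reference": reference, "deletion": deletion_haplotype}
    )
    print(winner, scores)
```

Reads that span the deletion junction pick up mismatches against the reference at every gap-free offset, so the deletion haplotype explains them with fewer mismatches. In that sense, a single INDEL is preferred over a scatter of apparent SNPs.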
Figure 2: ABRA builds de Bruijn graphs from k-mers (short subsequences of length k, using several values of k) to assemble reads locally. It then maps the locally assembled sequences back to the reference using BWA-MEM.
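To make Figure 2 concrete, here is a minimal sketch of building a de Bruijn graph from reads. Nodes are (k-1)-mers, and every k-mer in a read adds an edge from its prefix to its suffix. Repeating the construction for several values of k mirrors the “variable lengths” in the caption. This is a toy, not ABRA’s assembler. The greedy walk stops at the first branch or cycle, the reads and k values are made up, and the BWA-MEM mapping step is not shown.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


def walk_unambiguous(graph: dict[str, set[str]], start: str) -> str:
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles, e.g. inside repeats
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    # Hypothetical overlapping reads covering one short region.
    reads = ["ACGTTGCAAG", "TTGCAAGGCG", "CAAGGCGATC"]
    for k in (4, 5, 6):  # several k values, echoing "k-mers of variable lengths"
        graph = build_de_bruijn(reads, k)
        print(k, walk_unambiguous(graph, reads[0][: k - 1]))
```

Using more than one k is a trade-off. Small k values tolerate low coverage better, while large k values can span short repeats that would otherwise collapse the graph into cycles.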
Table 1: After a common set of mutations was realigned with both ABRA and GATK, ABRA raised the variant frequency of 13 of those events enough for them to pass our filters and be called as significant by the pipeline. GATK had already detected these events, but ABRA’s higher variant frequencies increased our confidence in them to the point where we could consider them meaningful.
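Table 1 makes the point that a higher variant frequency lets an event GATK already found clear the pipeline’s filters. This can be sketched as a threshold check. The 5% VF cutoff, the 10-read minimum, and the read counts are hypothetical placeholders, not the MSK-IMPACT filter settings. The gained and dropped counts come from the Processing BAMs table, where 1 SNV plus 12 INDELs account for the 13 events gained.

```python
from dataclasses import dataclass

# Counts from the Processing BAMs table (151 samples).
GAINED = {"SNV": 1, "INDEL": 12}
DROPPED = {"SNV": 1, "INDEL": 2}


@dataclass
class Call:
    """Read support for one variant at one site after a given realigner."""
    alt_reads: int  # reads supporting the alternate allele
    depth: int      # total reads covering the site

    @property
    def vf(self) -> float:
        """Variant frequency: fraction of covering reads that support the alternate allele."""
        return self.alt_reads / self.depth if self.depth else 0.0


def passes(call: Call, min_vf: float = 0.05, min_alt_reads: int = 10) -> bool:
    """Hypothetical filter: require a minimum VF and a minimum number of supporting reads."""
    return call.vf >= min_vf and call.alt_reads >= min_alt_reads


if __name__ == "__main__":
    print("total gained:", sum(GAINED.values()), "total dropped:", sum(DROPPED.values()))
    # Made-up read counts for the same event after each realigner.
    gatk, abra = Call(alt_reads=20, depth=600), Call(alt_reads=60, depth=600)
    print(f"GATK VF={gatk.vf:.3f} passes={passes(gatk)}")
    print(f"ABRA VF={abra.vf:.3f} passes={passes(abra)}")
```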
Figure 4: For the 10 pools (151 samples) shown, ABRA (brown) consistently yielded higher variant frequencies than GATK’s IndelRealigner (green) for INDELs found by both realigners.
ABRA: Tumor VF=21%, Normal VF=0%
GATK: Tumor VF=4.3%, Normal VF=0%
Figure 6: A 21 bp insertion in the MAP3K1 gene, detected by both ABRA and GATK. ABRA’s alignment yields a tumor variant frequency almost 5 times higher than that of GATK’s IndelRealigner (21% vs. 4.3%).
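The “almost 5 times” comparison in Figure 6 is the ratio of the two tumor variant frequencies. VF is the fraction of reads covering the site that support the alternate allele:

$$
\mathrm{VF} = \frac{\text{alt reads}}{\text{total reads}}, \qquad
\frac{\mathrm{VF}_{\text{ABRA}}}{\mathrm{VF}_{\text{GATK}}} = \frac{0.21}{0.043} \approx 4.9
$$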
[Chart: Variant frequency of common INDELs between GATK and ABRA. x-axis: pool name (16D, 61, 62, 63, 64, 65, 66, 67, 69, 70); y-axis: variant frequency (decimal, 0 to 0.4); series: GATK VF, ABRA VF]
Figure 8: In the BARD1 gene, GATK called a deletion together with a second mutation event. ABRA resolved the same region far more cleanly into a single deletion plus a separate SNV farther away.
Figure 9: ABRA detects a significant exon 11 insertion (45 bp) and deletion (42 bp) in the KIT gene. KIT exon 11 mutations are particularly relevant for patients with gastrointestinal stromal tumors (GIST).
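The conclusion that INDELs matter most when their length is not a multiple of 3 comes from codon arithmetic. Only the net length change modulo 3 decides whether the downstream reading frame shifts. Below is a minimal sketch applied to the INDEL sizes in the figure captions. It assumes each event lies entirely within coding sequence and ignores splice-site effects. The function names are hypothetical.

```python
def is_frameshift(length_change: int) -> bool:
    """True if a net insertion (+) or deletion (-) of this many bases in coding
    sequence shifts the downstream reading frame, i.e. is not a multiple of 3."""
    if length_change == 0:
        raise ValueError("length_change must be non-zero")
    return length_change % 3 != 0


def codon_effect(length_change: int) -> str:
    """Describe a coding INDEL by its effect on the reading frame."""
    if is_frameshift(length_change):
        return "frameshift"
    kind = "insertion" if length_change > 0 else "deletion"
    return f"in-frame {kind} of {abs(length_change) // 3} codon(s)"


if __name__ == "__main__":
    # INDEL sizes from the figure captions (+ = insertion, - = deletion).
    events = {"KIT ins": 45, "KIT del": -42, "MAP3K1 ins": 21, "41 bp del": -41}
    for name, change in events.items():
        print(name, codon_effect(change))
```

Both KIT exon 11 events and the MAP3K1 insertion come out in-frame. In-frame changes can still alter protein function, so the modulo-3 rule speaks only to the reading frame, not to clinical relevance.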
ABRA: Tumor VF=41%, Normal VF=40%
GATK: Tumor VF=10%, Normal VF=0%
Figure 7: GATK supported a 41 bp deletion as a true positive because it was absent from the matched normal. After realignment with ABRA, the deletion appears at nearly the same frequency in the normal (40%) as in the tumor (41%), revealing it as a sequencing artifact.
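Figure 7 hinges on comparing tumor and matched-normal support for the same event. Below is a minimal sketch of that comparison using the VFs reported in the figure. The 5% tumor cutoff and the rule that normal VF must stay below 20% of tumor VF are hypothetical, not the pipeline’s actual criteria.

```python
def classify_event(tumor_vf: float, normal_vf: float,
                   min_tumor_vf: float = 0.05, max_normal_ratio: float = 0.2) -> str:
    """Hypothetical tumor/normal rule: treat an event as a somatic candidate only if it
    is present in the tumor and its normal VF is a small fraction of its tumor VF."""
    if tumor_vf < min_tumor_vf:
        return "not called"
    if normal_vf > max_normal_ratio * tumor_vf:
        return "present in normal (germline or artifact)"
    return "somatic candidate"


if __name__ == "__main__":
    # Tumor/normal VFs reported for the 41 bp deletion in Figure 7.
    print("GATK:", classify_event(0.10, 0.00))
    print("ABRA:", classify_event(0.41, 0.40))
```

Under these assumptions, the GATK alignment yields a somatic candidate while the ABRA alignment does not, matching the reversal described in the caption.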
ABRA resolves poorly aligned regions

ABRA: Tumor VF=25%, Normal VF=0%
GATK: Tumor VF=0%, Normal VF=0%

ABRA: Tumor VF=22%, Normal VF=0%
GATK: Tumor VF=0%, Normal VF=0%

ABRA: Tumor VF=34%, Normal VF=0%
GATK: Tumor VF=0%, Normal VF=0%

ABRA presents a more parsimonious alignment of the sequencing data.
Figure 10: ABRA resolves a large deletion (>100 bp) in the HRAS gene. Such an alteration can significantly affect treatment and prognosis for patients with bladder cancer.
ABRA: Tumor VF=74%, Normal VF=0%
GATK: Tumor VF=0%, Normal VF=0%
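When a realigner “detects” a long INDEL, the realigned BAM records it as an explicit insertion (I) or deletion (D) operation in each read’s CIGAR string, as defined by the SAM format. Below is a minimal sketch that scans CIGAR strings for INDELs of at least 30 bp, the size threshold this poster focuses on. The example CIGARs are invented, not taken from these samples.

```python
import re

# SAM CIGAR operations (M, I, D, N, S, H, P, =, X), each preceded by a length.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def long_indels(cigar: str, min_len: int = 30) -> list[tuple[str, int]]:
    """Return (op, length) for every insertion or deletion of at least `min_len` bases."""
    return [
        (op, int(n))
        for n, op in CIGAR_OP.findall(cigar)
        if op in ("I", "D") and int(n) >= min_len
    ]


if __name__ == "__main__":
    # Invented CIGARs: a 42 bp deletion, a 45 bp insertion, and a short 3 bp deletion.
    for cigar in ("60M42D40M", "30M45I25M", "70M3D30M"):
        print(cigar, long_indels(cigar))
```

A read whose long INDEL could not be placed often shows soft clipping (S) or a run of mismatches instead, which this check would not report. That is the situation local realignment is meant to improve.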
[Figure panel: local multiple sequence alignment]
The importance of IMPACT is twofold. It helps oncologists better understand their patients’ disease and decide on treatment, and it lets researchers retroactively analyze tumor specimens for common mutations.
