ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines
Martine Zilversmit
zmartine@gmail.com
http://www.slideshare.net/zmartine1/com-par-25jun14
https://github.com/parasite-genomics/Pipelines - 2.0 Coming in July 2014
• Xin-zhuan Su
• Sittiporn Pattaradilokrat
• Sethu Nair
• Yanwei Qi
• Gordon Bullen
NIH/NIAID – Malaria Functional Genomics Section
• Sebastian Gurevich
McGill University
• Philip Awadalla
University of Montreal
Funding:
National Institutes of Health
Canadian Institutes of Health Research
ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines
https://github.com/parasite-genomics/Pipelines
• BASH-scripted pipelines
• Accurate variant prediction
– SNPs
– Small indels
– Large indels (>17 bp)
– Focused regions of extreme divergence (35-70% amino acid identity)
• In silico variant validation
Finding Highly Divergent Regions – HDR Program
VCF File:
- False Positive Variants
- True Positive Variants
Parameters:
- Quality Metric and Cutoff
- Number of variants per cluster
- Maximum distance between variants within a cluster
- Maximum distance between smaller clusters to merge into an HDR
HDR File:
- Size of HDR
- Position of HDR
- Variants Contained
Python - Stand-alone interactive or pipelined
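The four parameters listed for the HDR program suggest a cluster-then-merge procedure. The block below is a minimal sketch of that idea, not the ComPar implementation: the function name `find_hdrs`, the `HDR` record, the default values, and the choice to keep variants at or above the quality cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class HDR:
    """Hypothetical record mirroring the slide's HDR file fields."""
    start: int            # position of the first variant in the region
    end: int              # position of the last variant in the region
    positions: List[int]  # variants contained

    @property
    def size(self) -> int:
        # Region length in base pairs, inclusive of both ends.
        return self.end - self.start + 1


def find_hdrs(
    variants: Iterable[Tuple[int, float]],
    min_quality: float = 30.0,  # quality metric cutoff (assumed direction: keep >=)
    min_variants: int = 3,      # number of variants per cluster
    max_gap: int = 50,          # max distance between variants within a cluster
    max_merge_gap: int = 200,   # max distance between clusters merged into one HDR
) -> List[HDR]:
    """Group (position, quality) variants into highly divergent regions."""
    # 1. Keep variants that pass the cutoff, ordered along the sequence.
    positions = sorted(pos for pos, qual in variants if qual >= min_quality)

    # 2. Chain neighbouring variants into clusters.
    clusters: List[List[int]] = []
    for pos in positions:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    # 3. Discard clusters with too few variants.
    clusters = [c for c in clusters if len(c) >= min_variants]

    # 4. Merge surviving clusters that lie close to each other.
    merged: List[List[int]] = []
    for cluster in clusters:
        if merged and cluster[0] - merged[-1][-1] <= max_merge_gap:
            merged[-1].extend(cluster)
        else:
            merged.append(list(cluster))

    return [HDR(start=c[0], end=c[-1], positions=c) for c in merged]
```

Each returned `HDR` carries the three items the slide lists for the HDR file: size, position, and the variants contained. Tightening `max_merge_gap` splits one long region into several shorter ones, which is why it is kept separate from `max_gap`.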
Comparing 2 Plasmodium Genomes
Dye-Terminator Sequenced Variation – 50 basepair Sliding window
[Figure: number of variants (y-axis, 0 to 12) against position on “chromosome” (x-axis, 200 to 9600)]
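The plots in this deck count variants per 50 basepair window along the sequence. A minimal sketch of that counting step follows. The function name `window_counts`, the 1-based inclusive coordinates, and the default step equal to the window size are assumptions, since the slide does not state how far the window moves between counts.

```python
from typing import Iterable, List, Tuple


def window_counts(
    positions: Iterable[int],
    length: int,
    window: int = 50,  # window size in base pairs, as on the slide
    step: int = 50,    # assumed: equal to the window, so windows do not overlap
) -> List[Tuple[int, int]]:
    """Return (window_start, variant_count) pairs along a sequence.

    Coordinates are 1-based and windows are inclusive; the last window is
    truncated at the end of the sequence.
    """
    pos = sorted(positions)
    counts = []
    for start in range(1, length + 1, step):
        end = min(start + window - 1, length)
        # Count the variants that fall inside this window.
        n = sum(1 for p in pos if start <= p <= end)
        counts.append((start, n))
    return counts
```

Setting `step` smaller than `window` gives overlapping windows, which smooths the profile at the cost of counting each variant more than once.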
Comparing 2 Plasmodium Genomes
Predicted Variants – No Filtering Based on Quality Metrics
[Figure: number of variants (y-axis, 0 to 12) against position on “chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results]
Comparing 2 Plasmodium Genomes
Predicted Variants - Filtering Based on Quality Score ≥ 30 Cutoff
[Figure: number of variants (y-axis, 0 to 12) against position on “chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff]
Comparing 2 Plasmodium Genomes
Filtering Based on Consensus Quality (FQ) ≤ -100 Cutoff
[Figure: number of variants (y-axis, 0 to 12) against position on “chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff, FQ −100 Cutoff]
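The two cutoffs compared in these slides, quality score ≥ 30 and consensus quality (FQ) ≤ -100, can be applied to VCF records with a few lines of Python. This is a sketch under stated assumptions rather than the pipeline's own filter: it reads the score from the standard QUAL column, treats FQ as a key in the INFO column, and the helper names `info_field`, `keep_record`, and `filter_vcf` are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def info_field(info: str, key: str) -> Optional[str]:
    """Return the value of KEY=value in a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        name, sep, value = item.partition("=")
        if sep and name == key:
            return value
    return None


def keep_record(
    line: str,
    min_qual: Optional[float] = None,  # e.g. 30 for the quality score cutoff
    max_fq: Optional[float] = None,    # e.g. -100 for the FQ cutoff
) -> bool:
    """Decide whether one tab-separated VCF data line passes the cutoffs."""
    cols = line.rstrip("\n").split("\t")
    qual, info = cols[5], cols[7]  # VCF columns 6 (QUAL) and 8 (INFO)

    if min_qual is not None:
        # A missing QUAL (".") cannot pass a quality cutoff.
        if qual == "." or float(qual) < min_qual:
            return False

    if max_fq is not None:
        fq = info_field(info, "FQ")
        # Records without an FQ tag are dropped when an FQ cutoff is requested.
        if fq is None or float(fq) > max_fq:
            return False

    return True


def filter_vcf(
    lines: Iterable[str],
    min_qual: Optional[float] = None,
    max_fq: Optional[float] = None,
) -> Iterator[str]:
    """Yield header lines unchanged and data lines that pass the cutoffs."""
    for line in lines:
        if line.startswith("#") or keep_record(line, min_qual, max_fq):
            yield line
```

Leaving either cutoff as `None` disables it, so the same function reproduces the unfiltered, quality-only, and FQ-only views by changing arguments.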
Comparing 2 Plasmodium Genomes
Highly-Divergent Regions (HDRs)
[Figure: number of variants (y-axis, 0 to 12) against position on “chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff, FQ −100 Cutoff]
Comparing 2 Plasmodium Genomes
Quality ≥ 30 Variants without Consensus Quality ≥ -100
Highly-Divergent Regions (HDRs)
[Figure: number of variants (y-axis, 0 to 12) against position on “chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff, FQ −100 Cutoff]
[Figure: same axes; series: True Variants; Quality 30, No HDRs]
Characteristics of Highly Divergent Regions
33X 44.4%
By265 55.6%
N67 66.7%
histone acetyltransferase GCN5, putative (GCN5)
RNA-binding protein NOB1, putative
Percent Identity
DNA repair protein, putative
33X 41.4%
By265 79.3%
N67 51.7%
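The percent identity values above compare aligned sequences. One common way to compute such a figure is sketched below. The function name `percent_identity` and the choice of denominator (all aligned columns except those where both sequences have a gap) are assumptions, because the slide does not say which definition was used.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of aligned columns in which two sequences carry the same residue.

    Both inputs must come from the same alignment and so have equal length.
    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared
```

For two made-up nine-residue peptides that agree at six positions, `percent_identity("MKTAYIAKQ", "MKSAYLAKE")` evaluates to 6/9, or about 66.7%.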
Characteristics of Highly Divergent Regions


Editor's Notes

  • #3: MAPPING, DEFINE! DE NOVO, DEFINE! SAY WHAT VARIANTS ARE
  • #5-#10: Define highly divergent regions