Gabe Rudy | VP of Product &
Engineering
Using the GRCh38 reference assembly for
clinical interpretation in VSClinical
NIH Grant Funding Acknowledgments
 Research reported in this publication was supported by the National Institute Of General
Medical Sciences of the National Institutes of Health under:
- Award Number R43GM128485
- Award Number 2R44 GM125432-01
- Award Number 2R44 GM125432-02
 PI is Dr. Andreas Scherer, CEO Golden Helix.
 The content is solely the responsibility of the authors and does not necessarily represent the
official views of the National Institutes of Health.
Q & A
Please enter your questions into your GoToWebinar Panel
Golden Helix – Who We Are
Golden Helix is a global
bioinformatics company founded
in 1998, celebrating our 20th year!
CNV Analysis
GWAS
Genomic Prediction
Large-N-Population Studies
RNA-Seq
Large-N CNV-Analysis
Variant Warehouse
Centralized Annotations
Hosted Reports
Sharing and Integration
Variant Calling
Filtering and Annotation
Variant Interpretation
Clinical Reports
CNV Analysis
Pipeline: Run Workflows
Cited in over 1,300 peer-reviewed publications
Over 350 customers globally
Golden Helix – Who We Are
When you choose a Golden Helix solution, you get more than just
software
 REPUTATION
 TRUST
 EXPERIENCE
 INDUSTRY FOCUS
 THOUGHT LEADERSHIP
 COMMUNITY
 TRAINING
 SUPPORT
 RESPONSIVENESS
 INNOVATION and SPEED
 CUSTOMIZATIONS
Agenda
Genetic Testing with NGS
Variant Representation
Human Reference Genomes
Implications for Variant Interpretation
Demo using VarSeq + VSClinical
Motivation for Using GRCh38
Other Lab Considerations
Thanks! / Q&A
NGS Genetics Testing Process
Sample Prep → Sequencing → Align & Call → Annotate & Filter → Variant Interpretation → Report
Golden Helix Clinical Suite: Sentieon & VS-CNV, VarSeq, VSClinical, VSReports
VSWarehouse: Aggregate Variants, Reports, Knowledgebase
Representing a Variant
 Genomic:
- chr2: 47,641,560 A/T
- NC_000002.11:g.47641560A>T
- chr14: 51,378,590 TT/T
- NC_000014.8:g.51378593delT
 Gene Coding Sequence:
- BRAF c.1799T>A
- NM_058197.4:c.105dupG
- LRG_218t1(MSH2):c.942+3A>T
 Gene Protein Sequence
- DYX1C1 p.E417*
- NP_000483.3:p.Phe508del
 Genomic Representation Enables
- Precise lookup of annotations
- Overlap / relation to genomic features
- Representation of non-genic variants
 Coding Representation Enables
- Independence from the genomic reference
- UTR and Intronic variants
- Informative representation of coding change
 Protein Representation Enables
- Grouping of variants that result in same protein
- Descriptive of effect on protein
- Coordinates match domains and protein DBs
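The three description levels above share a common shape: a reference (accession, LRG identifier or gene symbol), a one-letter level prefix (g., c. or p.) and the change itself. A minimal Python sketch of splitting such strings follows. The regular expression and the names `HgvsParts` and `split_hgvs` are illustrative assumptions, not part of VarSeq or of any HGVS library, and the pattern only covers the forms shown on this slide.

```python
import re
from typing import NamedTuple, Optional


class HgvsParts(NamedTuple):
    reference: str        # accession, LRG id, or gene symbol
    gene: Optional[str]   # gene symbol given in parentheses, if any
    kind: str             # "g" genomic, "c" coding, "p" protein
    change: str           # everything after the "g." / "c." / "p." prefix


# Hypothetical pattern: tolerant of a space instead of (or before) the colon.
_HGVS = re.compile(
    r"^\s*(?P<reference>[^\s:(]+)"     # e.g. NC_000002.11, LRG_218t1, BRAF
    r"(?:\((?P<gene>[^)]+)\))?"        # optional "(MSH2)"
    r"\s*[:\s]\s*"                     # ":" or a single space as separator
    r"(?P<kind>[gcp])\.(?P<change>\S+)\s*$"
)


def split_hgvs(text: str) -> HgvsParts:
    """Split a variant description into reference, level, and change."""
    m = _HGVS.match(text)
    if m is None:
        raise ValueError(f"not a recognised variant description: {text!r}")
    return HgvsParts(m["reference"], m["gene"], m["kind"], m["change"])


if __name__ == "__main__":
    for s in ("NC_000002.11:g.47641560A>T",
              "LRG_218t1(MSH2):c.942+3A>T",
              "DYX1C1 p.E417*"):
        print(split_hgvs(s))
```

A production pipeline would more likely rely on a dedicated HGVS parser that also validates the change against the reference sequence; this sketch only separates the fields.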
Genomes Are Just a Means to an End (Genes)
 RefSeqGenes – mRNA sequence archive, with mappings to genomes
- Provides mappings to the Locus Reference Genomic (LRG) database
- Use genome mappings by NCBI (through genome annotation builds). NOT UCSC
- “Clinically Relevant” transcript in VarSeq:
- Most commonly submitted to ClinVar
- LRG if available, longest if tied
 Ensembl – defined directly against the human genome
- More inclusive of genes discovered with high-throughput methods
- Gencode subset – similar to RefSeqGenes in size / definition
 Each has unique Accessions and Version Numbers
- Newer releases are provided only on GRCh38
- GRCh37 mappings not being updated (“105 Interim” by special request)
Variant Representation and the Reference Genome
 NC_000015.10:g.48428426C>T (GRCh38)
 NC_000015.9:g.48720623C>T (GRCh37)
 NM_000138.4:c.6917G>A
 NP_000129.3:p.Arg2306His
 NG_008805.2:g.222363G>A
 LRG_778t1:c.6917G>A
 LRG_778p1:p.Arg2306His
 LRG_778:g.222363G>A
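The list above names one variant eight ways; only the two chromosome-level (NC_) descriptions depend on the assembly. A small Python sketch of a record that keeps this distinction explicit is shown here. The class name `VariantNames` and its fields are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VariantNames:
    """One variant, with assembly-specific and assembly-independent names."""
    coding: str                      # transcript-level (c.) description
    protein: str                     # protein-level (p.) description
    genomic_by_assembly: Dict[str, str] = field(default_factory=dict)

    def genomic(self, assembly: str) -> str:
        """Return the g. description for the requested assembly."""
        if assembly not in self.genomic_by_assembly:
            raise KeyError(f"no genomic description recorded for {assembly}")
        return self.genomic_by_assembly[assembly]


# Values copied from the slide; the variable name is an assumption.
example = VariantNames(
    coding="NM_000138.4:c.6917G>A",
    protein="NP_000129.3:p.Arg2306His",
    genomic_by_assembly={
        "GRCh38": "NC_000015.10:g.48428426C>T",
        "GRCh37": "NC_000015.9:g.48720623C>T",
    },
)

if __name__ == "__main__":
    print(example.genomic("GRCh37"))
    print(example.genomic("GRCh38"))
    print(example.coding, example.protein)  # unchanged across assemblies
```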
History of the Human Reference Genome
 2003: Human Genome Project Declared Done
 2006: NCBI36 (hg18)
- Produced by the International Human Genome Sequencing Consortium
- Used by first high throughput sequencers (Illumina GAII), pilot project of 1000 Genomes
- UCSC uses its own sequential versioning, calling this hg18
 2009: GRCh37 (hg19)
- Handed over to the Genome Reference Consortium (GRC); the “h” in GRCh denotes human
- Used by the 1000 Genomes Project (Phase I/II/III) in the era of the HiSeq 2000
 2013: GRCh38 (hg38)
- ~100 assembly gaps updated, ~2000 erroneous alleles fixed
- Included centromere models, mitochondrial reference, alternate sequences
Alternative Loci / “Haplotypes”
GRCh37 had 9 “Alt Ref Loci”
GRCh38 has 35 “Alt Ref Loci”
3.6 Mb novel sequence
153 genes
Up to 25% of these genes are medically interpretable
Alignment support
Before using, ensure aligner can support alt loci without
flagging “multi-alignment” codes that cause reads to be
filtered out / lost. BWA-MEM supports alt loci.
More than Chromosomes in your FASTA
 Other bits of the reference:
- Un-localized scaffolds assigned to chromosomes
- Unplaced scaffolds (not assigned to chromosomes)
- Patch Releases (e.g. GRCh37.p13, GRCh38.p12)
- Types of “alt”, “fix” or “novel”
- Not applied, and do not change the primary sequence
- You can think of them as “known issues, with proposed fixes for next major
release”
 Other useful things to add for alignment purposes:
- A “decoy” reference genome segment as primary reference
- DNA virus: human herpesvirus 4, type 1, aka Epstein-Barr virus (EBV)
- Unique sequence found in HuRef (Craig Venter’s genome) or de novo assemblies
- Other novel unaccounted for (or “novel” patch) sequence
- Full set of HLA “haplotype” sequences, marked as “alternates”
 Mitochondrial!
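To see which of these extra sequences a given reference FASTA actually contains, one option is to tally the contig names in its index. The Python sketch below assumes a samtools-style `.fai` index (tab-separated, contig name in the first column) and UCSC-style suffixes (`_random`, `chrUn_`, `_alt`, `_fix`, `_decoy`, `HLA-`, `chrEBV`) as used in common hg38 analysis sets. These naming conventions are an assumption about the FASTA in use, so check them against your own file.

```python
from collections import Counter
from typing import Iterable


def classify_contig(name: str) -> str:
    """Guess the category of a contig from its name (assumed UCSC-style)."""
    if name.startswith("HLA-"):
        return "hla"
    if name.endswith("_decoy") or name == "chrEBV":
        return "decoy"
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_fix"):
        return "fix_patch"
    if name.endswith("_random"):
        return "unlocalized"
    if name.startswith("chrUn_"):
        return "unplaced"
    if name in ("chrM", "chrMT", "MT", "M"):
        return "mitochondrial"
    return "primary"


def summarize_fai(lines: Iterable[str]) -> Counter:
    """Count contigs per category from the lines of a .fai index."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name = line.split("\t", 1)[0]
        counts[classify_contig(name)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path; replace with the index of your own reference.
    with open("reference.fa.fai") as handle:
        for category, n in sorted(summarize_fai(handle).items()):
            print(f"{category}\t{n}")
```

If the tally shows `alt` contigs, that is the cue to confirm the aligner is running in an alt-aware mode before trusting mapping qualities in those regions.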
The Human Mitochondrial Genome
 Our second genome:
- Only ~16 kb long
- Encodes 37 genes (production of energy and its storage in ATP)
- Slightly different genetic code than nuclear genes:
- UGA = tryptophan, AUA = methionine, and AGA and AGG = stop
 Sequenced in 1981 as the “Cambridge Reference Sequence” (before HGP)
- 1999: “revised Cambridge Reference Sequence” or rCRS
- 16,569bp long
- 1000 Genomes Project used it with GRCh37 + decoy to create the “g1k” reference
- This is the default for Golden Helix Sentieon pipeline and VarSeq interpretation
 NCBI36 (hg18) Included a MT reference NC_001807 in 2006:
- Derived from an African (Yoruba) individual
- 16,571bp long, differing from the rCRS by 40 variants
- Removed from GenBank, don’t publish with this M!
- UCSC hg19 includes NC_001807 as “M” and still uses it today!
- Next VarSeq version drops support for this “hg19” genome
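Because the two mitochondrial references differ in length (16,569 bp for the rCRS versus 16,571 bp for NC_001807), the contig length recorded in a FASTA index or BAM header gives a quick first check of which one a dataset was aligned to. The Python sketch below uses only those two lengths from the slide; the function name is hypothetical, and a length match is a heuristic rather than proof of sequence identity.

```python
RCRS_LENGTH = 16_569        # revised Cambridge Reference Sequence
NC_001807_LENGTH = 16_571   # older sequence, shipped as "M" with UCSC hg19


def identify_mt_reference(length: int) -> str:
    """Guess the mitochondrial reference from its contig length."""
    if length == RCRS_LENGTH:
        return "rCRS"
    if length == NC_001807_LENGTH:
        return "NC_001807"
    return "unknown"


if __name__ == "__main__":
    for observed in (16_569, 16_571, 16_500):
        print(observed, identify_mt_reference(observed))
```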
Variant Interpretation in VSClinical
 Evaluate and Classify Variants using ACMG
Guidelines:
- Focused workflow to evaluate criteria relevant to each variant,
resulting in final classification
- Aggregates annotations from population and clinical resources
- Customized visualizations and annotation presentations
- Allows easy look-up and cross reference
 Save Interpretations into Assessment
Catalogs:
- New samples have previous classifications brought in
- See previous interpretations, review and update
- Can be plotted for regional context
 Use VarSeq’s Filter, GenomeBrowse,
VSReports:
- Customize to lab specific QC, annotation and filtering
- Genomic context of variant vital to assess
- VSReports allows custom presentation of VSClinical output
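The idea of an assessment catalog, where earlier classifications are brought in for new samples, can be pictured as a lookup keyed by the genomic description of the variant. The Python sketch below is a toy illustration only: `VariantKey`, `Assessment`, the placeholder classification and the date are all hypothetical and do not reflect how VSClinical stores its catalogs. It does show why the assembly has to be part of the key.

```python
from typing import Dict, NamedTuple, Optional


class VariantKey(NamedTuple):
    assembly: str
    chrom: str
    pos: int
    ref: str
    alt: str


class Assessment(NamedTuple):
    classification: str   # placeholder text, not a real clinical call
    last_evaluated: str   # hypothetical ISO date


def previous_assessment(catalog: Dict[VariantKey, Assessment],
                        key: VariantKey) -> Optional[Assessment]:
    """Return the stored assessment for this exact key, if any."""
    return catalog.get(key)


# Coordinates taken from the GRCh37/GRCh38 example earlier in the deck.
catalog = {
    VariantKey("GRCh37", "15", 48_720_623, "C", "T"):
        Assessment("placeholder classification", "2018-01-01"),
}

if __name__ == "__main__":
    on_37 = VariantKey("GRCh37", "15", 48_720_623, "C", "T")
    on_38 = VariantKey("GRCh38", "15", 48_428_426, "C", "T")
    print(previous_assessment(catalog, on_37))  # found
    print(previous_assessment(catalog, on_38))  # None until the catalog is lifted over
```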
GRCh38: Implications for Variant Interpretation
 Assembly Regions:
- Multiple Species Alignment
- Repeat Regions / Low Complexity Regions
- Genomic “Super Dups”
 Genes (and Annotations)
- Functional Domains
- Transcript Counts of Gene Constraint
 Population Catalogs on GRCh37
- dbSNP
- 1000 Genomes
- ExAC / gnomAD Exomes / Genomes
 Clinical Annotations
- ClinVar
- CIVIC
- OMIM (variants, genes, phenotypes)
 Functional Annotations / Conservation
- CADD
- SIFT/Polyphen/Missense Badness
- Conservation scores
GRCh37: rs174264
Substitution Leu (leucine) → Pro (proline) at 173
Leucine conserved in all vertebrates!
VarSeq Import LiftOver
Start with GRCh37 VCFs: LiftOver to GRCh38:
Or the Other Way Around! GRCh38 => GRCh37
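Conceptually, a liftover maps a position through a table of aligned blocks between two assemblies. The Python sketch below is a toy version of that idea and not VarSeq's implementation: real chain files also handle strand changes and many more blocks. The single block is hypothetical, with its offset chosen so that the chromosome 15 example from earlier in the deck (48,720,623 on GRCh37, 48,428,426 on GRCh38) maps correctly.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    src_start: int   # 1-based inclusive start on the source assembly
    src_end: int     # 1-based inclusive end on the source assembly
    dst_start: int   # target-assembly position that src_start maps to


def lift(blocks: List[Block], pos: int) -> Optional[int]:
    """Map pos through blocks (sorted by src_start); None if unmapped."""
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None               # before the first block
    block = blocks[i]
    if pos > block.src_end:
        return None               # falls in a gap between blocks
    return block.dst_start + (pos - block.src_start)


# Hypothetical block boundaries; only the offset (-292,197) follows from
# the two coordinates shown on the slide.
CHR15_BLOCKS = [Block(48_700_000, 48_800_000, 48_700_000 - 292_197)]

if __name__ == "__main__":
    print(lift(CHR15_BLOCKS, 48_720_623))  # 48428426
    print(lift(CHR15_BLOCKS, 10))          # None: not covered by any block
```

Positions that return `None` correspond to variants a liftover cannot place, which is one reason to review what was dropped after converting a catalog or VCF.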
[Demo in VarSeq]
Reasons to Switch to GRCh38
 Better for alignment
- More reads mapped
- Fewer variants called
 Better gene representations
- Fewer “frame-fixing” introns
- Some genes fixed/improved
 Newer annotations are GRCh38
- Large consortiums are switching to GRCh38 first:
- Cancer: ICGC, COSMIC
- TopMed (65K WGS)
[Chart: “My Exome” variant calls by type (snps, mnps, indels, complex), y-axis 270,000 to 340,000: GRCh37 331,824 vs. GRCh38 319,442]
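As a quick worked check of the “fewer variants called” point, the two exome totals from the chart (331,824 on GRCh37 and 319,442 on GRCh38, reading the labels and values in the order they appear) can be compared directly. The Python sketch below is plain arithmetic on those two numbers; the function name is an assumption.

```python
def call_reduction(before: int, after: int) -> tuple:
    """Return (absolute drop, drop as a percentage of the 'before' total)."""
    fewer = before - after
    return fewer, 100 * fewer / before


if __name__ == "__main__":
    grch37_calls = 331_824
    grch38_calls = 319_442
    fewer, pct = call_reduction(grch37_calls, grch38_calls)
    print(f"{fewer:,} fewer calls ({pct:.2f}% of the GRCh37 total)")
    # prints: 12,382 fewer calls (3.73% of the GRCh37 total)
```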
Better Gene Representation
 The human genome does not necessarily contain the mRNA sequence in RefSeq
 “Frame-fixing” intron introduced in alignment of mRNA coding sequence to human reference:
EMG1 on GRCh37:
EMG1 on GRCh38:
Some Variants are Pure “Reference Artifacts”
Considerations for Transitioning your Lab
 Switching your Secondary Pipeline
 Your Genomic Variants Being Saved:
- VSClinical Catalog / Assessment Catalogs
- Catalog of Observed CNVs
- VSWarehouse Projects (all variants from samples)
- Target capture annotations
- Custom in-house annotations
 Converting Existing Data:
- Re-import variants using import Liftover
- Export/import catalogs using Liftover
- Convert custom annotations using Liftover
Liftover Using Our Convert Wizard:
Thank you!
ASHG 2018
 Booth 408
 Live demos and CoLab Sessions
 Unveiling our new t-shirt designs
 Chance to win some iPads

Editor's Notes

  • #3: The ACMG guidelines provide a set of criteria to score variants and place them into one of five classification tiers. Following the guidelines requires diving into the annotations, genomic context and existing clinical assertions about every variant. VSClinical provides a tailored workflow to score each relevant criterion, while also providing all the bioinformatic literature and evidence from clinical knowledgebases. This includes population frequencies, functional prediction scores, conservation scores, and effect prediction. Through standardized questions, automatically computed evidence, and historical context, VSClinical provides clinicians with everything they need to reach a final classification consistently and reproducibly, thereby reducing the subjectivity of the variant classification process. While a final classification of a variant requires the manual inspection of the sample’s clinical details, published literature and the content of existing related clinical interpretations, many criteria from the ACMG guidelines can be automatically computed with bioinformatic algorithms and specially curated annotation sources.  The work done in your lab to classify variants will be automatically included in future analyses. As the number of samples processed increases, the number of variants requiring classification will be reduced, along with sample turn-around time. Previously classified variants will be marked with their last evaluation date, allowing them to be accepted in the context of the current sample without extra analysis time. Look at 16 and 12
  • #5: Golden Helix is a bioinformatic software and analytics company seeking to promote the genetic research and translational genomics community by creating high quality software to analyze large genomic datasets. The company was founded in 1998 and was developed from the founders' early work in pharmacogenomics at GlaxoSmithKline, which is still a key investor and solid partner to this day. Our earliest developed software, HelixTree, eventually evolved into our powerful research application software SVS. SVS provides a large collection of analytical tools for GWAS, Genomic Prediction, and Large N studies, just to name a few. Eventually, there was clearly a need to develop a clinical software solution for efficient, repeatable and, most importantly, accurate variant interpretation. With this concept in mind, VarSeq was created. With VarSeq, users have the ability to not only streamline their variant interpretation workflows but also automatically create customizable clinical reports from the results. It was no surprise, then, that with increased workflow efficiency and the ever increasing amount of genomic data, a data storage solution was also needed. With our server based Warehouse solution we offer additional power in organizing projects in a central repository, sharing all the collected data and projects easily, and a simple but powerful querying ability across multiple elements of your variant analysis.
  • #6: Our software has been very well received by the industry. We have been cited in over 1,000 peer-reviewed publications, which is a testament to our customer base.
  • #7: Another is that we have customers in over 350 organizations globally. This includes clinics such as Sick Kids and Cincinnati Children’s, top-tier institutions including Stanford and UCLA, genetic testing labs such as Prevention Genetics and LineaGen, pharmaceutical companies such as Bayer, and government institutions like the USDA.
  • #8: One point that can’t be stressed enough is that our software has become the quality product it is thanks to heavy usage and scrutiny from our users. Through this feedback, we have developed the trust and experience necessary to support our claim of high-quality software. Not only do we rely on feedback from our users, but we also strive to be on the leading edge of both industry and community needs. This, coupled with our desire to provide innovative, customizable, and quick solutions, reinforces that you are getting not only a quality product but also software that remains cutting edge. Most important, though, is the support you receive when using Golden Helix software. We do not hand you this software and send you off wishing you the best of luck. With the software comes the reassurance that users will have access to prompt support: whether you have a simple question about installation or would like training, we at Golden Helix are here to support our customers wholeheartedly.
  • #12: For the vast majority of clinical genetics, we care about genes, not genomes.