Clinical Grade Annotations:
Public Data Resources for
Interpreting Genomic Variants
February 19, 2015
Gabe Rudy
@gabeinformatics
VP Product Management and Engineering
Golden Helix
My Background
• Golden Helix
- Founded in 1998
- Genetic association software
- Analytic services
- Thousands of users worldwide
- Over 800 customer citations in journals
• Products I Build with My Team
- SNP & Variation Suite (SVS)
  - SNP, CNV, NGS tertiary analysis
  - Import and deal with all flavors of upstream data
- VarSeq
  - Annotate and filter variants in gene panels, exomes and genomes for clinical labs and researchers.
- GenomeBrowse (Free!)
  - Visualization of everything with genomic coordinates. All standardized file formats.
Agenda
1. How I Met My Exomes
2. Getting High Quality Variant Calls
3. Clinical Grade Candidate Variant Identification
4. Data Sharing and the Maturing of Public Resources
5. NGS Clinical Utopia: Are We There Yet?
Exome Sequencing in Consumer Genomics
• Exomes done as part of Pilot Program
• 80x coverage
• Raw data with no interpretation
[Figure: family pedigree with labels Erin, JIA, Gabe (me), Ethan]
Research or clinical grade?
Total Reads 140M
Unique Align 87%
Mean Target 105x
% Target at 2x 97%
% Target at 10x 94%
% Target at 20x 89%
% Target at 30x 83%
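A minimal sketch of how metrics like "Mean Target" and "% Target at 20x" can be derived from per-base depths over the targeted regions. The function name, the input format (one depth value per target base), and the toy depths are assumptions for illustration, not the pipeline used for the exomes above.

```python
from typing import Dict, Iterable, List


def coverage_summary(depths: List[int],
                     thresholds: Iterable[int] = (2, 10, 20, 30)) -> Dict[str, float]:
    """Summarize per-base read depth over the targeted bases."""
    if not depths:
        raise ValueError("no target bases supplied")
    n = len(depths)
    summary = {"mean": sum(depths) / n}
    for t in thresholds:
        # Count target bases covered by at least t reads.
        covered = sum(1 for d in depths if d >= t)
        summary[f"pct_at_{t}x"] = 100.0 * covered / n
    return summary


# Toy example: ten target bases with made-up depths.
toy_depths = [0, 1, 5, 12, 18, 25, 31, 40, 90, 120]
print(coverage_summary(toy_depths))
```

A high mean can hide poorly covered targets, which is why the percent-at-threshold rows matter more than the mean when asking whether data is clinical grade.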
PSPH mis-alignment
Splice Mutation
2015 TriCon - Clinical Grade Annotations - Public Data Resources for Interpreting Genomic Variants
GRCh38 – Here Now, but no hurry
• A better human reference
- Revised Cambridge Reference Sequence (rCRS) MT
- Has centromere models
- ~2000 incorrect alleles fixed
- ~100 assembly gaps updated
• NCBI Annotation Release 106 on GRCh38
- dbSNP 141, ClinVar, RefSeqGene
- Ensembl 76 on both
• No Population Catalogs
- Some being ported (by Ensembl, dbSNP)
Ts/Tv: GRCh37 2.06558, GRCh38 2.10171
[Figure: bar chart of variant counts in my exome by type (SNPs, MNPs, indels, complex) on GRCh37 and GRCh38; labeled counts: 331,824 and 319,442]
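For readers unfamiliar with the Ts/Tv figure quoted for the two references, here is a small sketch of how a transition/transversion ratio is computed from a list of single-nucleotide variants. The function names and the toy calls are illustrative assumptions; this is not the tool that produced the values above.

```python
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Compute Ts/Tv over (ref, alt) pairs, skipping anything that is not a SNV."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if (len(ref) != 1 or len(alt) != 1
                or ref not in "ACGT" or alt not in "ACGT" or ref == alt):
            continue  # indels, MNPs, and malformed records are ignored
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio undefined")
    return ts / tv


# Made-up calls: three transitions, two transversions, one deletion (skipped).
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G"), ("AT", "A")]
print(ts_tv_ratio(calls))
```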
Baylor Workflow - Clinical Exomes Paper
• Disease gene related
• Medically actionable deleterious variants
• Deleterious variants in ACMG gene list
• Deleterious variants
• VUS in dominant gene or homozygous in recessive gene
• Deleterious variant in gene with no known disease
Annotate, Then Filter and Interpret
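The annotate-then-filter idea can be sketched in a few lines. Everything here is a hypothetical illustration: the field names (`pop_af`, `effect`, `gene`), the 1% frequency cutoff, the effect labels, and the gene list names are assumptions, not the thresholds or categories of the Baylor workflow.

```python
from typing import Dict, List, Set

# Illustrative effect labels and threshold; not taken from the slides.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice"}
MAX_POP_AF = 0.01


def filter_candidates(variants: List[dict], max_af: float = MAX_POP_AF) -> List[dict]:
    """Keep variants that are rare in population catalogs and protein altering."""
    return [
        v for v in variants
        if v["pop_af"] <= max_af and v["effect"] in PROTEIN_ALTERING
    ]


def bin_by_gene_list(variants: List[dict],
                     gene_lists: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each surviving variant to every gene list containing its gene."""
    bins: Dict[str, List[str]] = {name: [] for name in gene_lists}
    bins["no_known_disease"] = []
    for v in variants:
        hits = [name for name, genes in gene_lists.items() if v["gene"] in genes]
        for name in hits:
            bins[name].append(v["id"])
        if not hits:
            bins["no_known_disease"].append(v["id"])
    return bins


# Made-up, already-annotated variants and gene lists.
variants = [
    {"id": "v1", "gene": "GENE_A", "effect": "missense", "pop_af": 0.0001},
    {"id": "v2", "gene": "GENE_B", "effect": "synonymous", "pop_af": 0.0002},
    {"id": "v3", "gene": "GENE_C", "effect": "nonsense", "pop_af": 0.20},
    {"id": "v4", "gene": "GENE_D", "effect": "frameshift", "pop_af": 0.0},
]
gene_lists = {"disease_gene": {"GENE_A"}, "acmg_list": {"GENE_A"}}

candidates = filter_candidates(variants)
print(bin_by_gene_list(candidates, gene_lists))
```

Annotation happens first so that every filter is a simple predicate over fields already attached to the variant; interpretation then starts from the small bins that remain.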
Data Sources to Replicate Workflow
• 1000 Genomes (Phase 1)
• “ESP” (NHLBI 6500 Exomes v2)
• HGMD (Public vs Professional)
• Variant’s Protein Coding Effect
• RNA Splicing Effect (dbscSNV)
- −3 to +8 at the 5’ splice site, −12 to +2 at the 3’ splice site
• Gene Lists:
- Single-Gene Disorder (OMIM with Inheritance)
- Medically Actionable (114 genes NHLBI study)
- Dominant Inheritance (MedGen)
- ACMG Carrier Panel (ACMG Incidental Findings guidelines)
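The splice consensus window quoted for dbscSNV (−3 to +8 at the 5’ site, −12 to +2 at the 3’ site) can be turned into a coordinate check. This sketch assumes a plus-strand transcript, an internal exon, and 1-based inclusive exon coordinates; the function name is hypothetical, and negative offsets count bases before the exon/intron junction with no position zero.

```python
from typing import Optional


def splice_consensus_site(pos: int, exon_start: int, exon_end: int) -> Optional[str]:
    """Classify a genomic position relative to one plus-strand internal exon.

    exon_start/exon_end are 1-based inclusive. Returns "acceptor", "donor",
    or None when the position is outside both consensus windows.
    """
    # 3' splice site (acceptor): last 12 intronic bases (-12..-1)
    # plus the first 2 exonic bases (+1..+2).
    if exon_start - 12 <= pos <= exon_start + 1:
        return "acceptor"
    # 5' splice site (donor): last 3 exonic bases (-3..-1)
    # plus the first 8 intronic bases (+1..+8).
    if exon_end - 2 <= pos <= exon_end + 8:
        return "donor"
    return None


# Toy exon spanning 1000..1100.
for p in (987, 988, 1001, 1002, 1097, 1098, 1108, 1109):
    print(p, splice_consensus_site(p, 1000, 1100))
```

For minus-strand transcripts the two windows swap ends of the exon, which this sketch deliberately leaves out.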
My Exome Analyzed
[Figure: filter cascade over my exome. Start: 235,689 variants; successive splits include 847 / 234,842, 224,914 / 9,928, and 9,069 / 859. Labels for the remaining nodes are not recoverable.]
• Pathogenic OTC Variant
• What if I got this through BabySeq?
Annotating against Transcripts
• RefSeqGenes – Versioned on RNA sequence
- Annotated against human reference by “Annotation Releases”
- Last on GRCh37 was 105 (2013-08-20) – GRCh38 release 106 (2014-01-17)
- 84,950 transcripts, most are “predicted” (“XM_” and non-coding)
- Standard in US for reporting variation (NM_016335.4:c.123C>T etc.)
- UCSC grabs RNA from RefSeq directly and maps to their genome references “continuously”
• Ensembl – Versioned on Alignment
- GENCODE: Well curated subset of high-quality, validated transcripts
- V75 last version of GRCh37, 2014-06-27
- Many specific bio-types, but protein_coding used for annotation
- Has mappings to RefSeq IDs, but
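Because RefSeq transcripts are versioned on the RNA sequence, the version suffix in a name such as NM_016335.4:c.123C>T is part of the variant's identity. The sketch below parses only simple coding substitutions with a regular expression; the function and class names are hypothetical, and a real pipeline would use a full HGVS parser that handles indels, intronic offsets, and UTR positions.

```python
import re
from typing import NamedTuple, Optional

# Only matches versioned accessions with a simple c. substitution.
HGVS_C_SUB = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+)"
    r":c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


class CodingSub(NamedTuple):
    accession: str
    version: int
    pos: int
    ref: str
    alt: str


def parse_c_substitution(hgvs: str) -> Optional[CodingSub]:
    """Parse e.g. 'NM_016335.4:c.123C>T'; return None if it does not match."""
    m = HGVS_C_SUB.match(hgvs.strip())
    if m is None:
        return None
    return CodingSub(m["accession"], int(m["version"]), int(m["pos"]), m["ref"], m["alt"])


print(parse_c_substitution("NM_016335.4:c.123C>T"))
print(parse_c_substitution("NM_016335:c.123C>T"))  # unversioned: rejected
```

Rejecting unversioned accessions is a deliberate choice in this sketch: without the version, the coordinate may refer to a different mRNA sequence.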
Reference Sequence Versus Gene Sequence
EMG1 on GRCh37
• “Gap” of the mRNA coding sequence versus reference seq:
• Handled differently by 3 different “gene alignments”
Reference Sequence Versus Gene Sequence
EMG1 on GRCh38
• Reference sequence patched, no gap
• Alignments agree
RefSeq Accession Not Sufficient for Var-Tx Interaction
RefSeq defines transcripts as mRNA sequence
• NCBI “Annotation Releases” (like v105) provide alignments using “Splign”
• UCSC pulls RefSeq mRNA and aligns it themselves using “BLAT”
• They can choose equally valid but different alignments for the same accession
• This alignment of NM_052814.3 places the exon at dramatically different loci.
• Will result in different annotations of any variant overlapping these exons
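One way to see the consequence: take the exon coordinates that two alignment sources assign to the same accession and ask whether a variant is exonic under each. The coordinates and source names below are invented for illustration and are not the real alignments of NM_052814.3.

```python
from typing import Dict, List, Tuple

Exons = List[Tuple[int, int]]  # 1-based inclusive (start, end) pairs


def overlaps_exon(pos: int, exons: Exons) -> bool:
    """True when the position falls inside any exon interval."""
    return any(start <= pos <= end for start, end in exons)


def compare_sources(pos: int, alignments: Dict[str, Exons]) -> Tuple[Dict[str, bool], bool]:
    """Return per-source exonic calls and whether the sources disagree."""
    calls = {source: overlaps_exon(pos, exons) for source, exons in alignments.items()}
    discordant = len(set(calls.values())) > 1
    return calls, discordant


# Hypothetical alignments of one transcript accession from two sources.
alignments = {
    "source_a": [(100, 200), (300, 400)],
    "source_b": [(100, 200), (350, 450)],
}
print(compare_sources(320, alignments))  # exonic in one source only
print(compare_sources(150, alignments))  # exonic in both
```

This is why an annotation should record which alignment of the accession it was computed against, not just the accession itself.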
COSMIC
• Does not provide data in easy to use form for NGS
• Just announced change in licensing effective in March
- Access to the COSMIC website will stay free for all users.
- The new licensing strategy will charge for-profit organisations to download COSMIC datasets.
- Download by academic and non-profit organisations will remain free
• 2015 Roadmap:
- GRCh38
- More curation
- Visualization improvements
ClinVar
• Submitters:
- OMIM: Johns Hopkins
- Samuels
- Lab for Molecular Medicine
- Invitae
- Emory Genetics Lab
• Star rating system
- 0-4 stars – level of review
ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence.
ClinVitae: ClinVar and Friends by Invitae
Sources:
- ClinVar (62,913)
- Emory (13,365)
- ARUP (2,850)
- Carver Mut (199)
- K Cunningham (581)
79,907 variants (V), 9,189 genes (G)
- 32,523 Pathogenic
- 38,796 Likely Pathogenic
Provided in HGVS
- 59,878 after mapping to genomic space
BRCA: The back door to Myriad’s database
1995 – Patent issued to Myriad Genetics
June 2013 – Patents invalidated by ruling
A lab setting up Dx has a lot of catching up to do
“Free the Data” and other ways in which Myriad’s data is in ClinVar, etc.
Sharing Clinical Reports Project
BRCA: In my wife
HGMD
• Data mines academic papers for reported functional variants
• Also takes submissions, corrections reviewed by team
• First available in 1996
- Originally 10k variants
- 105k in Public (2014)
- 148k in “Pro” (2014)
Left-Align Delta F508 to Make it Match
Left-Align Annotations
• Using a Smith-Waterman algorithm to left-align variants from public databases shows non-obvious differences
• NGS alignment and variant calling are always left-aligned
• Left-align your databases so they can be annotated
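To make the left-alignment point concrete, here is a sketch of indel normalization. Note the swap: the slide mentions a Smith-Waterman approach, whereas this example uses the simpler trim-and-shift normalization (trim shared trailing bases, pad on the left, repeat), which yields the leftmost equivalent representation for a single indel. Function names, 0-based positions, and the toy reference are assumptions.

```python
from typing import Tuple


def apply_variant(ref_seq: str, pos: int, ref: str, alt: str) -> str:
    """Return the sequence after substituting alt for ref at 0-based pos."""
    if ref_seq[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the reference sequence")
    return ref_seq[:pos] + alt + ref_seq[pos + len(ref):]


def left_align(ref_seq: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Shift a variant to its leftmost equivalent, padded representation."""
    if ref == alt:
        raise ValueError("ref and alt are identical; not a variant")
    changed = True
    while changed:
        changed = False
        # Trim a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pad both with the preceding reference base.
        if not ref or not alt:
            if pos == 0:
                raise ValueError("cannot pad left of the sequence start")
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
            changed = True
    # Trim shared leading bases while keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat; the deletion is reported at the right end.
reference = "GGCACACAT"
original = (5, "ACA", "A")
normalized = left_align(reference, *original)
print(normalized)
print(apply_variant(reference, *original) == apply_variant(reference, *normalized))
```

Both representations produce the same haplotype, yet a position-based lookup would treat them as different variants, which is the mismatch the slide warns about.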
Changes in Monthly Updates
• 36 variants went missing from the December to January release
• Some were Pathogenic
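A check like this can be automated by diffing consecutive releases on a normalized variant key. The sketch below assumes each release has been loaded into a dictionary keyed by (chrom, pos, ref, alt) with the clinical significance as the value; the records are made up.

```python
from typing import Dict, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def dropped_variants(previous: Dict[VariantKey, str],
                     current: Dict[VariantKey, str]) -> Dict[VariantKey, str]:
    """Return variants present in the previous release but absent from the current one."""
    return {key: sig for key, sig in previous.items() if key not in current}


# Made-up releases for illustration.
december = {
    ("1", 1000, "A", "G"): "Pathogenic",
    ("2", 2000, "C", "T"): "Benign",
    ("3", 3000, "G", "GA"): "Pathogenic",
}
january = {
    ("2", 2000, "C", "T"): "Benign",
}

gone = dropped_variants(december, january)
pathogenic_gone = [key for key, sig in gone.items() if sig == "Pathogenic"]
print(len(gone), "dropped,", len(pathogenic_gone), "of them Pathogenic")
```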
ClinVar’s VCF File
• ClinVar currently relies on their dbSNP identifier mappings to “build” VCF files
• There are ~14,000 small variants in their database without dbSNP identifiers, and thus missing from the VCF
• ~5K Pathogenic
• Often these variants are in newer dbSNP builds, and the ClinVar mappings are just not updated.
• This variant was in ClinVar, with genomic coordinates, but no RSID:
- HGVS(c.): NM_002894.2:c.298C>T
- Chromosome:Start:Stop: 18:20548818:20548818
- (Recently RSID was added)
dbSNP 141 Had Allele Errors
• I reported the issue 7/22/2014
• Confirmed, 8/12 generated better VCF and placed in “test” folder
• Found more issues
• Replaced official VCF on 02/09/2014
• We waited until fixed to publish official support
NM_002626.4:c.1877G>C in PFKL
• NP_002617.3:p.Arg626Pro missense mutation
• Predicted damaging by 4/5 functional predictions
• VEST3: 0.948, GERP++: 4.59
• ExAC and 1kG have a G>A, but G>C is novel
• Variants in region are extremely rare (G>C ExAC 4 of 122,364 alleles) – 0.003%
• No ClinVar variants for gene
• OMIM entry has no known disease association
• PubMed search shows few recent articles: Most recent 1998 paper showed
- phosphofructokinase (PFKL) overexpressed in Down syndrome (DS)
- Transgenic PFKL mice had an abnormal glucose metabolism with reduced clearance rate from blood and enhanced metabolic rate in brain.
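The 0.003% figure follows directly from the allele counts quoted above (4 of 122,364 alleles). A tiny worked check, with a hypothetical helper name:

```python
def allele_frequency_pct(allele_count: int, allele_number: int) -> float:
    """Allele frequency as a percentage: observed alleles over total alleles called."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return 100.0 * allele_count / allele_number


# Counts quoted on the slide: 4 observations among 122,364 alleles.
print(f"{allele_frequency_pct(4, 122_364):.4f}%")
```

Since the denominator counts alleles rather than individuals, 122,364 alleles corresponds to roughly half as many diploid samples at this site.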
35 LoF Variants, None Homozygous
Training
• Most variants are rare or novel
- Training to interpret these is extensive
• MD/Pathology background is insufficient
• Need a PhD in molecular genetics
• There are only 500 board certified Clinical Molecular Geneticists since certification started
• Let’s share in the learning process
Baylor Exome Sign-Out
Phenotyping and Matchmaking Portals
• Diagnosis often requires finding another family to confirm a novel gene to phenotype association
• Finding a second family:
- Social media
- PhenoDB
- PhenomeCentral.org
- Orphanet – Resources on over 6000 rare diseases and orphan drugs.
- European centric: GEN2PHEN (G2P)
Matt Might found a second family with NGLY1 deficiency through a blog post that went viral.
N-Glycanase Deficiency
• http://www.ngly1.org/
• Matthew Might and Matt Wilsey. The shifting model in clinical diagnostics: how next-generation sequencing and families are altering the way rare diseases are discovered, studied, and treated. Genetics in Medicine. March 2014.
Thank you
• Heidi Rehm – Chief Laboratory Director at Laboratory for Molecular Medicine, PCPGM
• Joel Parker – Cancer Genetics, UNC Chapel Hill
• Gerry Higgins – VP, Pharmacogenomic Science, Assure Rx Health
• Frank Schacherer – Chief Technical Officer, BIOBASE
• Reece Hart – Computational Biologist, Invitae (now 23andMe)
• Greta Linse Peterson – Director of Product Management and Quality, Golden Helix
Questions?
