Global Phenotypic Data Sharing Standards
to Maximize Diagnostic Discovery
Melissa Haendel, PhD and Sebastian Köhler, PhD
RD-Action workshop
April 26th and 27th, Brussels
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
What do we mean by phenotype?
• Phenotype = phenotypic abnormality = clinical feature
• A constellation (pattern) of clinical features defines a disease:
– [Disease X]... is a rare developmental disorder defined by the combination of aplasia cutis congenita of the scalp vertex and terminal transverse limb defects. In addition, vascular anomalies such as cutis marmorata telangiectatica ... are recurrently seen.
• (Yes, this is a simplification)
Starting point: OMIM (Online Mendelian Inheritance in Man database)
• Clinical Synopsis (CS) section
• Free text phenotypic description
• Very expressive
(Un)Controlled Vocabularies
• Not designed to be easily machine interpretable
• Spelling problems, acronyms, etc.
• Homonyms: "fibrillation" (= ventricular fibrillation) ≠ "fibrillation" (= muscle fibrillation)
Why you should care
OMIM query               Number of results
large bones              264
large bone               785
enlarged bones           87
enlarged bone            156
big bones                16
huge bones               4
massive bones            28
hyperplastic bones       12
hyperplastic bone        40
bone hyperplasia         134
increased bone growth    612
Motivation
• HPO started in 2008
• Goal: computer-interpretable clinical features!
• Reliable information extraction from databases based on clinical features
• Compute similarity between diseases based on clinical features
• Compute similarity between patients based on clinical features
• Compute similarity between patients and diseases based on clinical features
• Interoperability with basic research to improve diagnostic discovery
• Easy to use
• Freely available
The Human Phenotype Ontology (HPO)
Description of phenotypic abnormalities (or clinical features) in humans
[Figure: excerpt of the HPO hierarchy. Terms shown: phenotypic abnormality; abnormality of the nervous system; abnormality of the central nervous system; abnormality of movement; incoordination; ataxia; gait disturbance; gait ataxia; cerebral inclusion bodies; neurofibrillary tangles. One node is labeled "This is a term". Clinical Synopsis (CS) text from OMIM:0815 and OMIM:1234 is also shown: "Neurofibrillary tangles may be present" and "Paired helical filaments".]
The Human Phenotype Ontology (HPO)
• Synonyms merged into one term
• Textual definitions for each term
id: HP:0002185
name: Neurofibrillary tangles
def: Pathological protein aggregates formed by hyperphosphorylation of a microtubule-associated protein known as tau, causing it to aggregate in an insoluble form. [HPO:sdoelken]
synonym: Neurofibrillary tangles may be present EXACT []
synonym: Paired helical filaments EXACT []
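A minimal sketch of how a stanza like the one above could be turned into a synonym lookup, so that different free-text phrasings resolve to one term ID. The parser handles only the simple `key: value` lines shown on the slide, and the function names (`parse_stanza`, `build_lookup`) are illustrative assumptions, not part of any HPO library.

```python
STANZA = """\
id: HP:0002185
name: Neurofibrillary tangles
synonym: Neurofibrillary tangles may be present EXACT []
synonym: Paired helical filaments EXACT []
"""


def parse_stanza(text):
    """Parse simple 'key: value' lines into a dict of lists."""
    term = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(": ")
        term.setdefault(key, []).append(value.strip())
    return term


def build_lookup(term):
    """Map the lower-cased name and each synonym to the term ID."""
    term_id = term["id"][0]
    labels = list(term["name"])
    for syn in term.get("synonym", []):
        # Drop the trailing scope and cross-reference list (' EXACT []').
        labels.append(syn.rsplit(" EXACT", 1)[0])
    return {label.lower(): term_id for label in labels}


lookup = build_lookup(parse_stanza(STANZA))
print(lookup["paired helical filaments"])
```

With such a lookup, both Clinical Synopsis phrasings resolve to the same identifier, which is what makes the annotations comparable by a machine.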
[Figure: the same HPO hierarchy excerpt, from "phenotypic abnormality" down to "neurofibrillary tangles" and "gait ataxia"]
The Human Phenotype Ontology (HPO)
• Semantic relations ('subclass of', 'is a')
• From top to bottom, terms get more specific
[Figure: the HPO hierarchy excerpt with every link between a term and its parent labeled "is a". Terms shown: phenotypic abnormality; abnormality of the nervous system; abnormality of the central nervous system; abnormality of movement; incoordination; ataxia; gait disturbance; gait ataxia; cerebral inclusion bodies; neurofibrillary tangles.]
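A small sketch of what the "is a" links make computable: collecting every more general term above a given term. The exact parent links in the figure are not recoverable from the text, so the edges below are assumptions chosen for illustration, and `ancestors` is a hypothetical helper name.

```python
# Toy 'is a' edges (child -> parents). Assumed for illustration only;
# they are not copied from the HPO release.
IS_A = {
    "gait ataxia": ["ataxia", "gait disturbance"],
    "ataxia": ["incoordination"],
    "incoordination": ["abnormality of movement"],
    "gait disturbance": ["abnormality of movement"],
    "abnormality of movement": ["abnormality of the nervous system"],
    "abnormality of the nervous system": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}


def ancestors(term, is_a=IS_A):
    """Return all terms reachable by following 'is a' links upward."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


print(sorted(ancestors("gait ataxia")))
```

Because a term can have more than one parent, the traversal tracks visited terms rather than assuming a tree.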
Computable phenotype definitions of disease
HPO terms are used to annotate (describe) diseases
E.g. neurofibrillary tangles is used to annotate Alzheimer disease
Orphanet + Monarch:
• ~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER
• ~133,000 annotations of 3,145 common diseases
Köhler et al. https://doi.org/10.1093/nar/gkw1039
[Figure: Clinical Synopsis text of OMIM:0815 and OMIM:1234 ("Neurofibrillary tangles may be present", "Paired helical filaments") annotated with the same HPO term]
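As a rough illustration of what "computable" means here, disease annotations can be held as a mapping from disease IDs to sets of HPO term IDs and inverted to answer "which diseases have this feature?". The data below is hypothetical: OMIM:0815 and OMIM:1234 are the placeholder IDs from the slide, and the second term ID is an arbitrary extra annotation added for the example.

```python
from collections import defaultdict

# Hypothetical annotations: disease ID -> set of HPO term IDs.
ANNOTATIONS = {
    "OMIM:0815": {"HP:0002185"},
    "OMIM:1234": {"HP:0002185", "HP:0002066"},  # second term is illustrative
}


def diseases_by_term(annotations):
    """Invert disease -> terms into term -> diseases."""
    index = defaultdict(set)
    for disease, terms in annotations.items():
        for term in terms:
            index[term].add(disease)
    return dict(index)


index = diseases_by_term(ANNOTATIONS)
print(sorted(index["HP:0002185"]))
```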
Why HPO? Existing clinical vocabularies don’t adequately cover phenotypic descriptions
Winnenburg and Bodenreider, 2014
[Figure: bar chart of percent coverage (axis 0 to 100) for HPO, UMLS, SNOMED CT, CHV, MedDRA, MeSH, NCIT, ICD10, OMIM]
• LDDB (✓)
• Orphanet (✓) (Use HPO directly)
• MedDRA (✓)
• UMLS (completely incorporated)
Community contribution
Multiple HPO-specific workshops
Constant discussions via Tracker-System and E-Mail
We try our best to acknowledge contributors:
+ microattributions
Contributing to and extending HPO
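HPO language translations
We need your help! http://bit.ly/hpo-translations
Translation of labels, synonyms, and text definitions
[Figure: translation progress per language. Languages shown: Italian, Spanish, Russian, French, German, English layperson, Japanese, Chinese. Progress values shown: 100%, 11%, 12%, 100%, 19%, 19%, near 100%, 20% (listed in extraction order; the pairing of values to languages is not preserved).]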
Adoption of HPO
Public facing databases using HPO to
annotate patients
Tools ingesting HPO-annotated data:
Köhler et al. https://doi.org/10.1093/nar/gkw1039
Why HPO is a successful standard
• One language shared by “all”
• Synonyms “map” to one concept (HPO term)
• Contains terms that no other ontology has
• Comes with disease annotations! (Not just “yet another clinical terminology”)
• Simple, qualitative phenotyping, deviation (abnormal, abnormal increase, abnormal decrease, ...) to ease analysis
• Documented, traceable editing
• Open science community project with diverse contributors
• Constantly improved and extended, examples:
– Layperson version for patients
– Language translations
– Opposite-relations between terms
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
A disease can be described algorithmically as a collection of phenotypes
Differential diagnosis with matching phenotype concepts is already good
Patient vs. Disease X: Splenomegaly vs. Increased spleen size; Nasal speech vs. Nasal voice
Each pair consists of synonyms in HPO, i.e. both map to the same term
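A minimal sketch of exact matching after synonym normalization, using the two pairs from the slide. Which label in each pair is the preferred one is an assumption made here, and a real lookup would map to HPO term IDs rather than labels.

```python
# Synonym -> preferred label. Pairs are from the slide; the choice of
# preferred label is an assumption for illustration.
SYNONYMS = {
    "splenomegaly": "Splenomegaly",
    "increased spleen size": "Splenomegaly",
    "nasal speech": "Nasal speech",
    "nasal voice": "Nasal speech",
}


def normalize(phenotypes):
    """Replace each phrase with the label of the term it maps to."""
    return {SYNONYMS[p.lower()] for p in phenotypes}


patient = normalize(["Splenomegaly", "Nasal speech"])
disease_x = normalize(["Increased spleen size", "Nasal voice"])
shared = patient & disease_x
print(sorted(shared))
```

Once both profiles are expressed as terms, the comparison is a plain set intersection, even though the patient and the disease entry share no literal strings.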
A disease can be described algorithmically as a collection of phenotypes
Differential diagnosis with similar but non-matching phenotypes is difficult
Patient vs. Disease X: Splenomegaly vs. Ruptured spleen; Oral motor hypotonia vs. Decreased muscle mass
Similarity between two terms
• Oral motor hypotonia vs. Muscular hypotonia of the trunk: common ancestor shown is Abnormal muscle tone
• Oral motor hypotonia vs. Abnormality of calvarial morphology: common ancestor shown is Phenotypic abnormality
Legend: high scoring match; medium scoring match; very low scoring match
Score: measured by Information Content
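One common way to score two terms by Information Content is Resnik similarity: the IC of their most informative common ancestor, where IC(t) = -log p(t) and p(t) is the fraction of diseases annotated with t or a descendant. The slide does not say which measure the tools use, so treat this as a sketch; the hierarchy and the disease counts below are toy assumptions.

```python
import math

# Toy hierarchy (child -> parents) and toy counts; both are assumptions
# for illustration, not real HPO data.
IS_A = {
    "Oral motor hypotonia": ["Abnormal muscle tone"],
    "Muscular hypotonia of the trunk": ["Abnormal muscle tone"],
    "Abnormal muscle tone": ["Phenotypic abnormality"],
    "Abnormality of calvarial morphology": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}
# Number of diseases annotated with the term or any of its descendants.
DISEASE_COUNT = {
    "Oral motor hypotonia": 10,
    "Muscular hypotonia of the trunk": 20,
    "Abnormal muscle tone": 100,
    "Abnormality of calvarial morphology": 50,
    "Phenotypic abnormality": 1000,
}
TOTAL = 1000


def self_and_ancestors(term):
    """Return the term together with everything above it."""
    result = {term}
    for parent in IS_A[term]:
        result |= self_and_ancestors(parent)
    return result


def information_content(term):
    """Rare terms carry more information than common ones."""
    return -math.log(DISEASE_COUNT[term] / TOTAL)


def resnik(a, b):
    """IC of the most informative ancestor shared by a and b."""
    common = self_and_ancestors(a) & self_and_ancestors(b)
    return max(information_content(t) for t in common)


print(resnik("Oral motor hypotonia", "Muscular hypotonia of the trunk"))
print(resnik("Oral motor hypotonia", "Abnormality of calvarial morphology"))
```

The two hypotonia terms meet at a fairly specific ancestor and score well; the hypotonia and calvarial terms meet only at the root, whose IC is zero because every disease is annotated with it.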
Comparing phenotype profiles
• E.g. patient-to-disease comparison
• Patient’s phenotypes are more similar to Disease A
• Orphamizer would rank Disease A before Disease B
[Figure: the patient's profile compared term by term with Disease A and with Disease B. Legend: high scoring match; medium scoring match; very low scoring match.]
Score: measured by Information Content
Orphamizer
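A sketch of one way to compare whole profiles: for each patient term take its best-scoring match in the disease profile, average those scores, and rank diseases by the result. This "best match average" is a generic technique assumed here for illustration; it is not claimed to be the algorithm Orphamizer uses, and the term names and scores are made up.

```python
# Hypothetical term-to-term similarity scores (patient term, disease term).
SIM = {
    ("p1", "a1"): 4.0, ("p1", "a2"): 1.0,
    ("p2", "a1"): 0.5, ("p2", "a2"): 3.0,
    ("p1", "b1"): 1.0, ("p2", "b1"): 0.0,
}


def best_match_average(query, target):
    """Average, over query terms, of the best score against the target."""
    best = [max(SIM.get((q, t), 0.0) for t in target) for q in query]
    return sum(best) / len(best)


def rank(patient, diseases):
    """Return (disease, score) pairs, most similar first."""
    scores = {name: best_match_average(patient, terms)
              for name, terms in diseases.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


patient = ["p1", "p2"]
diseases = {"Disease A": ["a1", "a2"], "Disease B": ["b1"]}
ranking = rank(patient, diseases)
print(ranking)
```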
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based visualization tools
• Phenotype data standards for exchange
The genome is sequenced, but...
• 3,398 Mendelian diseases in OMIM with no known genetic basis
• At least 120,000* variants in ClinVar with no known pathogenicity
…we still don’t know very much about what it does
*This is more than twice what it was in 2016!
Adding other species’ data helps fill knowledge gaps in the human genome
More species = more coverage
[Figure: phenotypic coverage of the 19,008 human protein-coding genes (gene count from the ExAC DB, as per Lek et al., Nature 2016). Values shown: 14,779 (78%); 9,739 (51%); 2,195 + 7,544 + 7,235 = 16,974 (union of coverage in any species), combined = 89%.]
Even inclusion of just four species boosts phenotypic coverage of genes by 38 percentage points (51% → 89%)
Mungall et al., Nucleic Acids Research, bit.ly/monarch-nar-2016
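The percentages on this slide can be checked from the gene counts. The short calculation below reads the 9,739 figure as the number of genes with phenotype data before adding other species; that reading is an assumption based on the 51% label next to it.

```python
TOTAL_GENES = 19_008               # human protein-coding genes (slide)
BASELINE = 9_739                   # assumed: genes covered before adding species
COMBINED = 2_195 + 7_544 + 7_235   # union of coverage in any species (slide)

baseline_pct = round(100 * BASELINE / TOTAL_GENES)
combined_pct = round(100 * COMBINED / TOTAL_GENES)
gain_points = combined_pct - baseline_pct

print(COMBINED, baseline_pct, combined_pct, gain_points)
```

The gain is a difference of percentage points (89 minus 51), not a 38% relative increase.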
Ulcerated
paws
Palmoplantar
hyperkeratosis
Thick hand skin
Image credits:
"HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons – https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG
http://www.guinealynx.info/pododermatitis.html
Challenge: each database uses its own phenotype vocabulary/ontology
[Figure: databases linked to the vocabularies they use. First view: vocabularies MP, HP; databases MGI, HPOA. Full view: vocabularies ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, …, NCIT, …; databases WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, …, QTLdb.]
Can we help machines understand
phenotype terms?
“Palmoplantar
hyperkeratosis”
Human phenotype
I have absolutely
no idea what
that means
Decomposition of complex concepts using species-neutral terms
“Palmoplantar hyperkeratosis” (human phenotype) = increased (PATO) + keratinization (GO) + stratum corneum layer of skin (Uberon) + autopod (Uberon)
Species-neutral ontologies, homologous concepts
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
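A sketch of why decomposition helps a machine: once a phenotype is expressed as a quality, a process, and anatomical entities from species-neutral ontologies, phenotypes from different species can be compared by their shared components. The `Decomposition` class is a hypothetical structure, and the non-human example is invented for illustration; only the human decomposition comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decomposition:
    quality: str    # PATO
    process: str    # GO
    anatomy: tuple  # Uberon


# From the slide: "Palmoplantar hyperkeratosis".
HUMAN = Decomposition(
    "increased", "keratinization",
    ("stratum corneum layer of skin", "autopod"),
)
# Invented model-organism phenotype, for illustration only.
MODEL = Decomposition("increased", "keratinization", ("autopod",))


def components(d):
    """Flatten a decomposition into a set of ontology class labels."""
    return {d.quality, d.process, *d.anatomy}


def shared(a, b):
    """Classes that two decomposed phenotypes have in common."""
    return components(a) & components(b)


print(sorted(shared(HUMAN, MODEL)))
```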
How can anatomy be “species-neutral”?
HPO interoperability and annotations
[Figure: HPO terms linked to species-neutral classes: Hyposmia with sensory perception of smell; Abnormality of globe location, Abnormal eye morphology, and Deeply set eyes with eyeball of camera-type eye; Motor neuron atrophy with motor neuron (CL).]
Annotation counts shown: 34,571 annotations in 22 species; 157,534 phenotype annotations; 2,150 phenotype annotations
• 11,813 phenotype terms
• 127,125 rare disease-phenotype annotations
• 136,268 common disease-phenotype annotations
http://bit.ly/hpo-paper
Which phenotypic profile is most similar?
[Figure: the patient's phenotypic profile compared with Model X and with Disease Y]
Fuzzy-phenotype matching: www.owlsim.org
But what about the diseases? How to choose
which ones? What is their provenance?
A dynamic nosology
• Challenge: can we rapidly synergize multiple knowledge sources into a dynamic ontology?
• Classic clinical phenotype-oriented disease classification and molecular sources
• Knowledge-based approaches
• Logical Definition OWL Ontology Merging
• Bayesian OWL Ontology Merging
• Data driven
• Phenotype and functional ontology networks
Mungall, C. J., et al. (2016). k-BOOM. bioRxiv, 048843. doi:10.1101/048843
[Figure: hemolytic anemia as represented in 4 disease resources plus mappings. Legend: DOID (blue), OMIM (brown), MESH (grey), ORDO/Orphanet (yellow); SubClassOf (solid line), Xref (dashed grey line).]
Coherent disease classification => Orphanet
https://github.com/monarch-initiative/monarch-disease-ontology
“Ontology” | Classes (before → after merge) | SubClass axioms | Xrefs
Inputs:
DOID 6878 → 6012 7082 36656
MESH (D) 11314 → 4152 19036
OMIM (D) 7783 → 7783 0 31242
Orphanet (D) 8740 → 4683 15182 20326
OMIA 4833 → 4833 3120 355
DC 209 → 208 310 316
Medic 0 8630 3435
Output:
Merged 39757 → 27617 44837
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
Prevailing clinical genomic pipelines
leverage only a tiny fraction of the available
data
PATIENT EXOME
/ GENOME
PATIENT CLINICAL
PHENOTYPES
PUBLIC GENOMIC DATA
PUBLIC CLINICAL PHENOTYPE,
DISEASE DATA
POSSIBLE DISEASES
DIAGNOSIS & TREATMENT
PATIENT ENVIRONMENT
PUBLIC ENVIRONMENT,
DISEASE DATA
PATIENT OMICS PHENOTYPES PUBLIC OMICS PHENOTYPES,
CORRELATIONS
Under-utilized data
Phenotypic profile matching
Combining G2P data for variant
prioritization
Whole exome
Remove off-target and
common variants
Variant score from allele
freq and pathogenicity
Phenotype score from phenotypic similarity
PHIVE score to give final candidates
Mendelian filters
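The flow above ends by combining a variant score with a phenotype score to rank candidates. The sketch below uses a plain average of the two as the combined score; that formula, the `Candidate` class, and the gene names and numbers are all assumptions for illustration, and the real tool may filter and weight differently.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    variant_score: float    # from allele frequency and pathogenicity, 0..1
    phenotype_score: float  # from phenotypic similarity, 0..1


def combined_score(c):
    # Assumption: a plain average of the two scores.
    return (c.variant_score + c.phenotype_score) / 2


def prioritize(candidates):
    """Return candidates sorted by combined score, best first."""
    return sorted(candidates, key=combined_score, reverse=True)


candidates = [
    Candidate("GENE_A", 0.9, 0.25),   # damaging variant, poor phenotype match
    Candidate("GENE_B", 0.75, 0.75),  # good on both axes
    Candidate("GENE_C", 0.5, 0.25),
]
ranked = [c.gene for c in prioritize(candidates)]
print(ranked)
```

In this toy input, the gene with the most damaging variant is not ranked first, because its phenotype match is weak. That is the effect the phenotype score is meant to have.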
Exomiser results for UDP diagnosed
patients
Inclusion of phenotype data improves variant prioritization
In 60% of first 1000 genomes at GEL, Exomiser
predicts top candidate
In 86% of cases, Exomiser predicts within top 5
Example case solved by Exomiser
[Figure: phenotypic profiles and genes compared. Patient: heterozygous missense mutation in STIM-1; disease data: N/A. Model: Stim1Sax/Sax.]
Ranked STIM-1 variant maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources
http://bit.ly/exomiser
Deep phenotyping and “fuzzy” matching
algorithms improve diagnostics
4.9% exomes with dual molecular diagnoses,
differentiated with deep phenotyping
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
How much phenotyping is enough?
[Figure: a group of characters annotated with phenotypes and counts: Enlarged ears (2), Dark hair (6), Female (4), Male (4), Blue skin (1), Pointy ears (1), Hair absent on head (1), Horns present (1), Hair present on head (7), Enlarged lip (2), Increased skin pigmentation (3)]
bit.ly/annotationsufficiency
Phenotype matching visualization
widget
bit.ly/monarch-nar-2016
Matchmaker Exchange for patients, diseases, and model
organisms to aid diagnosis and mechanistic discovery
www.monarchinitiative.org
http://bit.ly/Monarch-MME
Goal: Get clinical sites & public databases to provide standardized phenotype data
Talk outline
• About HPO
• Semantic similarity
• Leveraging basic research data
• Exome analysis and disease discovery
• HPO-based tools
• Phenotype data standards for exchange
Genes + Environment = Phenotypes
Biology central dogma
Standards for exchanging data must be up to these challenges.
Genes + Environment = Phenotypes
Computable encodings are essential
[Figure: examples of computable encodings: base pairs; variant notation (e.g. HGVS); SNOMED-CT; medical procedure coding; Environment Ontology]
@ontowonka
Standard exchange mechanisms exist for genes … but for phenotypes? Environment?
[Figure: exchange formats by data type. Genes: VCF, GFF, BED. Phenotypes: PXF. Environment: none shown.]
Introducing PhenoPackets
A packet of phenotype data to be used
anywhere, written by anyone
http://phenopackets.org
What does a phenopacket look like?
• Alacrima
• Sleep Apnea
• Microcephaly
phenotype_profile:
  - entity: "patient16"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "PMID:"
• Clinical labs
• Public databases
• Journals
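A minimal sketch of producing such a packet programmatically and serializing it for exchange. The field names mirror the example on the slide rather than a validated schema, the nesting of `evidence` is an assumption, and `term` and `make_packet` are hypothetical helper names. JSON is used so that only the standard library is needed.

```python
import json


def term(term_id, label):
    """An ontology class reference: identifier plus human-readable label."""
    return {"id": term_id, "label": label}


def make_packet(entity, phenotype, onset, onset_description, evidence, sources):
    """Build one phenotype association in the shape shown on the slide."""
    return {
        "phenotype_profile": [{
            "entity": entity,
            "phenotype": {
                "types": [phenotype],
                "onset": {"description": onset_description, "types": [onset]},
            },
            "evidence": [{
                "types": [evidence],
                "source": [{"id": s} for s in sources],
            }],
        }]
    }


packet = make_packet(
    entity="patient16",
    phenotype=term("HP:0000522", "Alacrima"),
    onset=term("HP:0003577", "Congenital onset"),
    onset_description="at birth",
    evidence=term("ECO:0000033", "Traceable Author Statement"),
    sources=["PMID:"],  # left incomplete, as on the slide
)
text = json.dumps(packet, indent=2)
print(text)
```

Because every phenotype, onset, and evidence entry is an ontology ID plus label, a receiving system can interpret the packet without parsing free text.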
What about patients? Can they phenotype
themselves?
HPO for Patients
http://bit.ly/hpo-biocuration
6,200 plain language terms for patients, families, and non-experts
New software application being developed for patients
Layperson HPO + Phenopackets
• Dry eyes
• Stops breathing during sleep
• Small head
phenotype_profile:
  - entity: "Grace"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "https://twitter.com/examplepatient/status/123456789"
• Patient registries
• Social media
Journals are now requiring HPO terms
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Each phenopacket can be shared via DOI in any repository outside paywall (e.g. Figshare, Zenodo, etc.)
Each article can be associated with a phenopacket
Community “curate-athons” for HPO
Cardiovascular curate-athon at Stanford: ~20 cardiologists (surgeons, pediatric, etc.), four ontologists, and three clinical curators met for two days.
Abnormal Complex
Voltage to be added to all waves
-increased, decreased, fluctuating (alternans)
Duration to be added to all waves
-increased, decreased
P wave
-notching
-axis
QRS
-fractionation
-axis (right/left/extreme)
Q wave
R wave
S wave
R’ wave
S’ wave (abnormal only)
J wave (can be normal variant)
Epsilon wave (abnormal only)
Osborne wave (abnormal only)
Terminal slur wave (can be normal variant)
Delta wave (abnormal only)
Added 100s of clinically relevant
cardiophysiology phenotypes to HPO,
new exome analysis possible
Summary
• The Human Phenotype Ontology is a robust standard describing phenotypic abnormalities FOR the community, FROM the community for deep phenotyping rare disease patients
• Model organism data can fill gaps in our knowledge and aid mechanistic exploration of disease candidates
• Tools that leverage the Human Phenotype Ontology can be used to prioritize coding and noncoding variants for WES and WGS and CNVs
• Patients can provide self-phenotyping information as partners in the deep phenotyping process
• Phenopackets is a FAIR-based GA4GH exchange standard for facilitating distributed phenotype data sharing for clinics, labs, patients, and journals
Acknowledgements
Orphanet: Ana Rath, Annie Olry, Marc Hanauer, Halima Lourghi
Lawrence Berkeley: Chris Mungall, Suzanna Lewis, Jeremy Nguyen, Seth Carbon
RENCI: Jim Balhoff
OHSU: Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin, Nicole Vasilevsky, Dan Keith
Genomics England/Queen Mary: Damian Smedley, Jules Jacobson
Jackson Laboratory: Peter Robinson, Leigh Carmody
Garvan: Tudor Groza, Craig McNamara
Hipbi / NeuroCure: Dominik Seelow, Markus Schülke-Gerstenfeld
Charite: Dominik Seelow, Tomasz Zemojtel
With special thanks to Julie McMurry for excellent graphic design
www.monarchinitiative.org
Funding: NIH Office of Director: 2R24OD011883; NHGRI UDP: HHSN268201300036C,
HSN268201400093P; NCATS: UDN U01TR001395,
Biomedical Data Translator: 1OT3TR002019; E-RARE 2015: Hipbi-RD 01GM1608

Global phenotypic data sharing standards to maximize diagnostic discovery

  • 1. Global Phenotypic Data Sharing Standards to Maximize Diagnostic Discovery Melissa Haendel, PhD and Sebastian Köhler, PhD RD-Action workshop April 26th and 27th, Brussels
  • 2. Talk outline  About HPO  Semantic similarity  Leveraging basic research data  Exome analysis and disease discovery  HPO-based tools  Phenotype data standards for exchange
  • 3. What do we mean by phenotype?  = Phenotypic abnormality = clinical feature  Constellation/Pattern clinical features defines a disease: – [Disease X]... is a rare developmental disorder defined by the combination of aplasia cutis congenita of the scalp vertex and terminal transverse limb defects. In addition, vascular anomalies such as cutis marmorata telangiectatica ... are recurrently seen.  (Yes, this is a simplification)
  • 4. Starting point: OMIM  Clinical Synopsis (CS) section  Free text phenotypic description  Very expressive Online Mendelian Inheritance in Man database
  • 5. (Un)Controlled Vocabularies  Not designed to be easily machine interpretable  Spelling problems, acronyms, etc.  Homonyms: ... fibrillation ... fibrillation ≠ fibrillation = ventricular fibrillation= muscle fibrillation
  • 6. Why you should care OMIM Query Number of Results large bones 264 large bone 785 enlarged bones 87 enlarged bone 156 big bones 16 huge bones 4 massive bones 28 hyperplastic bones 12 hyperplastic bone 40 bone hyperplasia 134 increased bone growth 612
  • 7. Motivation  HPO started in 2008  Goal: computer-interpretable clinical features!  Reliable information extraction from databases based on clinical features  Compute similarity between diseases based on clinical features  Compute similarity between patients based on clinical features  Compute similarity between patients and diseases based on clinical features  Interoperability with basic research to improve diagnostic discovery  Easy to use  Freely available
  • 8. The Human Phenotype Ontology (HPO) Description of phenotypic abnormalities (or clinical features) in humans abnormality of the nervous system neurofibrillary tangles cerebral inclusion bodies gait ataxia gait disturbance ataxia phenotypic abnormality incoordination abnormality of movement abnormality of the central nervous system This is a term CS of OMIM:0815 CS of OMIM:1234 Neurofibrillary tangles may be present Paired helical filaments
  • 9. The Human Phenotype Ontology (HPO)  Synonyms merged into one term  Textual definitions for each term id: HP:0002185 name: Neurofibrillary tangles def: Pathological protein aggregates formed by hyperphosphorylation of a microtubule-associated protein known as tau, causing it to aggregate in an insoluble form. [HPO:sdoelken] synonym: Neurofibrillary tangles may be present EXACT [] synonym: Paired helical filaments EXACT [] abnormality of the nervous system neurofibrillary tangles cerebral inclusion bodies gait ataxia gait disturbance ataxia phenotypic abnormality abnormality of movement abnormality of the central nervous system incoordination
  • 10. The Human Phenotype Ontology (HPO)  Semantic relations (’subclass of’, ‘is a’)  From top to bottom, terms get more specific abnormality of the nervous system neurofibrillary tangles cerebral inclusion bodies gait ataxia gait disturbance ataxia phenotypic abnormality abnormality of movement abnormality of the central nervous system is a is a is a is a is a is a is a is a is a is a is a is a is a is a incoordination
  • 11. Computable phenotype definitions of disease HPO Terms are used to annotate (describe) diseases E.g. neurofibrillary tangles is used to annotate Alzheimer Disease: Orphanet + Monarch:  ~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER  ~133,000 annotations of 3,145 common diseases Köhler et al. https://doi.org/10.1093/nar/gkw1039 OMIM:0815 OMIM:1234 Neurofibrillary tangles may be present Paired helical filaments
  • 12. Why HPO? Existing clinical vocabularies don’t adequately cover phenotypic descriptions Winnenburg and Bodenreider, 2014 0 10 20 30 40 50 60 70 80 90 100 HPO UMLS SNOMED CT CHV MedDRA MeSH NCIT ICD10 OMIM Percentcoverage  LDDB (✓)  Orphanet (✓) (Use HPO directly)  MedDRA (✓)  UMLS (completely incorporated)
  • 13. Community contribution Multiple HPO-specific workshops Constant discussions via Tracker-System and E-Mail We try our best to acknowledge contributors: + microattributions
  • 14. Contributing to and extending HPO
  • 15. HPO language translations We need your help! http://bit.ly/hpo-translations Translation of labels, synonyms, and text definitions Italian Spanish Russian French German English layperson Japanese Chinese 100%11% 12% 100% 19%19% near 100% 20%
  • 16. Adoption of HPO Public facing databases using HPO to annotate patients Tools ingesting HPO-annotated data: Köhler et al. https://doi.org/10.1093/nar/gkw1039
  • 17. Why HPO is a successful standard  One language shared by “all“  Synonyms “map“ to one concept (HPO term)  Contains terms that no other ontology has  Comes with disease annotations! (Not just “Yet another clinical terminology“)  Simple, qualitative phenotyping, deviation (abnormal, abnormal increase, abnormal decrease, ...) to ease analysis  Documented, traceable editing  Open science community project with diverse contributors  Constantly improved and extended, examples:  Layperson version for patients  Language translations  Opposite-relations between terms
  • 18. Talk outline  About HPO  Semantic similarity  Leveraging basic research data  Exome analysis and disease discovery  HPO-based tools  Phenotype data standards for exchange
  • 19. A disease can be described algorithmically as a collection of phenotypes Patient Disease X Differential diagnosis with matching phenotype concepts is already good Splenomegaly Nasal speech Increased spleen size Nasal voice These are synonyms in HPO, i.e. map to the same term These are synonyms in HPO, i.e. map to the same term
  • 20. A disease can be described algorithmically as a collection of phenotypes Patient Disease X Differential diagnosis with similar but non-matching phenotypes is difficult Splenomegaly Oral motor hypotonia Ruptured spleen Decreased muscle mass
  • 21. Similarity between two terms Oral motor hypotonia Muscular hypotonia of the trunk Abnormal muscle tone Oral motor hypotonia Abnormality of calvarial morphology Phenotypic abnormality High scoring match Very low scoring match Medium scoring match Score: Measured by Information Content
  • 22. Comparing phenotype profiles  E.g. Patient-to-Disease comparison  Patient’s phenotypes are more similar to Disease A  Orphamizer would rank Disease A before Disease B [Figure: Patient profile compared with Disease A and with Disease B; legend: High scoring match, Medium scoring match, Very low scoring match] Score: Measured by Information Content
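Slides 21 and 22 describe scoring term pairs by information content and then comparing whole profiles. A common way to do this is a Resnik-style measure: the similarity of two terms is the information content of their most informative common ancestor, and a profile score averages the best match for each patient term. The sketch below assumes that scheme; it is not claimed to be the exact formula used by Orphamizer. The tiny ontology, the four toy diseases, and all function names are hypothetical.

```python
import math

# Toy "is a" hierarchy using the term names from slide 21 (edges assumed).
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormal muscle tone": ["Phenotypic abnormality"],
    "Oral motor hypotonia": ["Abnormal muscle tone"],
    "Muscular hypotonia of the trunk": ["Abnormal muscle tone"],
    "Abnormality of calvarial morphology": ["Phenotypic abnormality"],
}

# Hypothetical disease annotations, invented purely to give term frequencies.
ANNOTATIONS = {
    "Disease A": {"Oral motor hypotonia"},
    "Disease B": {"Abnormality of calvarial morphology"},
    "Disease C": {"Muscular hypotonia of the trunk",
                  "Abnormality of calvarial morphology"},
    "Disease D": {"Abnormality of calvarial morphology"},
}


def ancestors_and_self(term):
    """The term plus everything above it in the hierarchy."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors_and_self(parent)
    return result


def information_content(term):
    """log(N / n): n diseases are annotated with the term or a descendant."""
    n = sum(
        1 for terms in ANNOTATIONS.values()
        if any(term in ancestors_and_self(t) for t in terms)
    )
    if n == 0:
        return math.inf  # never annotated in this toy corpus
    return math.log(len(ANNOTATIONS) / n)


def term_similarity(a, b):
    """Information content of the most informative common ancestor."""
    common = ancestors_and_self(a) & ancestors_and_self(b)
    return max(information_content(t) for t in common)


def profile_similarity(patient_terms, disease_terms):
    """Average, over patient terms, of the best match among disease terms."""
    best = [max(term_similarity(p, d) for d in disease_terms)
            for p in patient_terms]
    return sum(best) / len(best)


if __name__ == "__main__":
    patient = ["Oral motor hypotonia"]
    ranking = sorted(ANNOTATIONS,
                     key=lambda d: profile_similarity(patient, ANNOTATIONS[d]),
                     reverse=True)
    print(ranking)
```

In this toy setup, an exact match scores highest, two hypotonia terms that share "Abnormal muscle tone" score in the middle, and terms that share only the root score zero, which mirrors the high, medium, and very low matches on the slide.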
  • 24. Talk outline  About HPO  Semantic similarity  Leveraging basic research data  Exome analysis and disease discovery  HPO-based visualization tools  Phenotype data standards for exchange
  • 25. The genome is sequenced, but... we still don’t know very much about what it does: 3,398 OMIM Mendelian diseases with no known genetic basis; at least 120,000* ClinVar variants with no known pathogenicity. *This is more than twice what it was in 2016!
  • 26. Adding other species’ data helps fill knowledge gaps in human genome
  • 27. More species = more coverage Number of human protein-coding genes in ExAC DB as per Lek et al. Nature 2016: 19,008 [Figure values: 14,779 (78%); 9,739 (51%); segments 2,195, 7,544, 7,235] Even inclusion of just four species boosts phenotypic coverage of genes by 38 percentage points (51% → 89%) Combined = 2,195 + 7,544 + 7,235 = 16,974 of 19,008 = 89% (union of coverage in any species) Mungall et al Nucleic Acids Research bit.ly/monarch-nar-2016
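The percentages on slide 27 can be reproduced from the counts shown in the figure. The short check below is a sketch under one assumption: that the 9,739 genes (2,195 + 7,544) are the ones with human phenotype data and the 7,235 are the genes gained by adding other species. The variable names are illustrative.

```python
# Counts as printed on the slide; their roles are inferred from the arithmetic.
TOTAL_GENES = 19_008            # human protein-coding genes (ExAC, Lek et al.)
human_covered = 2_195 + 7_544   # assumed: genes with human phenotype data
added_by_models = 7_235         # assumed: extra genes covered by other species
combined = human_covered + added_by_models


def pct(count, total=TOTAL_GENES):
    """Percentage of all genes, rounded to a whole number as on the slide."""
    return round(100 * count / total)


if __name__ == "__main__":
    print(human_covered, pct(human_covered))   # human-only coverage
    print(combined, pct(combined))             # union across species
    print(pct(combined) - pct(human_covered))  # gain in percentage points
```

The gain is a difference of 38 percentage points between the two coverage figures, not a 38% relative increase.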
  • 28. Ulcerated paws Palmoplantar hyperkeratosis Thick hand skin Image credits: "HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons – https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG http://www.guinealynx.info/pododermatitis.html
  • 29. Challenge: Each database uses their own vocabulary/ontology MP HP MGI HPOA
  • 30. Challenge: Each database uses their own phenotype vocabulary/ontology ZFA MP DPO WPO HP OMIA VT FYPO APO SNOMED … NCIT … WB PB FB OMIA MGI RGD ZFIN SGD HPOA EHR IMPC OMIM … QTLdb
  • 31. Can we help machines understand phenotype terms? “Palmoplantar hyperkeratosis” Human phenotype I have absolutely no idea what that means
  • 32. Decomposition of complex concepts using species neutral terms Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2 “Palmoplantar hyperkeratosis” increased Stratum corneum layer of skin = Human phenotype PATO Uberon Species neutral ontologies, homologous concepts Autopod keratinization GO
  • 33. How can anatomy be “species- neutral”?
  • 34. HPO Interoperability and annotations [Figure labels: Hyposmia; Abnormality of globe location; eyeball of camera-type eye; sensory perception of smell; Abnormal eye morphology; Motor neuron atrophy; Deeply set eyes; motor neuron; CL] 34,571 annotations in 22 species; 157,534 phenotype annotations; 2,150 phenotype annotations  11,813 phenotype terms  127,125 rare disease - phenotype annotations  136,268 common disease - phenotype annotations http://bit.ly/hpo-paper
  • 35. Which phenotypic profile is most similar? Model X Patient Disease Y
  • 37. But what about the diseases? How to choose which ones? What is their provenance?
  • 38. A dynamic nosology  Challenge: can we rapidly synergize multiple knowledge sources into a dynamic ontology?  classic clinical phenotype-oriented disease classification and molecular sources  Knowledge-based approaches  Logical Definition OWL Ontology Merging  Bayesian OWL Ontology Merging  Data driven  Phenotype and functional ontology networks Mungall, C. J.,. (2016). k-BOOM: bioRxiv, 048843. doi:10.1101/048843
  • 40. Coherent disease classification => Orphanet https://github.com/monarch-initiative/monarch-disease-ontology Table columns: “Ontology”; Classes (before → after merge); SubClass axioms; Xrefs. Inputs: DOID: 6878 → 6012; 7082; 36656. MESH (D): 11314 → 4152; 19036. OMIM (D): 7783 → 7783; 0; 31242. Orphanet (D): 8740 → 4683; 15182; 20326. OMIA: 4833 → 4833; 3120; 355. DC: 209 → 208; 310; 316. Medic: 0; 8630; 3435. Output: Merged: 39757 → 27617; 44837
  • 41. Talk outline  About HPO  Semantic similarity  Leveraging basic research data  Exome analysis and disease discovery  HPO-based tools  Phenotype data standards for exchange
  • 42. Prevailing clinical genomic pipelines leverage only a tiny fraction of the available data PATIENT EXOME / GENOME PATIENT CLINICAL PHENOTYPES PUBLIC GENOMIC DATA PUBLIC CLINICAL PHENOTYPE, DISEASE DATA POSSIBLE DISEASES DIAGNOSIS & TREATMENT PATIENT ENVIRONMENT PUBLIC ENVIRONMENT, DISEASE DATA PATIENT OMICS PHENOTYPES PUBLIC OMICS PHENOTYPES, CORRELATIONS Under-utilized data
  • 44. Combining G2P data for variant prioritization [Pipeline figure; steps as labeled:] Whole exome; Remove off-target and common variants; Variant score from allele freq and pathogenicity; Phenotype score from phenotypic similarity; PHIVE score to give final candidates; Mendelian filters
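The pipeline on slide 44 filters out common variants, scores each remaining candidate on variant evidence and on phenotypic similarity, and combines the two into a final ranking. The sketch below illustrates that flow only. The equal-weight average, the allele-frequency cutoff, the `variant_score` formula, and every name and number are assumptions made for illustration; they are not taken from Exomiser's code or documentation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    allele_frequency: float   # population frequency, 0..1
    pathogenicity: float      # predicted pathogenicity, 0..1
    phenotype_score: float    # phenotypic similarity to patient, 0..1


def variant_score(c):
    """Assumed toy formula: rarer and more pathogenic scores higher."""
    return c.pathogenicity * (1.0 - c.allele_frequency)


def combined_score(c):
    """Assumed equal-weight average of variant and phenotype evidence."""
    return (variant_score(c) + c.phenotype_score) / 2.0


def prioritize(candidates, max_allele_frequency=0.01):
    """Drop common variants, then rank the rest by combined score."""
    rare = [c for c in candidates if c.allele_frequency <= max_allele_frequency]
    return sorted(rare, key=combined_score, reverse=True)


# Hypothetical candidates; gene names and scores are invented.
CANDIDATES = [
    Candidate("GENE_B", allele_frequency=0.001, pathogenicity=1.0, phenotype_score=0.2),
    Candidate("GENE_A", allele_frequency=0.0, pathogenicity=0.9, phenotype_score=0.9),
    Candidate("GENE_C", allele_frequency=0.2, pathogenicity=1.0, phenotype_score=1.0),
]

if __name__ == "__main__":
    for c in prioritize(CANDIDATES):
        print(c.gene, round(combined_score(c), 3))
```

The example is set up so that a slightly less pathogenic variant in a gene with a strong phenotype match outranks a more pathogenic variant with a weak match, which is the effect the slide attributes to including phenotype data.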
  • 45. Exomiser results for UDP diagnosed patients Inclusion of phenotype data improves variant prioritization. In 60% of first 1000 genomes at GEL, Exomiser predicts top candidate. In 86% of cases, Exomiser predicts within top 5.
  • 46. Example case solved by Exomiser Phenotypic profile Genes Heterozygous, missense mutation STIM-1 N/A Heterozygous, missense mutation STIM-1 N/A Stim1Sax/Sax Ranked STIM-1 variant maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources http://bit.ly/exomiser
  • 47. Deep phenotyping and “fuzzy” matching algorithms improve diagnostics 4.9% exomes with dual molecular diagnoses, differentiated with deep phenotyping
  • 48. Talk outline  About HPO  Semantic similarity  Leveraging basic research data  Exome analysis and disease discovery  HPO-based tools  Phenotype data standards for exchange
  • 49. How much phenotyping is enough? Enlarged ears (2)Dark hair (6) Female (4) Male (4) Blue skin (1) Pointy ears (1) Hair absent on head (1) Horns present (1) Hair present on head (7) Enlarged lip (2) Increased skin pigmentation (3) bit.ly/annotationsufficiency
  • 51. Matchmaker Exchange for patients, diseases, and model organisms to aid diagnosis and mechanistic discovery www.monarchinitiative.org http://bit.ly/Monarch-MME Goal: Get clinical sites & public databases to provide standardized phenotype data
  • 52. Talk outline  About HPO  Semantic similarity  Leveraging basic research data  Exome analysis and disease discovery  HPO-based tools  Phenotype data standards for exchange
  • 53. Genes Environment Phenotypes+ = Biology central dogma Standards for exchanging data must be up to these challenges.
  • 54. Genes Environment Phenotypes+ = Computable encodings are essential Base pairs Variant notation (eg. HGVS) SNOMED-CTMedical procedure coding Environment Ontology @ontowonka
  • 55. Genes Environment Phenotypes VCF PXFGFF Standard exchange mechanisms exist for genes … but for phenotypes? Environment? BED
  • 56. Introducing PhenoPackets A packet of phenotype data to be used anywhere, written by anyone http://phenopackets.org
  • 57. What does a phenopacket look like?  Alacrima  Sleep Apnea  Microcephaly phenotype_profile: - entity: "patient16" phenotype: types: - id: "HP:0000522" label: "Alacrima" onset: description: "at birth" types: - id: "HP:0003577" label: "Congenital onset" evidence: - types: - id: "ECO:0000033" label: "Traceable Author Statement" source: - id: "PMID:"  Clinical labs  Public databases  Journals
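The phenopacket on slide 57 is a small nested record: an entity, a phenotype term, an onset, and evidence with a source. The sketch below builds the same record as a plain Python dictionary and serializes it to JSON. The nesting is an assumption, since the transcript lost the slide's indentation, and `make_phenotype_entry` is a hypothetical helper, not part of any Phenopackets library. The source identifier is left as the incomplete "PMID:" shown on the slide.

```python
import json


def make_phenotype_entry(entity, hpo_id, hpo_label, onset_description,
                         onset_id, onset_label, evidence_id, evidence_label,
                         source_id):
    """Build one phenotype_profile entry (nesting assumed, see lead-in)."""
    return {
        "entity": entity,
        "phenotype": {
            "types": [{"id": hpo_id, "label": hpo_label}],
            "onset": {
                "description": onset_description,
                "types": [{"id": onset_id, "label": onset_label}],
            },
        },
        "evidence": [
            {
                "types": [{"id": evidence_id, "label": evidence_label}],
                "source": [{"id": source_id}],
            }
        ],
    }


packet = {
    "phenotype_profile": [
        make_phenotype_entry(
            entity="patient16",
            hpo_id="HP:0000522", hpo_label="Alacrima",
            onset_description="at birth",
            onset_id="HP:0003577", onset_label="Congenital onset",
            evidence_id="ECO:0000033",
            evidence_label="Traceable Author Statement",
            source_id="PMID:",  # left incomplete, exactly as on the slide
        )
    ]
}

if __name__ == "__main__":
    print(json.dumps(packet, indent=2))
```

Keeping every term as an identifier plus label pair is what lets a receiving system match on the ontology ID while still showing a readable name.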
  • 58. What about patients? Can they phenotype themselves?
  • 60. HPO for Patients http://bit.ly/hpo-biocuration 6,200 plain language terms for patients, families, and non-experts New software application being developed for patients
  • 61. Layperson HPO + Phenopackets  Dry eyes  Stops breathing during sleep  Small head phenotype_profile: - entity: "Grace" phenotype: types: - id: "HP:0000522" label: "Alacrima" onset: description: "at birth" types: - id: "HP:0003577" label: "Congenital onset" evidence: - types: - id: "ECO:0000033" label: "Traceable Author Statement" source: - id: "https://twitter.com/examplepatient/status/123456789" • Patient registries • Social media
  • 62. Journals are now requiring HPO terms Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372 Each phenopacket can be shared via DOI in any repository outside paywall (eg. Figshare, Zenodo, etc) Each article can be associated with a phenopacket
  • 63. Community “curate-athons” for HPO Cardiovascular curate-athon at Stanford. About 20 cardiologists (surgeons, pediatric, etc.), four ontologists, and three clinical curators met for two days. Abnormal Complex: Voltage to be added to all waves (increased, decreased, fluctuating (alternans)); Duration to be added to all waves (increased, decreased); P wave (notching, axis); QRS (fractionation, axis (right/left/extreme)); Q wave; R wave; S wave; R’ wave; S’ wave (abnormal only); J wave (can be normal variant); Epsilon wave (abnormal only); Osborne wave (abnormal only); Terminal slur wave (can be normal variant); Delta wave (abnormal only). Added 100s of clinically relevant cardiophysiology phenotypes to HPO, new exome analysis possible
  • 64. Summary  The Human Phenotype Ontology is a robust standard describing phenotypic abnormalities FOR the community, FROM the community for deep phenotyping rare disease patients  Model organism data can fill gaps in our knowledge and aid mechanistic exploration of disease candidates  Tools that leverage the Human Phenotype Ontology can be used to prioritize coding and noncoding variants for WES and WGS and CNVs  Patients can provide self-phenotyping information as partners in the deep phenotyping process  Phenopackets is a FAIR-based GA4GH exchange standard for facilitating distributed phenotype data sharing for clinics, labs, patients, and journals
  • 65. Acknowledgements Orphanet: Ana Rath, Annie Olry, Marc Hanauer, Halima Lourghi. Lawrence Berkeley: Chris Mungall, Suzanna Lewis, Jeremy Nguyen, Seth Carbon. RENCI: Jim Balhoff. OHSU: Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin, Nicole Vasilevsky, Dan Keith. Genomics England/Queen Mary: Damian Smedley, Jules Jacobson. Jackson Laboratory: Peter Robinson, Leigh Carmody. Garvan: Tudor Groza, Craig McNamara. Hipbi / NeuroCure: Dominik Seelow, Markus Schülke-Gerstenfeld. Charite: Dominik Seelow, Tomasz Zemojtel. With special thanks to Julie McMurry for excellent graphic design
  • 66. www.monarchinitiative.org Funding: NIH Office of Director: 2R24OD011883; NHGRI UDP: HHSN268201300036C, HSN268201400093P; NCATS: UDN U01TR001395, Biomedical Data Translator: 1OT3TR002019; E-RARE 2015: Hipbi-RD 01GM1608

Editor's Notes

  • #2: Fix attributions/institutions
  • #3: I take the first two points
  • #4: One of the workshop questions was why the HPO has been recommended as an optimal ontology for clinical (phenotypic) descriptions. I was not part of the process that led to this recommendation, so I will instead give my impression of why HPO has been so successful over the last 8 years. First, what is the content of HPO? It contains phenotypic abnormalities ... Definition in the context of HPO ... Bla bla
  • #5: What data did we want to use in the beginning. This is what we had.
  • #6: Problems. Well-known. Just briefly.
  • #7: Why is it so important to have controlled vocabularies at all? Query today: search 'large bone' gives 9,128 entries; search 'enlarged bone' gives 3,912 entries.
  • #13: CHV = Consumer Health Vocabulary
  • #16: Translation teams at: https://github.com/Human-Phenotype-Ontology/HPO-translations/blob/master/README.md Contact: sebastian.koehler@charite.de
  • #18: Merged with next slide
  • #25: You take it from here Melissa?
  • #26: There is a lot we don’t know about the genome. As of March 2017, OMIM number: 3,398 unknown, 4,964 known. ClinVar number: at least 121,000, with the addition that these are variants that researchers have found suspicious, due to rarity in the population or something else. In context, 160k variants in the entire genome is not much.
  • #27: Each organism provides unique genetic & phenotypic data that helps fill in knowledge gaps in the human genome. For example, much work has been done in chicks to understand limb development. I used to work in a fruit fly lab studying the brain, so I am particularly attached to fly data. As you can imagine, phenotypes described for flies, or other models, use very different terms than those used for humans. Later, I will discuss how Monarch is overcoming this challenge. Now I will show you an example of how using phenotype data from other organisms can improve human health.
  • #28: If ClinVar + OMIM: 20% → 80%
  • #30: 2 issues: database integration, vocabulary integration
  • #31: Multiple databases
  • #32: Our approach is to try and get the machine to understand the terms so that it can assist us intelligently.
  • #33: We make things digestible. Complex concepts into simpler parts. We use ontologies that are comparative by design.
  • #35: Represent organism as a biological subject Represent diseases/genotypes as collections of nodes in the graph Interoperable with other bioinformatics resources and leverage modern semantic standards 5 root classes: Phenotypic abnormality, Mode of Inheritance, Clinical modifier, Mortality/Ageing, Frequency 11,813 classes/terms in HPO ~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER ~133,000 annotations of 3,145 common diseases
  • #37: OWLsim algorithm. About HPO 2: We want the vocabulary to enable sophisticated phenotypic matching within and across species
  • #39: Our team has led international ontology development efforts, including ICD11 [2], the HPO [29,30], the Gene Ontology [18,31,32], and major tissue/cell ontologies used for mammalian functional genomics [20,33–37]. We have extensive experience integrating data using these ontologies [38,39]. A fundamental challenge is to translate the vocabularies used by clinicians via EMRs and billing systems to those used in primary research data. For example, a clinician may describe a patient as having “Microcephaly” with an EMR code ICD10-Q02. A basic scientist using mice may describe this condition with MP:0003303. To translate between clinician and scientist, we provide services that map equivalent concepts [15,40]. Finally, TransMed will generate dynamic ontologies by combining existing classifications with data in the system, e.g. to generate disease nosologies based on pathway membership, orthology, and phenotypic similarity. /// Nosology: We will prototype dynamic ontology generation based on combining our existing knowledge sources. We will apply a mixture of methods. This includes our own k-BOOM Bayesian algorithm that weighs different knowledge sources and ontologies. We will also apply our data-driven techniques for generating nosologies based on molecular mechanistic information ingested into our knowledge graph. For low probability associations and equivalencies that may have high value, we will perform some curation to reconcile these.
  • #40: https://github.com/monarch-initiative/monarch-disease-ontology/issues/90 Note the two subgraphs; little overlap in the upper areas
  • #47: This was the novel case we solved. The UDP patient had a number of signs and symptoms including various platelet abnormalities. The same heterozygous, missense mutation was seen in 2 patients and ranked top by Exomiser. It had never been seen in any of the SNP databases and was predicted maximally pathogenic. Finally a mouse curated by MGI involving a heterozygous, missense point mutation introduced by chemical mutagenesis exhibited strikingly similar platelet abnormalities.
  • #48: Example showing how adding fuzzy phenotype matching improves disease diagnosis above using sequence based methodologies alone.
  • #50: Knowing the normal distribution and clustering of phenotypes helps us know that blue skin is rare and can reliably distinguish between phenotype profiles. Likewise, if the first phenotype entered is enlarged lip, the next one to ask for would be enlarged ears. The combination of 3 non-unique phenotypes offers a perfect match.
  • #51: This is a lot of text and not easy to see for the audience.
  • #54: The classic G+E=P. But the = has a lot that can be applied to aid the linking. G-P or D (disease): causes; contributes to; is risk factor for; protects against; correlates with; is marker for; modulates; involved in; increases susceptibility to. G-G: (kind of) regulates; negatively regulates (inhibits); positively regulates (activates); directly regulates; interacts with; co-localizes with; co-expressed with. P/D - P/D: part of; results in; co-occurs with; correlates with; hallmark of (P->D). E-P: contributes to (E->P); influences (E->P); exacerbates (E->P); manifest in (P->E). G-E: (kind of) expressed in; expressed during; contains; inactivated by.
  • #55: The classic G+E=P. But the = has a lot that can be applied to aid the linking.
  • #56: The classic G+E=P. But the = has a lot that can be applied to aid the linking.
  • #66: Needs adjusting yet
  • #67: Fully translational – from bench to bedside – group of stakeholders, contributors, and partners