Making the most of phenotypes
in ontology-based biomedical
knowledge discovery
Michel Dumontier, Ph.D.
Associate Professor of Medicine (Biomedical Informatics)
Stanford University
@micheldumontier::Biostats:19-02-15
Topics
• Computable Phenotypes
• Methods to compare Phenotypes
• Cross-Species Phenotype Integration
• Applications
– Undiagnosed Diseases
– Drug Target Identification
– Drug Repurposing
Phenotypes
• A phenotype is an observable characteristic of
an individual and typically pertains to its
morphology, function, and behavior.
– qualitative, deals with normal and abnormal phenotypes
– red eye color, abnormal gait, enlarged colon
Diagnosis
uses observable/measured phenotypes
“Phenotypic Profile”
Matching patients to diseases
Differential diagnosis with similar but non-matching phenotypes is difficult
Patient: Flat back of head; Hypotonia
Disease X: Abnormal skull morphology; Decreased muscle mass
Differential diagnosis becomes challenging
with rare and complex disorders
• Over 7000 rare diseases
• Prevalence: < 1 in 1,500 to 1 in 2,500
• Most have fewer than 50
case reports
• Nearly 1 in 10 Americans
suffer from one or more
rare diseases
• Only 250 medicinal
products have been
approved to diagnose and
treat rare diseases
Carpenter Syndrome
- acrocephalopolysyndactyly (ACPS)
disorder
- 40 cases described in the literature
- <1 in 1M
Genotypes + Phenotypes
Improve Diagnosis
Remove off-target variants, common variants,
and variants not in known disease-causing
genes
http://compbio.charite.de/PhenIX/
Target panel of 2741 known
Mendelian disease genes
Compare
phenotype
profiles from:
ClinVar, OMIM,
Orphanet
Zemojtel et al. Sci Transl Med. 2014. 6(252):252ra123
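The filtering step described on this slide can be sketched in Python. This is only an illustration and not PhenIX's implementation: the `Variant` fields, the 1% frequency cutoff, and the gene names are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    on_target: bool          # falls inside the sequenced target panel
    population_freq: float   # allele frequency in a reference population


def filter_variants(variants, disease_genes, max_freq=0.01):
    """Keep rare, on-target variants located in known disease genes."""
    return [
        v for v in variants
        if v.on_target
        and v.population_freq <= max_freq
        and v.gene in disease_genes
    ]


# Hypothetical toy data (not taken from the study)
disease_genes = {"GENE_A", "GENE_B"}
variants = [
    Variant("GENE_A", True, 0.0001),   # kept
    Variant("GENE_A", True, 0.20),     # removed: common variant
    Variant("GENE_B", False, 0.0001),  # removed: off-target
    Variant("GENE_C", True, 0.0001),   # removed: not a known disease gene
]
kept = filter_variants(variants, disease_genes)
print([v.gene for v in kept])
```

The surviving variants would then be ranked by comparing the patient's phenotype profile with the profiles of the diseases linked to each gene.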
PhenIX helped diagnose 11/40 patients
So how did they do it?
1. Computable representation of phenotypes
2. Methods to compare phenotype profiles
3. Using model organisms to increase coverage
of the phenotype space
Difficult to find all results
using text searches
The Human Phenotype Ontology:
A Computable Representation of Human Phenotypes
11,000+ classes
Follows the True Path Rule
Used to annotate:
• Patients
• Disorders/Diseases
• Genes, Gene Variants,
& Genotypes
[Figure: excerpt of the HPO hierarchy. Terms shown: Abnormality of endocrine pancreas physiology; Abnormality of exocrine pancreas physiology; Abnormality of pancreatic islet cells; Reduced pancreatic beta cells; Pancreatic islet cell adenoma; Insulinoma; Multiple pancreatic beta-cell adenomas]
Köhler et al. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
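The True Path Rule means that an annotation to a term also holds for every ancestor of that term. A minimal Python sketch follows; the term labels come from the slide, but the parent links are assumed for illustration and may differ from the released HPO.

```python
# Child -> parents. Labels come from the slide; the exact parent links
# are assumed here for illustration and may differ from the released HPO.
PARENTS = {
    "Insulinoma": ["Pancreatic islet cell adenoma"],
    "Multiple pancreatic beta-cell adenomas": ["Pancreatic islet cell adenoma"],
    "Pancreatic islet cell adenoma": ["Abnormality of pancreatic islet cells"],
    "Reduced pancreatic beta cells": ["Abnormality of pancreatic islet cells"],
    "Abnormality of pancreatic islet cells": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents=PARENTS):
    """Apply the true path rule: add all ancestors of each annotated term."""
    closed = set(annotations)
    for term in annotations:
        closed |= ancestors(term, parents)
    return closed


profile = propagate({"Insulinoma"})
print(sorted(profile))
```

Closing a profile under ancestors in this way is what lets two profiles with different but related terms share common, more general terms.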
HPO has unique terms
Winnenburg and Bodenreider, ISMB PhenoDay, 2014
An increasing number of
diseases are described
using the HPO
Phenotype annotations per species
http://www.monarchinitiative.org
Phenotype “BLAST”: Which phenotypic
profile is most similar?
Disease X
Patient
Disease Y
PhenoTips: Getting high-quality
patient phenotypes
Girdea et al. (2013), PhenoTips: Patient Phenotyping Software for Clinical and Research
Use. Hum. Mutat., 34: 1057–1065. doi: 10.1002/humu.22347
Semantic Similarity
• Semantic similarity is a metric defined over a set of
terms, where the distance between them is based on
their meaning.
• It can be estimated by examining, for instance,
– Topological similarity
– Information content
– Statistical co-occurrence
• Widely used in bioinformatics for gene enrichment,
function prediction, network screening, clustering,
etc.
Measures of Semantic Similarity
Edge-Based Measures
– Shortest path (Rada)
– Common path
– Scaling by depth, etc.
• Assumes nodes and edges are
uniformly distributed
Node-based Measures
– Shared terms
– Common ancestors
– Information content (IC)
• Better able to account for
structural heterogeneity
Set comparisons
• Pairwise
– Max/average/sum
– All or best pairs
• Groupwise
– Set, graph, vector
– Various combinations
Implementations
– Semanticmeasureslibrary.org
– OWL-SIM
Pesquita et al. Semantic Similarity in Biomedical Ontologies.
PLoS Comput Biol. 2009 Jul; 5(7): e1000443.
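To make the node-based and pairwise options concrete, here is a small Python sketch on a toy hierarchy. It scores two terms by the information content of their most informative common ancestor (a Resnik-style measure) and combines term pairs with a best-match average. The hierarchy, term names, and probabilities are all hypothetical, and this is one of several possible combinations rather than the method used in the talk.

```python
import math

# Toy is_a hierarchy (child -> parents); all names are hypothetical.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "a1": ["a"],
    "a2": ["a"],
    "b1": ["b"],
}
# Hypothetical probability that a term (or a descendant) annotates an item.
PROB = {"root": 1.0, "a": 0.5, "b": 0.5, "a1": 0.25, "a2": 0.25, "b1": 0.5}


def ancestors_or_self(term):
    """Return the term together with all of its ancestors."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors_or_self(parent)
    return result


def ic(term):
    """Information content in bits: rarer terms are more informative."""
    return math.log2(1.0 / PROB[term])


def resnik(t1, t2):
    """IC of the most informative ancestor shared by both terms."""
    common = ancestors_or_self(t1) & ancestors_or_self(t2)
    return max(ic(t) for t in common)


def best_match_average(set1, set2):
    """Average, in both directions, of each term's best match in the other set."""
    def one_way(xs, ys):
        return sum(max(resnik(x, y) for y in ys) for x in xs) / len(xs)
    return (one_way(set1, set2) + one_way(set2, set1)) / 2


print(resnik("a1", "a2"))                        # 1.0
print(best_match_average({"a1"}, {"a2", "b1"}))  # 0.75
```

Terms that only share the root score 0, which is why node-based measures cope better with unevenly populated branches than raw path lengths.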
Term specificity
Corpus-based:
$$ic(t) = -\log P(t)$$
Structure-based:
$$ic(t) = \mathrm{depth}(t) \times \left(1 - \frac{\log(\mathrm{desc}(t) + 1)}{\log(\mathrm{total\ terms})}\right)$$
by: Heiko Muller, CSIRO
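The two specificity measures on this slide can be written as short Python functions. This is a sketch under assumptions: the slide does not state the logarithm base (natural log is used here), and the argument names and example counts are hypothetical.

```python
import math


def ic_corpus(term_count, total_annotations):
    """Corpus-based IC: -log of the term's annotation probability."""
    return -math.log(term_count / total_annotations)


def ic_structure(depth, n_descendants, total_terms):
    """Structure-based IC: depth scaled down as a term has more descendants."""
    return depth * (1 - math.log(n_descendants + 1) / math.log(total_terms))


# Hypothetical numbers: a term used in 10 of 1,000 annotations
print(ic_corpus(10, 1000))

# A leaf term (no descendants) at depth 5 in an ontology of 11,000 terms
print(ic_structure(5, 0, 11000))  # 5.0
```

In both cases a general term near the root scores close to 0, while a rarely used or deep leaf term scores high.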
Group-wise Similarity
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
$$J(g_1, g_2) = \frac{6}{11} = 0.55$$
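The Jaccard index is easy to reproduce in Python. The two sets below are hypothetical stand-ins built only to share 6 of 11 distinct terms, matching the counts on the slide; the actual term sets are not given in the source.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical term identifiers: 8 terms and 9 terms, 6 of them shared
g1 = {f"T{i}" for i in range(1, 9)}    # T1..T8
g2 = {f"T{i}" for i in range(3, 12)}   # T3..T11
score = jaccard(g1, g2)
print(round(score, 2))  # 0.55
```

Every shared term counts equally here, whether it is general or specific, which motivates the information-content weighting used in group-wise semantic similarity.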
Group-wise Semantic Similarity
IC(g1) = 10.66
IC(g2) = 9.79
IC(g1 ⊕ g2) = 2.79
sim(g1, g2) = 0.27
$$sim(g_1, g_2) = \frac{1}{2}\left(\frac{IC(g_1 \oplus g_2)}{IC(g_1)} + \frac{IC(g_1 \oplus g_2)}{IC(g_2)}\right)$$
X. Chen et al. Gene. 2012. 509(1):131-5
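As a check, substituting the values listed on this slide into the formula reproduces the reported score. The second listed value, 9.79, is read here as IC(g2), which is an assumption based on the formula's two denominators.

$$sim(g_1, g_2) = \frac{1}{2}\left(\frac{2.79}{10.66} + \frac{2.79}{9.79}\right) \approx \frac{1}{2}\left(0.262 + 0.285\right) \approx 0.27$$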
Robustness of phenotype annotations
Schwartz-Jampel Syndrome, Type I
• Caused by mutation of HSPG2, which encodes a
proteoglycan
• ~100 phenotype annotations
Image credit: Viljoen and Beighton, J Med Genet. 1992
Similarity of Schwartz-Jampel Syndrome derivations
Semantic similarity
is robust in the face of missing information
• 92% of derived profiles are most similar to the original
disease profile
[Plot axes: Profile Similarity; Derived Profile Rank]
Semantic similarity algorithms
are sensitive to specificity of information
• The more general the phenotype, the poorer the
match to the disease
[Plot axes: Profile Similarity; Derived Profile Rank]
Annotation Sufficiency Score
http://www.phenotips.org
http://www.monarchinitiative.org
Problem: fewer than 40% of human
genes are annotated with phenotypes
GWAS
+
ClinVar
+
OMIM
A multi-species inventory of phenotypes
from genetic perturbations
GENOTYPE: B6.Cg-Alms1foz/fox/J
PHENOTYPE: increased weight, adipose tissue volume, glucose homeostasis altered
GENOTYPE: ALMS1 (NM_015120.4) [c.10775delC] + [-]
PHENOTYPE: obesity, diabetes mellitus, insulin resistance
GENOTYPE: kcnj11c14/c14; insrt143/+(AB)
PHENOTYPE: increased food intake, hyperglycemia, insulin resistance
Down Syndrome Mouse
Ts65Dn mice survive to adulthood and express
some characteristics of Down syndrome such
as developmental delay, hyperactivity, weight
problems, craniofacial dysmorphology,
impaired learning, and behavioral deficits
Each species uniquely covers a
different set of phenotypes
Provides an opportunity to use this information to inform
human disease
Human and model phenotypes can be
linked to >75% of human genes
Problem: Clinical and model
phenotypes are described differently
Problem:
Each organism uses different vocabularies
[Figure: how "lung" and related terms are represented in Mammalian Phenotype, Mouse Anatomy, FMA, and Human development vocabularies. Example terms: abnormal lung morphology; abnormal respiratory system morphology; abnormal pulmonary acinus morphology; abnormal pulmonary alveolus morphology; lung; alveolus; alveolar sac; pulmonary acinus; lower respiratory tract; respiratory system; organ system; lobular organ; parenchymatous organ; solid organ; pleural sac; thoracic cavity organ; thoracic cavity; lung bud; respiratory primordium; pharyngeal region. Relation types: is_a (SubClassOf); part_of; develops_from; surrounded_by]
[Figure: HPO terms (Reduced pancreatic beta cells; Abnormality of pancreatic islet cells; Abnormality of endocrine pancreas physiology; Abnormality of exocrine pancreas physiology; Pancreatic islet cell adenoma; Insulinoma; Multiple pancreatic beta-cell adenomas) shown next to Mammalian Phenotype terms (abnormal pancreatic beta cell mass; abnormal pancreatic beta cell morphology; abnormal pancreatic beta cell number; abnormal pancreatic islet morphology; abnormal endocrine pancreas morphology; abnormal pancreatic alpha cell morphology; abnormal pancreatic alpha cell differentiation; abnormal pancreatic alpha cell number)]
Enhance lexical approach with OWL
bridging axioms
• Key idea:
– Describe the phenotype in a machine-interpretable
way
• Break it down into digestible chunks!
• Logical definition
– The machine will then be able to help you
• Match phenotypes
• Automate ontology checking and addition of new terms
• Approach:
– Use the Web Ontology Language (OWL), which is based on
description logic, to describe phenotypes
– Use OWL reasoning to find connections
Mungall et al. (2012). Genome Biology, 13(1), R5
Köhler et al. (2014) F1000Research 2:30
Haendel et al. (2014) JBMS 5:21
Hoehndorf et al. (2011). NAR 39(18):e119
Hoehndorf et al. (2011) Bioinformatics 27(7):1001
[Figure: Mammalian Phenotype terms (abnormal pancreatic beta cell mass; abnormal pancreatic beta cell morphology; abnormal pancreatic islet morphology; abnormal endocrine pancreas morphology) connected through anatomy (type B pancreatic cell, part of islet of Langerhans, part of endocrine pancreas) and through qualities (mass; morphology; quality)]
[Figure: the HPO and Mammalian Phenotype pancreatic terms shown together, with links between them marked as inferred from CL and inferred from PATO]
‘abnormal phenotype’ and
has_entity some ‘type B pancreatic cell’ and
has_quality some amount
‘abnormal phenotype’ and
has_entity some ‘type B pancreatic cell’ and
has_quality some ‘reduced amount’
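The two logical definitions above differ only in their quality, so a reasoner can relate them once it knows how the qualities are related. The Python sketch below imitates that single inference. It is not an OWL reasoner, and the link from 'reduced amount' to 'amount' is an assumption standing in for the PATO hierarchy.

```python
from dataclasses import dataclass

# Asserted quality hierarchy (child -> parent). The single link below is an
# assumption made for this sketch, standing in for the PATO hierarchy.
QUALITY_PARENT = {"reduced amount": "amount"}


@dataclass(frozen=True)
class Phenotype:
    """'abnormal phenotype' and has_entity some E and has_quality some Q."""
    entity: str
    quality: str


def is_a(term, ancestor, parent_of):
    """True if `term` equals `ancestor` or reaches it via parent links."""
    while term is not None:
        if term == ancestor:
            return True
        term = parent_of.get(term)
    return False


def subsumed_by(specific, general):
    """Same entity, and the specific quality is a kind of the general one."""
    return (specific.entity == general.entity
            and is_a(specific.quality, general.quality, QUALITY_PARENT))


any_amount = Phenotype("type B pancreatic cell", "amount")
reduced = Phenotype("type B pancreatic cell", "reduced amount")
print(subsumed_by(reduced, any_amount))  # True
print(subsumed_by(any_amount, reduced))  # False
```

A full OWL reasoner would also follow entity relations such as part_of and is_a, which is how terms from different species ontologies end up in one shared hierarchy.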
Monarch Cross-Species Similarity
PhenomeDrug
Computational methods that use phenotypes to
predict drug targets, drug effects, and drug
indications
Animal models provide insight into on-target effects
• For the majority of the 100 best-selling drugs ($148B in
the US alone), there is a direct correlation between
knockout phenotype and drug effect
• Immunological Indications
– Anti-histamines (Claritin, Allegra, Zyrtec)
– KO of histamine H1 receptor leads to decreased
responsiveness of immune system
– Predicts on-target effects: drowsiness, reduced
anxiety
Zambrowicz and Sands. Nat Rev Drug Disc. 2003.
Identifying drug targets
from mouse knock-out phenotypes
[Diagram: a drug's effects are compared with the phenotypes of a model carrying a non-functional gene; when the two are similar, the drug is inferred to inhibit the human ortholog of that gene]
Main idea: if a drug’s phenotypes match the phenotypes of a
null model, this suggests that the drug is an inhibitor of the gene
Terminological Interoperability
(we must compare apples with apples)
Mouse Phenotypes: Mammalian Phenotype Ontology
Drug effects: mappings from UMLS to DO, NBO, MP
PhenomeNet
PhenomeDrug
[Figure: clinical phenotypes (Unstable posture; Constipation; Neuronal loss in Substantia Nigra; Shuffling gait; Resting tremors; REM disorder; Hyposmia) and model phenotypes (poor rotarod performance; decreased gut peristalsis; axon degeneration; decreased stride length; stereotypic behavior; abnormal EEG; failure to find food) related through more general terms (abnormal coordination; abnormal digestive physiology; CNS neuron degeneration; abnormal locomotion; abnormal motor function; sleep disturbance; abnormal olfaction)]
Semantic Similarity
Given a drug effect profile D and a mouse model M, we
compute the semantic similarity as an information-weighted
Jaccard metric.
The similarity measure used is non-symmetrical and
determines the amount of information about a drug effect
profile D that is covered by a set of mouse model
phenotypes M.
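One way to read this description is as the share of the drug profile's information content that is also present in the model's phenotypes. The Python sketch below implements that reading. The exact published formula should be checked against the cited paper, the IC values are hypothetical, and both profiles are assumed to be already closed under ancestor terms.

```python
def coverage_similarity(drug_terms, model_terms, ic):
    """Share of the drug profile's information content found in the model."""
    drug_terms, model_terms = set(drug_terms), set(model_terms)
    total = sum(ic[t] for t in drug_terms)
    if total == 0:
        return 0.0
    shared = sum(ic[t] for t in drug_terms & model_terms)
    return shared / total


# Hypothetical information content values for three phenotype terms
IC = {
    "abnormal locomotion": 2.0,
    "sleep disturbance": 3.0,
    "abnormal olfaction": 5.0,
}
D = {"abnormal locomotion", "sleep disturbance"}
M = {"abnormal locomotion", "sleep disturbance", "abnormal olfaction"}

print(coverage_similarity(D, M, IC))  # 1.0: M covers everything in D
print(coverage_similarity(M, D, IC))  # 0.5: D covers half of M's information
```

Swapping the arguments changes the denominator, which is what makes the measure non-symmetrical.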
Loss of function models predict
targets of inhibitor drugs
• 14,682 drugs; 7,255 mouse genotypes
• Validation against known and predicted inhibitor-target pairs
– 0.76 ROC AUC for human targets (DrugBank)
– 0.81 ROC AUC for mouse targets (STITCH)
• diclofenac (STITCH:000003032)
– NSAID used to treat pain, osteoarthritis and rheumatoid arthritis
– Drug effects include liver inflammation (hepatitis), swelling of liver
(hepatomegaly), redness of skin (erythema)
– 49% explained by PPARg knockout
• Peroxisome proliferator-activated receptor gamma (PPARg) regulates metabolism,
proliferation, inflammation and differentiation
• Diclofenac is a known inhibitor
– 46% explained by COX-2 knockout
• Diclofenac is a known inhibitor
Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M. Mouse model phenotypes provide
information about human drug targets. Bioinformatics. 2014 Mar 1;30(5):719-25
Computational Drug Repurposing
• Similarity
– Guilt by association
– If drug i is similar to drug j, and drug i treats
disease x, then drug j may treat disease x
• Complementarity
– if the signature of drug i complements/counters
the signature of disease x, then drug i may treat
disease x
PhenomeDrug:
phenotypic complementarity
• Extends the idea to match opposing drug-
disease phenotypes
– Drugs that induce hypotension may be useful in
treating hypertension
• Problem: We don’t have any information
about phenotypic complementarity
– We generated over 300 antonym pairs for the
Human Phenotype Ontology
– Developed a measure to compute phenotypic
complementarity
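A rough Python sketch of the complementarity idea: map each drug effect to its antonym and count how many disease phenotypes are countered. This is not the measure developed by the authors, which the slides do not specify. The hypotension/hypertension pair comes from the slide; the second pair and the scoring rule are hypothetical.

```python
# Hypothetical antonym pairs; the source mentions over 300 such pairs for
# the Human Phenotype Ontology but does not list them.
ANTONYM_PAIRS = [("hypotension", "hypertension"), ("bradycardia", "tachycardia")]

ANTONYM = {}
for left, right in ANTONYM_PAIRS:
    ANTONYM[left] = right
    ANTONYM[right] = left


def complementarity(drug_effects, disease_phenotypes):
    """Fraction of disease phenotypes opposed by at least one drug effect."""
    disease_phenotypes = set(disease_phenotypes)
    if not disease_phenotypes:
        return 0.0
    countered = {ANTONYM[e] for e in drug_effects if e in ANTONYM}
    return len(countered & disease_phenotypes) / len(disease_phenotypes)


score = complementarity(
    {"hypotension", "nausea"},          # drug effects
    {"hypertension", "tachycardia"},    # disease phenotypes
)
print(score)  # 0.5
```

An information-weighted variant could replace the plain counts with sums of information content, in line with the similarity measures earlier in the talk.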
Phenotype-Based Drug Repurposing
Preliminary Results
• Results suggest that for some
well-annotated diseases,
we recapitulate top drug
candidates
• Quality of drug
annotation is an issue
– Some drugs have
insufficient annotations to
find “good” matches
• Full assessment underway
• Pulmonary Arterial
Hypertension
Summary
• Ontologies provide the structure and semantics
by which phenotypes can be accurately
represented and computed with
• Measures of semantic similarity in combination
with terminological integration enable a broad
diversity of ontology-based analyses, including
– Diagnosis of rare diseases
– Identifying human drug targets
– Drug repositioning
Acknowledgements
Dumontier Lab
• Tanya Hiebert
• Joachim Baran
PhenomeDrug
• Robert Hoehndorf
• George Gkoutos
Monarch Initiative
• Melissa Haendel
• Peter Robinson
• Chris Mungall
• the Monarch Team
michel.dumontier@stanford.edu
Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier

Making the most of phenotypes in ontology-based biomedical knowledge discovery

  • 1. Making the most of phenotypes in ontology-based biomedical knowledge discovery 1 Michel Dumontier, Ph.D. Associate Professor of Medicine (Biomedical Informatics) Stanford University @micheldumontier::Biostats:19-02-15
  • 2. Topics • Computable Phenotypes • Methods to compare Phenotypes • Cross-Species Phenotype Integration • Applications – Undiagnosed Diseases – Drug Target Identification – Drug Repurposing @micheldumontier::Biostats:19-02-152
  • 3. Phenotypes • A phenotype is an observable characteristic of an individual and typically pertains to its morphology, function, and behavior. – qualitative, deals with normal and abnormal phen. – red eye color, abnormal gait, enlarged colon @micheldumontier::Biostats:19-02-153
  • 4. Diagnosis uses observable/measured phenotypes “Phenotypic Profile” @micheldumontier::Biostats:19-02-154
  • 5. Matching patients to diseases Patient Disease X Differential diagnosis with similar but non-matching phenotypes is difficult Flat back of head Hypotonia Abnormal skull morphology Decreased muscle mass @micheldumontier::Biostats:19-02-155
  • 6. Differential diagnosis becomes challenging with rare and complex disorders • Over 7000 rare diseases • < 1 in 1500-2500 • Most have fewer than 50 case reports • Nearly 1 in 10 Americans suffer from one or more rare diseases • Only 250 medicinal products have been approved to diagnose and treat rare diseases @micheldumontier::Biostats:19-02-156 Carpenter Syndrome - acrocephalopolysyndactyly (ACPS) disorder - 40 cases described in the literature - <1 in 1M
  • 7. Genotypes + Phenotypes Improves Diagnosis @micheldumontier::Biostats:19-02-157 Remove off-target, common variants, and variants not in known disease causing genes http://compbio.charite.de/PhenIX/ Target panel of 2741 known Mendelian disease genes Compare phenotype profiles from: Clinvar, OMIM, Orphanet Zemojtel et al. Sci Transl Med. 2014. 6(252):252ra123
  • 8. PhenIX helped diagnose 11/40 patients @micheldumontier::Biostats:19-02-158
  • 9. So how did they do it? 1. Computable representation of phenotypes 2. Methods to compare phenotype profiles 3. Using model organisms to increase coverage of the phenotype space @micheldumontier::Biostats:19-02-159
  • 10. Difficult to find all results using text searches @micheldumontier::Biostats:19-02-1510
  • 11. The Human Phenotype Ontology: A Computable Representation of Human Phenotypes 11,000+ classes Follows the True Path Rule Used to annotate: • Patients • Disorders/Diseases • Genes, Gene Variants, & Genotypes Reduced pancreatic beta cells Abnormality of pancreatic islet cells Abnormality of endocrine pancreas physiology Pancreatic islet cell adenoma Pancreatic islet cell adenoma Insulinoma Multiple pancreatic beta-cell adenomas Abnormality of exocrine pancreas physiology Köhler et al. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74. @micheldumontier::Biostats:19-02-1511
  • 12. HPO has unique terms @micheldumontier::Biostats:19-02-1512 Winnenburg and Bodenreider, ISMB PhenoDay, 2014
  • 13. Increased numbers of diseases are described using the HPO @micheldumontier::Biostats:19-02-1513 Phenotype annotations per species http://www.monarchinitiative.org
  • 14. Phenotype “BLAST”: Which phenotypic profile is most similar? Disease X Patient Disease Y @micheldumontier::Biostats:19-02-1514
  • 15. Phenotips: Getting high quality patient phenotypes @micheldumontier::Biostats:19-02-1515 Girdea et al. (2013), PhenoTips: Patient Phenotyping Software for Clinical and Research Use. Hum. Mutat., 34: 1057–1065. doi: 10.1002/humu.22347
  • 16. Semantic Similarity • Semantic similarity is a metric defined over a set of terms, where the distance between them is based on their meaning. • It can be estimated by examining, for instance, – Topological similarity – Information content – Statistical co-occurrence • Widely used in bioinformatics for gene enrichment, function prediction, network screening, clustering, etc. @micheldumontier::Biostats:19-02-1516
  • 18. Measures of Semantic Similarity. Edge-based measures – Shortest path (Rada) – Common path – Scaling by depth, etc. • Requires uniform distribution of nodes and edges. Node-based measures – Shared terms – Common ancestors – Information content (IC) • Better able to account for structural heterogeneity. Set comparisons • Pairwise – Max/average/sum – All or best pairs • Groupwise – Set, graph, vector – Various combinations. Implementations – Semanticmeasureslibrary.org – OWL-SIM. Pesquita et al. Semantic Similarity in Biomedical Ontologies. PLoS Comput Biol. 2009 Jul; 5(7): e1000443.
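As one concrete instance of a node-based, pairwise measure, the sketch below scores two terms by the information content of their most informative common ancestor (Resnik-style) and combines profiles with a best-match average. The hierarchy and IC values are invented for illustration, and this is not the OWL-SIM or Semantic Measures Library implementation.

```python
# ASSUMPTION: toy ancestor sets (each includes the term itself) and
# made-up IC values; neither is taken from a real ontology release.
ANCESTORS = {
    "Flat back of head": {"Flat back of head", "Abnormal skull morphology", "Phenotypic abnormality"},
    "Abnormal skull morphology": {"Abnormal skull morphology", "Phenotypic abnormality"},
    "Hypotonia": {"Hypotonia", "Abnormality of the musculature", "Phenotypic abnormality"},
    "Decreased muscle mass": {"Decreased muscle mass", "Abnormality of the musculature", "Phenotypic abnormality"},
}
IC = {
    "Phenotypic abnormality": 0.0,
    "Abnormal skull morphology": 2.0,
    "Abnormality of the musculature": 1.5,
    "Flat back of head": 5.0,
    "Hypotonia": 3.0,
    "Decreased muscle mass": 4.0,
}


def resnik(t1, t2, ancestors=ANCESTORS, ic=IC):
    """IC of the most informative ancestor shared by t1 and t2."""
    common = ancestors[t1] & ancestors[t2]
    return max((ic[t] for t in common), default=0.0)


def best_match_average(p1, p2, ancestors=ANCESTORS, ic=IC):
    """Average each term's best pairwise match, in both directions."""
    if not p1 or not p2:
        return 0.0

    def directed(src, dst):
        return sum(max(resnik(s, d, ancestors, ic) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(p1, p2) + directed(p2, p1))


patient = ["Flat back of head", "Hypotonia"]
disease_x = ["Abnormal skull morphology", "Decreased muscle mass"]
score = best_match_average(patient, disease_x)
```

The patient and disease share no exact term, yet the score is non-zero because the comparison credits their common ancestors.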
  • 19. Term specificity. Corpus-based: $ic(t) = -\log P(t)$. Structure-based: $ic(t) = \mathrm{depth}(t) \times \left(1 - \frac{\log(\mathrm{desc}(t) + 1)}{\log(\mathrm{total\ terms})}\right)$. By: Heiko Muller, CSIRO
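Both definitions of term specificity can be written directly as functions. This is a sketch of the two formulas only. The parameter names are chosen here for illustration, and how P(t), depth, and descendant counts are obtained from a corpus or ontology is left open.

```python
import math


def ic_corpus(term_count, total_count):
    """Corpus-based IC: -log P(t), with P(t) estimated as the fraction
    of annotations that fall on t (or on any of its descendants)."""
    return -math.log(term_count / total_count)


def ic_structure(depth, n_descendants, total_terms):
    """Structure-based IC: depth(t) * (1 - log(desc(t) + 1) / log(total terms))."""
    return depth * (1 - math.log(n_descendants + 1) / math.log(total_terms))


# A term used in 10 of 1000 annotations is more specific than one used in 500.
rare = ic_corpus(10, 1000)
common = ic_corpus(500, 1000)

# A leaf (no descendants) at depth 3 receives the full depth as its score.
leaf = ic_structure(3, 0, 100)
```

In both variants, more general terms (frequent in the corpus, or with many descendants) receive lower scores.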
  • 20. Group-wise Similarity. $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$. Example: $J(g_1, g_2) = \frac{6}{11} = 0.55$
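The Jaccard index is straightforward to compute on sets of terms. The sketch below uses placeholder term names (`t1` to `t11`), chosen so that the two sets share 6 of 11 distinct terms as in the slide's example. The actual terms behind g1 and g2 are not given in the source.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Placeholder profiles: 6 shared terms (t3..t8) out of 11 distinct terms.
g1 = {f"t{i}" for i in range(1, 9)}    # t1..t8
g2 = {f"t{i}" for i in range(3, 12)}   # t3..t11
similarity = jaccard(g1, g2)
```

In practice each profile would first be closed under the ontology's ancestors, so that shared general classes count toward the intersection.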
  • 21. Group-wise Semantic Similarity. $sim(g_1, g_2) = \frac{1}{2}\left(\frac{IC(g_1 \oplus g_2)}{IC(g_1)} + \frac{IC(g_1 \oplus g_2)}{IC(g_2)}\right)$. Example: IC(g1) = 10.66, IC(g2) = 9.79, IC(g1 ⊕ g2) = 2.79, so sim(g1, g2) = 0.27. X. Chen et al. Gene. 2012. 509(1):131-5
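The slide's worked example can be checked with a few lines. The function below only evaluates the formula from the three IC values. How IC(g1 ⊕ g2) is derived from the two profiles is not shown on the slide and is not modelled here.

```python
def groupwise_sim(ic_g1, ic_g2, ic_shared):
    """Average of the shared IC taken relative to each profile's own IC."""
    return 0.5 * (ic_shared / ic_g1 + ic_shared / ic_g2)


# Values from the slide: IC(g1) = 10.66, IC(g2) = 9.79, IC(g1 ⊕ g2) = 2.79.
sim = groupwise_sim(10.66, 9.79, 2.79)
print(round(sim, 2))  # 0.27
```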
  • 22. Robustness of phenotype annotations
  • 23. Schwartz-Jampel Syndrome, Type I • Caused by mutation of HSPG2, which encodes a proteoglycan • ~100 phenotype annotations. Image credit: Viljoen and Beighton, J Med Genet. 1992
  • 24. Similarity of Schwartz-Jampel Syndrome derivations
  • 25. Semantic similarity is robust in the face of missing information • 92% of derived profiles are most similar to the original disease profile. (Plot axes: profile similarity, derived profile rank)
  • 26. Semantic similarity algorithms are sensitive to the specificity of information • The more general the phenotype, the poorer the match to the disease. (Plot axes: profile similarity, derived profile rank)
  • 28. Problem: less than 40% of human genes are annotated with phenotypes (GWAS + ClinVar + OMIM)
  • 29. A multi-species inventory of phenotypes from genetic perturbations. GENOTYPE → PHENOTYPE: B6.Cg-Alms1foz/foz/J (mouse) → increased weight, adipose tissue volume, glucose homeostasis altered; ALMS1 (NM_015120.4) [c.10775delC] + [-] (human) → obesity, diabetes mellitus, insulin resistance; kcnj11c14/c14; insrt143/+(AB) (zebrafish) → increased food intake, hyperglycemia, insulin resistance
  • 30. Down Syndrome Mouse. Ts65Dn mice survive to adulthood and express some characteristics of Down syndrome, such as developmental delay, hyperactivity, weight problems, craniofacial dysmorphology, impaired learning, and behavior deficits
  • 31. Each species uniquely covers a different set of phenotypes. This provides an opportunity to use this information to inform human disease
  • 32. Human and model phenotypes can be linked to >75% of human genes
  • 33. Problem: Clinical and model phenotypes are described differently
  • 34. Problem: Each organism uses different vocabularies. [Figure: lung-related classes from several ontologies. Mammalian Phenotype: abnormal respiratory system morphology, abnormal lung morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology. Mouse Anatomy and FMA: organ system, respiratory system, lower respiratory tract, lung, lobular organ, parenchymatous organ, solid organ, thoracic cavity organ, thoracic cavity, pleural sac, pulmonary acinus, alveolar sac, lung alveolus. Human development: pharyngeal region, respiratory primordium, lung bud, lung. Relations shown: is_a (SubClassOf), part_of, develops_from, surrounded_by.]
  • 35. HPO terms: Reduced pancreatic beta cells; Abnormality of pancreatic islet cells; Abnormality of endocrine pancreas physiology; Pancreatic islet cell adenoma; Insulinoma; Multiple pancreatic beta-cell adenomas; Abnormality of exocrine pancreas physiology. MP terms: abnormal pancreatic beta cell mass; abnormal pancreatic beta cell morphology; abnormal pancreatic islet morphology; abnormal endocrine pancreas morphology; abnormal pancreatic beta cell number; abnormal pancreatic alpha cell morphology; abnormal pancreatic alpha cell differentiation; abnormal pancreatic alpha cell number
  • 36. Enhance lexical approach with OWL bridging axioms • Key idea: – Describe the phenotype in a machine-interpretable way • Break it down into digestible chunks! • Logical definition – The machine will then be able to help you • Match phenotypes • Automate ontology checking and addition of new terms • Approach: – Use the Web Ontology Language (OWL), which is based on description logic, to describe phenotypes – Use OWL reasoning to find connections. Mungall et al. (2012) Genome Biology 13(1):R5; Köhler et al. (2014) F1000Research 2:30; Haendel et al. (2014) J Biomed Semantics 5:21; Hoehndorf et al. (2011) NAR 39(18):e119; Hoehndorf et al. (2011) Bioinformatics 27(7):1001
  • 37. [Figure: MP phenotype classes (abnormal pancreatic beta cell mass; abnormal pancreatic beta cell morphology; abnormal pancreatic islet morphology; abnormal endocrine pancreas morphology) aligned with anatomy classes (type B pancreatic cell; islet of Langerhans; endocrine pancreas) linked by part of relations.]
  • 38. [Figure: as on the previous slide, with the quality classes mass, morphology, and quality added.]
  • 39. HPO terms: Reduced pancreatic beta cells; Abnormality of pancreatic islet cells; Abnormality of endocrine pancreas physiology; Pancreatic islet cell adenoma; Insulinoma; Multiple pancreatic beta-cell adenomas; Abnormality of exocrine pancreas physiology. MP terms: abnormal pancreatic beta cell mass; abnormal pancreatic beta cell morphology; abnormal pancreatic islet morphology; abnormal endocrine pancreas morphology; abnormal pancreatic beta cell number; abnormal pancreatic alpha cell morphology; abnormal pancreatic alpha cell differentiation; abnormal pancreatic alpha cell number. Cross-ontology links are inferred from CL and from PATO using logical definitions: ‘abnormal phenotype’ and has_entity some ‘type B pancreatic cell’ and has_quality some amount; ‘abnormal phenotype’ and has_entity some ‘type B pancreatic cell’ and has_quality some ‘reduced amount’
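The effect of such logical definitions can be imitated with a toy entity-quality (EQ) comparison. This is a sketch, not an OWL reasoner. The quality hierarchy is an assumed PATO-like fragment, entities are compared by equality only (no part_of reasoning), and the assignment of each definition to a named HPO or MP term is an assumption made here for illustration.

```python
from typing import NamedTuple


class EQ(NamedTuple):
    """A phenotype decomposed into an affected entity and a quality."""
    entity: str
    quality: str


# ASSUMPTION: minimal PATO-like fragment (child -> parent).
QUALITY_PARENTS = {"reduced amount": "amount", "amount": "quality"}


def quality_ancestors(quality):
    """Return the quality followed by its chain of parents."""
    chain = [quality]
    while quality in QUALITY_PARENTS:
        quality = QUALITY_PARENTS[quality]
        chain.append(quality)
    return chain


def subsumed_by(specific, general):
    """True if `specific` has the same entity and an equal or narrower quality."""
    return (specific.entity == general.entity
            and general.quality in quality_ancestors(specific.quality))


# ASSUMPTION: which term carries which definition is illustrative.
hpo_reduced_beta_cells = EQ("type B pancreatic cell", "reduced amount")
mp_abnormal_beta_cell_number = EQ("type B pancreatic cell", "amount")

linked = subsumed_by(hpo_reduced_beta_cells, mp_abnormal_beta_cell_number)
```

Because 'reduced amount' is a kind of 'amount', the human term is placed under the mouse term even though their labels share few words, which is the kind of link a purely lexical match may miss.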
  • 41. PhenomeDrug: Computational methods that use phenotypes to predict drug targets, drug effects, and drug indications
  • 42. Animal models provide insight into on-target effects • For the majority of the 100 best-selling drugs ($148B in the US alone), there is a direct correlation between knockout phenotype and drug effect • Immunological indications – Anti-histamines (Claritin, Allegra, Zyrtec) – KO of the histamine H1 receptor leads to decreased responsiveness of the immune system – Predicts on-target effects: drowsiness, reduced anxiety. Zambrowicz and Sands. Nat Rev Drug Disc. 2003.
  • 43. Identifying drug targets from mouse knock-out phenotypes. Main idea: if a drug’s phenotypes match the phenotypes of a null model, this suggests that the drug is an inhibitor of the gene. [Diagram: a drug has effects; a model with a non-functional gene has phenotypes; the model gene is an ortholog of a human gene; if the effects and phenotypes are similar, the drug is inferred to inhibit the gene.]
  • 44. Terminological Interoperability (we must compare apples with apples). PhenomeNet / PhenomeDrug. Drug effects (mappings from UMLS to DO, NBO, MP): Unstable posture; Constipation; Neuronal loss in Substantia Nigra; Shuffling gait; Resting tremors; REM disorder; Hyposmia. Mouse phenotypes: poor rotarod performance; decreased gut peristalsis; axon degeneration; decreased stride length; stereotypic behavior; abnormal EEG; failure to find food. Mammalian Phenotype Ontology classes: abnormal coordination; abnormal digestive physiology; CNS neuron degeneration; abnormal locomotion; abnormal motor function; sleep disturbance; abnormal olfaction
  • 45. Semantic Similarity. Given a drug effect profile D and a mouse model M, we compute the semantic similarity as an information weighted Jaccard metric. The similarity measure used is non-symmetrical and determines the amount of information about a drug effect profile D that is covered by a set of mouse model phenotypes M.
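One way to read this description is as the IC-weighted share of D that M covers. The sketch below implements that reading and ranks candidate models by it. The function names, IC values, and toy profiles are assumptions for illustration rather than the published implementation, and term sets are assumed to be already closed under their ontology ancestors.

```python
def coverage_similarity(drug_terms, model_terms, ic):
    """IC-weighted fraction of the drug profile D covered by the model M.
    Non-symmetrical: only terms of D appear in the denominator."""
    d, m = set(drug_terms), set(model_terms)
    total = sum(ic[t] for t in d)
    if total == 0:
        return 0.0
    return sum(ic[t] for t in d & m) / total


def rank_models(drug_terms, models, ic):
    """Sort mouse models by how much of the drug profile they explain."""
    scores = {name: coverage_similarity(drug_terms, terms, ic)
              for name, terms in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# ASSUMPTION: made-up IC values and profiles.
ic = {"abnormal locomotion": 2.0, "abnormal olfaction": 3.0, "sleep disturbance": 1.0}
drug = {"abnormal locomotion", "abnormal olfaction", "sleep disturbance"}
models = {
    "knockout A": {"abnormal locomotion", "abnormal olfaction"},
    "knockout B": {"abnormal locomotion", "sleep disturbance"},
}
ranking = rank_models(drug, models, ic)
```

Swapping the arguments changes the score: knockout A covers 5/6 of the drug profile's information, while the drug profile covers all of knockout A.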
  • 46. Loss-of-function models predict targets of inhibitor drugs • 14,682 drugs; 7,255 mouse genotypes • Validation against known and predicted inhibitor-target pairs – 0.76 ROC AUC for human targets (DrugBank) – 0.81 ROC AUC for mouse targets (STITCH) • diclofenac (STITCH:000003032) – NSAID used to treat pain, osteoarthritis and rheumatoid arthritis – Drug effects include liver inflammation (hepatitis), swelling of liver (hepatomegaly), redness of skin (erythema) – 49% explained by PPARg knockout • peroxisome proliferator activated receptor gamma (PPARg) regulates metabolism, proliferation, inflammation and differentiation • Diclofenac is a known inhibitor – 46% explained by COX-2 knockout • Diclofenac is a known inhibitor. Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M. Mouse model phenotypes provide information about human drug targets. Bioinformatics. 2014 Mar 1;30(5):719-25
  • 47. Computational Drug Repurposing • Similarity – Guilt by association – If drug i is similar to drug j, and drug i treats disease x, then drug j may treat disease x • Complementarity – If the signature of drug i complements/counters the signature of disease x, then drug i may treat disease x
  • 48. PhenomeDrug: phenotypic complementarity • Extends the idea to match opposing drug-disease phenotypes – Drugs that induce hypotension may be useful in treating hypertension • Problem: We don’t have any information about phenotypic complementarity – We generated over 300 antonym pairs for the Human Phenotype Ontology – Developed a measure to compute phenotypic complementarity
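The slide does not define the complementarity measure, so the following is only a hypothetical illustration of the idea: count the fraction of disease phenotypes for which the drug induces the antonym. The hypotension/hypertension pair comes from the slide. The second pair, the unweighted scoring, and all names are assumptions.

```python
# First pair is from the slide; the second is an assumed example.
ANTONYM_PAIRS = {
    "hypertension": "hypotension",
    "hyperglycemia": "hypoglycemia",
}


def symmetric(pairs):
    """Make the antonym lookup work in both directions."""
    lookup = dict(pairs)
    lookup.update({value: key for key, value in pairs.items()})
    return lookup


def complementarity(drug_effects, disease_phenotypes, pairs=ANTONYM_PAIRS):
    """Hypothetical score: share of disease phenotypes countered by a drug effect."""
    disease = set(disease_phenotypes)
    if not disease:
        return 0.0
    effects = set(drug_effects)
    antonym_of = symmetric(pairs)
    countered = {p for p in disease if antonym_of.get(p) in effects}
    return len(countered) / len(disease)


score = complementarity(
    drug_effects={"hypotension", "nausea"},
    disease_phenotypes={"hypertension", "hyperglycemia"},
)
```

A fuller version would presumably weight each countered phenotype by its information content, in line with the similarity measures on the earlier slides.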
  • 50. Preliminary Results • Results suggest that for some well-annotated diseases, we recapitulate top drug candidates • Quality of drug annotation is an issue – Some drugs have insufficient annotations to find “good” matches • Full assessment underway • Example: Pulmonary Arterial Hypertension
  • 51. Summary • Ontologies provide the structure and semantics by which phenotypes can be accurately represented and computed with • Measures of semantic similarity in combination with terminological integration enable a broad diversity of ontology-based analyses, including – Diagnosis of rare diseases – Identifying human drug targets – Drug repositioning
  • 52. Acknowledgements. Dumontier Lab • Tanya Hiebert • Joachim Baran. PhenomeDrug • Robert Hoehndorf • George Gkoutos. Monarch Initiative • Melissa Haendel • Peter Robinson • Chris Mungall • the Monarch Team