Generating (and Testing) Biomedical Hypotheses
Using Semantic Web Technologies

Michel Dumontier, Ph.D.
Associate Professor of Medicine (Biomedical Informatics)
Stanford University

@micheldumontier::AAAI-FSW:Nov 16, 2013

Science, it works ******

We use the biomedical literature to find out what we
believe to be true

we need access to the most recent biomedical data
(problems: access, format, identifiers & linking)

we also need access to the most effective software
to analyze, predict and evaluate
(problems: OS, versioning, input/output formats)

we build support for our hypotheses by constructing
and (partly) reusing fairly sophisticated workflows

Wouldn’t it be great if we could just find the evidence
required to support or dispute a scientific hypothesis using
the most up-to-date and relevant data, tools and scientific
knowledge?

madness!
1. Build a massive network of interconnected data and software using web standards
2. Develop methods to generate and test hypotheses across these data
   a. tactical formalization
   b. uncovering associations
   c. evidence gathering
3. Contribute back to the global knowledge graph

The Semantic Web
is the new global web of knowledge
standards for publishing, sharing and querying
facts, expert knowledge and services
scalable approach for the discovery
of independently formulated
and distributed knowledge

we’re building an incredible network of linked data

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch

Bio2RDF Linked Data Map

Bio2RDF Release 2: Improved coverage, interoperability and provenance of Life Science Linked Data. ESWC 2013.

At the heart of Linked Data for the Life Sciences

chemicals/drugs/formulations,
genomes/genes/proteins, domains
Interactions, complexes & pathways
animal models and phenotypes
Disease, genetic markers,
treatments
Terminologies & publications

• Free and open source
• Uses Semantic Web standards
• Release 2 (Jan 2013): 1B+ interlinked statements from 19 conventional and high
value datasets
• Includes provenance, statistics
• Partnerships with EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool
providers

A Callahan, J Cruz-Toledo, M Dumontier. Querying Bio2RDF Linked Open Data with a Global Schema.
2013. Journal of Biomedical Semantics. 4(Suppl 1):S1.
Resource Description Framework
• It’s a language to represent knowledge
– Logic-based formalism -> automated reasoning
– graph-like properties -> data analysis

• Good for
– Describing in terms of type, attributes, relations
– Integrating data from different sources
– Sharing the data (W3C standard)
– Reuse what is available, develop what you’re
missing.

Dumontier::FDA:Nov 15,2013
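[Figure: RDF graph for diclofenac, written as subject, predicate, object]
drugbank:DB00586 rdf:type drugbank_vocabulary:Drug
drugbank:DB00586 rdfs:label "Diclofenac [drugbank:DB00586]"
drugbank:DB00586 drugbank_vocabulary:targets drugbank_target:290
drugbank_target:290 rdf:type drugbank_vocabulary:Target
drugbank_target:290 rdfs:label "Prostaglandin G/H synthase 2 [drugbank_target:290]"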
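
The diclofenac graph on this slide can be read as a handful of subject, predicate, object statements. The following is a minimal sketch of that idea in plain Python, assuming a hypothetical in-memory set of tuples rather than a real RDF library; the `match` and `target_labels` helpers are illustrative names, and only the identifiers are taken from the slide.

```python
# Hypothetical in-memory triple store; identifiers are copied from the slide,
# the storage layout and helper functions are assumptions for illustration.
TRIPLES = {
    ("drugbank:DB00586", "rdf:type", "drugbank_vocabulary:Drug"),
    ("drugbank:DB00586", "rdfs:label", "Diclofenac [drugbank:DB00586]"),
    ("drugbank:DB00586", "drugbank_vocabulary:targets", "drugbank_target:290"),
    ("drugbank_target:290", "rdf:type", "drugbank_vocabulary:Target"),
    ("drugbank_target:290", "rdfs:label",
     "Prostaglandin G/H synthase 2 [drugbank_target:290]"),
}


def match(s=None, p=None, o=None, triples=TRIPLES):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def target_labels(drug):
    """Follow drug -> targets -> rdfs:label, a two-hop walk over the graph."""
    labels = []
    for _, _, target in match(s=drug, p="drugbank_vocabulary:targets"):
        labels.extend(o for _, _, o in match(s=target, p="rdfs:label"))
    return labels


print(target_labels("drugbank:DB00586"))
```

The wildcard pattern is roughly what a single SPARQL triple pattern does; chaining two patterns through a shared node is the graph-like property the RDF slide refers to.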

The linked data network expands with every reference

[Figure: two linked records for diclofenac]
DrugBank: drugbank:DB00586, rdf:type drugbank_vocabulary:Drug, rdfs:label "diclofenac [drugbank:DB00586]"
PharmGKB: pharmgkb:PA449293, type pharmgkb_vocabulary:Drug, rdfs:label "diclofenac [pharmgkb:PA449293]"
Link: pharmgkb_vocabulary:xref connects pharmgkb:PA449293 and drugbank:DB00586

Federated Queries over Independent
SPARQL EndPoints
Get all protein catabolic processes (and more specific) from biomodels

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Count the BioModels reactions annotated with each matching GO term
SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?reactions)
WHERE {
  # GO terms whose parent label starts with "protein catabolic process"
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  # Biochemical reactions in BioModels that point at those GO terms
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label

Bio2RDF: 1M+ SPARQL queries per month
UniProt: 6.4M queries per month
EBI: 3.5M queries (Oct 2013)

Directions
• research
– approaches for data publication
– methods for large scale data reduction
– methods for mining associations

• service
– coverage, quality and licensing
– better user interfaces
– sustainability of service

tactical formalization
a. discovery of drug and disease pathway associations
b. prediction of drug targets using phenotypes

ontology as a
strategy to formally
represent and
integrate knowledge

Have you heard of OWL?

Identification of
disease enriched pathways
def: A biological pathway is constituted by a
set of molecular components that undertake
some biological transformation to achieve a
stated objective
glycolysis : a pathway that converts glucose to
pyruvate

aberrant pathways
Which pathways are associated with disease?
Which pathways can be perturbed by drugs?
[Figure: disease, pathway, drug]

Identification of
drug and disease enriched pathways
• Approach
– Integrate 3 datasets: DrugBank, PharmGKB and
CTD
– Integrate 7 terminologies: MeSH, ATC, ChEBI,
UMLS, SNOMED, ICD, DO
– Identify significant pathways via enrichment
analysis over the fully inferred knowledge base

• Formalized as an OWL-EL ontology
– 650,000+ classes, 3.2M subClassOf axioms, 75,000
equivalentClass axioms

Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics.
Bioinformatics. 2012.
[Figure: ontology design patterns]
• Top Level Classes (disjointness): pathway, drug, gene, disease
• Reciprocal Existentials
• Class subsumption: mercaptopurine [pharmgkb:PA450379], purine-6-thiol [CHEBI:2208]
• Class Equivalence: mercaptopurine [drugbank:DB01033], mercaptopurine [mesh:D015122], mercaptopurine [ATC:L01BB02]
• property chains: pathway, drug, disease, gene

Benefits: Enhanced Query Capability
– Use any mapped terminology to query a target resource.
– Use knowledge in target ontologies to formulate more
precise questions
• ask for drugs that are associated with diseases of the joint:
‘Chikungunya’ (do:0050012) is defined as a viral infectious disease
located in the ‘joint’ (fma:7490) and caused by a ‘Chikungunya
virus’ (taxon:37124).

– Learn relationships that are inferred by automated
reasoning.
• alcohol (ChEBI:30879) is associated with alcoholism (PA443309)
since alcoholism is directly associated with ethanol (CHEBI:16236), and ethanol is a subclass of alcohol
• ‘parasitic infectious disease’ (do:0001398) retrieves 129 disease
associated drugs, 15 more than are directly associated.
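
The "more than are directly associated" effect comes from following subclass links before collecting associations. Here is a small sketch of that step, assuming a toy single-parent hierarchy; the class names, drug names and the `drugs_for` helper are hypothetical placeholders, not content from the knowledge base, and a real OWL reasoner does considerably more than this.

```python
# Hypothetical toy data: placeholders only, not taken from the talk.
SUBCLASS_OF = {
    "disease_B": "disease_A",  # disease_B is a kind of disease_A
    "disease_C": "disease_B",  # disease_C is a kind of disease_B
}
DIRECT_DRUGS = {
    "disease_A": {"drug_1"},
    "disease_B": {"drug_2"},
    "disease_C": {"drug_2", "drug_3"},
}


def is_subclass(cls, ancestor):
    """True if cls is ancestor or reaches it by following subClassOf links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def drugs_for(disease):
    """Drugs associated with the disease class or with any of its subclasses."""
    found = set()
    for cls, drugs in DIRECT_DRUGS.items():
        if is_subclass(cls, disease):
            found |= drugs
    return found


direct = DIRECT_DRUGS["disease_A"]
inferred = drugs_for("disease_A")
print(len(inferred), "drugs,", len(inferred - direct), "more than directly associated")
```

Querying the general class returns everything attached to its descendants, which is how a query on a broad disease category can return more drugs than are asserted on that category itself.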

Knowledge Discovery through Data
Integration and Enrichment Analysis
• OntoFunc: Tool to discover significant associations between sets of objects
and ontology categories. It tests for enrichment of an attribute among a selected set of
input items as compared to a reference set, using the hypergeometric or the
binomial distribution, Fisher's exact test, or a chi-square test.
• We found 22,653 disease-pathway associations, where for each pathway
we find genes that are linked to disease.
– Mood disorder (do:3324) associated with Zidovudine Pathway
(pharmgkb:PA165859361). Zidovudine is for treating HIV/AIDS. Side
effects include fatigue, headache, myalgia, malaise and anorexia
• We found 13,826 pathway-chemical associations
– Clopidogrel (chebi:37941) associated with Endothelin signaling
pathway (pharmgkb:PA164728163). Endothelins are proteins that
constrict blood vessels and raise blood pressure. Clopidogrel inhibits
platelet aggregation and prolongs bleeding time.
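
Enrichment analysis asks whether an attribute shows up in a selected set more often than chance would predict from the reference set. Below is a minimal sketch of the hypergeometric variant, assuming nothing about how OntoFunc itself is implemented; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k).

    N: size of the reference set
    K: reference items that carry the attribute (e.g. genes in a pathway)
    n: size of the selected set (e.g. genes linked to a disease)
    k: selected items that carry the attribute
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 40 of 1000 reference genes are in the pathway,
# and 6 of the 50 disease-linked genes fall in that pathway.
print(hypergeom_pvalue(k=6, n=50, K=40, N=1000))
```

A small p-value suggests the overlap is unlikely under random selection. When many pathways are tested at once, a multiple-testing correction would normally be applied on top of this.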

PhenomeDrug
A computational approach to predict drug
targets, drug effects, and drug indications using
phenotypic information

Mouse model phenotypes provide information about human drug targets.
Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M.
Bioinformatics. 2013.

animal models provide insight into on-target effects
• For the majority of the 100 best-selling drugs ($148B in the US
alone), there is a direct correlation between knockout
phenotype and drug effect
• Gastroesophageal reflux
– Proton pump inhibitors (e.g., Prilosec - $5B)
– Knockouts of the alpha or beta subunit of H/K ATPase are achlorhydric (normal
pH) in the stomach compared to wild type (pH 3-5) when exposed
to acidic solution

• Immunological Indications
– Antihistamines (Claritin, Allegra, Zyrtec)
– KO of histamine H1 receptor leads to decreased responsiveness
of immune system
– Predicts on-target effects: drowsiness, reduced anxiety
Zambrowicz and Sands. Nat Rev Drug Disc. 2003.

Identifying drug targets
from mouse knock-out phenotypes
Main idea: if a drug’s phenotypes match the phenotypes of a
null model, this suggests that the drug is an inhibitor of the gene
[Figure: a drug and a non-functional gene model show similar phenotypes (effects); the model gene has a human ortholog; the drug inhibits the human gene]
Terminological Interoperability
(we must compare apples with apples)
[Figure labels: Mouse Phenotypes; PhenomeNet; Mammalian Phenotype Ontology; PhenomeDrug; Drug effects (mappings from UMLS to DO, NBO, MP)]
Semantic Similarity
Given a drug effect profile D and a mouse model M, we
compute the semantic similarity as an information weighted
Jaccard metric.

The similarity measure used is non-symmetrical and
determines the amount of information about a drug effect
profile D that is covered by a set of mouse model
phenotypes M.
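
The formula itself did not survive in the slide text, so the following is only one plausible reading of a non-symmetric, information-weighted measure: the information content of the phenotypes shared by D and M, divided by the information content of D alone. The function names, the example terms and their weights are all hypothetical, and the phenotype sets are assumed to be already expanded with their superclasses.

```python
from math import log


def information_content(term, annotation_counts, total):
    """IC(term) = -log(p), with p the fraction of profiles annotated with term."""
    return -log(annotation_counts[term] / total)


def coverage_similarity(drug_profile, model_phenotypes, ic):
    """Share of the drug profile's information also present in the mouse model."""
    denom = sum(ic[t] for t in drug_profile)
    if denom == 0:
        return 0.0
    shared = drug_profile & model_phenotypes
    return sum(ic[t] for t in shared) / denom


# Hypothetical phenotype terms and information-content weights.
ic = {"pheno_a": 1.0, "pheno_b": 3.0, "pheno_c": 2.0, "pheno_d": 2.0}
D = {"pheno_a", "pheno_b"}             # drug effect profile
M = {"pheno_b", "pheno_c", "pheno_d"}  # mouse model phenotypes

print(coverage_similarity(D, M, ic))  # how much of D is covered by M
print(coverage_similarity(M, D, ic))  # differs, since the measure is not symmetric
```

Dividing by the drug profile only, rather than by the union as plain Jaccard would, is what makes the score directional: a mouse model with many extra phenotypes is not penalized for them.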

null models do well in predicting
targets of drug inhibitors
• 14,682 drugs; 7,255 mouse genotypes
• Validation against known and predicted inhibitor-target pairs
– 0.739 ROC AUC for human targets (DrugBank)
– 0.736 ROC AUC for mouse targets (STITCH)
– 0.705 ROC AUC for human targets (STITCH)

• diclofenac (STITCH:000003032)
– NSAID used to treat pain, osteoarthritis and rheumatoid arthritis
– Drug effects include liver inflammation (hepatitis), swelling of liver
(hepatomegaly), redness of skin (erythema)
– 49% explained by PPARg knockout
• peroxisome proliferator activated receptor gamma (PPARg) regulates metabolism,
proliferation, inflammation and differentiation
• Diclofenac is a known inhibitor

– 46% explained by COX-2 knockout
• Diclofenac is a known inhibitor
Finding evidence to support or dispute a hypothesis takes
some digging around

FDA Use Case:
TKI non-QT Cardiotoxicity
• Tyrosine Kinase Inhibitors (TKI)
– Imatinib, Sorafenib, Sunitinib, Dasatinib, Nilotinib, Lapatinib
– Used to treat cancer
– Linked to cardiotoxicity.

• FDA launched drug safety program to detect toxicity
– Need to integrate data and ontologies (Abernethy, CPT 2011)
– Abernethy (2013) suggests using public data in genetics,
pharmacology, toxicology, and systems biology to
predict/validate adverse events

• What evidence could we gather to give credence to the claim that
TKIs cause non-QT cardiotoxicity?

HyQue
• The goal of HyQue is to retrieve and evaluate evidence
that supports/disputes a hypothesis
– hypotheses are described as a set of events
• e.g. binding, inhibition, phenotypic effect

– events are associated with types of evidence
• a query is written to retrieve data
• a weight is assigned to provide significance

• Hypotheses are written by people who seek answers
• data retrieval rules are written by people who know the
data and how it should be interpreted 
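
The pieces named above (events, evidence queries, weights) can be pictured with a small sketch. This is not HyQue's actual scoring scheme, which the slide does not spell out; the `EvidenceRule` class, the normalized weighted sum and the example fields are assumptions for illustration, with plain Python functions standing in for real data retrieval queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvidenceRule:
    name: str                      # the kind of evidence this rule looks for
    query: Callable[[Dict], bool]  # stand-in for a data retrieval query
    weight: float                  # significance assigned to this evidence


def evaluate(event: Dict, rules: List[EvidenceRule]) -> float:
    """Fraction of the total rule weight that the data supports for one event."""
    total = sum(rule.weight for rule in rules)
    if total == 0:
        return 0.0
    supported = sum(rule.weight for rule in rules if rule.query(event))
    return supported / total


# Hypothetical rules and event record; the field names are placeholders.
rules = [
    EvidenceRule("known effect", lambda e: e.get("has_known_effect", False), 2.0),
    EvidenceRule("assay result", lambda e: e.get("assay_positive", False), 1.0),
    EvidenceRule("model phenotype", lambda e: e.get("model_phenotype", False), 1.0),
]
event = {"has_known_effect": True, "assay_positive": False, "model_phenotype": True}

print(evaluate(event, rules))
```

Keeping the rules separate from the hypothesis mirrors the division of labor on the slide: one group writes the events to test, another writes the queries and decides how much each kind of evidence should count.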
1. HyQue: Evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
2. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC
2012). Heraklion, Crete. May 27-31, 2012.

HyQue: A Semantic Web Application

Hypothesis

Evaluation

Data

Ontologies

Software

SPIN functions
to evaluate TKI Cardiotoxicity
• Effects [r3.sider, lit]
– Does the drug have known cardiotoxic effects?

• Models [r2.mgi]
– Are there cardiotoxic phenotypes in null mouse models of target genes?

• Assays [ebi.chembl]
– Is hERG inhibited?
– Is the drug TUNEL positive?

• Targets [r2.drugbank, lit]
– Does the drug inhibit cardiotoxicity-related proteins?
– RAF1, PDGFR, VEGFR, AMPK

http://bio2rdf.org/drugbank:DB01268

In Summary
• This talk was about making sense of the
structured data we already have
• RDF-based Linked Open Data acts as a substrate
for query answering and task-based formalization
in OWL
• Discovery through the generation of testable
hypotheses in the target domain.
• The next big challenge lies in capturing the
output of computational experiments just as
much as extracting/formalizing user-contributed
knowledge.

Acknowledgements
Bio2RDF Release 2:
Alison Callahan, Jose Cruz-Toledo, Peter Ansell

Aberrant Pathways: Robert Hoehndorf, Georgios Gkoutos
PhenomeDrug: Tanya Hiebert, Robert Hoehndorf, Georgios
Gkoutos, Paul Schofield
HyQue: Alison Callahan, Nigam Shah

dumontierlab.com
michel.dumontier@stanford.edu
Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier


Generating Biomedical Hypotheses Using Semantic Web Technologies

  • 1. Generating (and Testing) Biomedical Hypotheses Using Semantic Web Technologies Michel Dumontier, Ph.D. Associate Professor of Medicine (Biomedical Informatics) Stanford University 1 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 3. Science, it works ****** 3 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 4. We use the biomedical literature to find out what we believe to be true 4 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 5. we need to access to the most recent biomedical data (problems: access, format, identifiers & linking) 5 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 6. we also need access to the most effective software to analyze, predict and evaluate (problems: OS, versioning, input/output formats) 6 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 7. we build support for our hypotheses by constructing and (partly) reusing fairly sophisticated workflows 7 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 8. Wouldn’t it be great if we could just find the evidence required to support or dispute a scientific hypothesis using the most up-to-date and relevant data, tools and scientific knowledge? 8 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 9. madness! 1. Build a massive network of interconnected data and software using web standards 2. Develop methods to generate and test hypotheses across these data 1. tactical formalization 2. uncovering associations 3. evidence gathering 3. Contribute back to the global knowledge graph 9 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 10. The Semantic Web is the new global web of knowledge standards for publishing, sharing and querying facts, expert knowledge and services scalable approach for the discovery of independently formulated and distributed knowledge 10 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 11. we’re building an incredible network of linked data 11 Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 12. Bio2RDF Linked Data Map Bio2RDF Release 2: Improved coverage, interoperability and provenance of Life Science Linked Data . ESWC 2013. 12 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 13. At the heart of Linked Data for the Life Sciences chemicals/drugs/formulations, genomes/genes/proteins, domains Interactions, complexes & pathways animal models and phenotypes Disease, genetic markers, treatments Terminologies & publications • Free and open source • Uses Semantic Web standards • Release 2 (Jan 2013): 1B+ interlinked statements from 19 conventional and high value datasets • Includes provenance, statistics • Partnerships with EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers 13 A Callahan, J Cruz-Toledo, M Dumontier. Querying Bio2RDF Linked Open Data with a Global Schema. 2013. Journal of Biomedical Semantics. 4(Suppl 1):S1. @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 14. Resource Description Framework • It’s a language to represent knowledge – Logic-based formalism -> automated reasoning – graph-like properties -> data analysis • Good for – Describing in terms of type, attributes, relations – Integrating data from different sources – Sharing the data (W3C standard) – Reuse what is available, develop what you’re missing. 14 Dumontier::FDA:Nov 15,2013
  • 16. The linked data network expands with every reference DrugBank drugbank_vocabulary:Drug rdf:type rdfs:label drugbank:DB00586 diclofenac [drugbank:DB00586] pharmgkb_vocabulary:xref pharmgkb:PA449293 rdfs:label PharmGKB diclofenac [pharmgkb:PA449293] pharmgkb_vocabulary:Drug 16 Dumontier::FDA:Nov 15,2013
  • 17. Federated Queries over Independent SPARQL EndPoints Get all protein catabolic processes (and more specific) from biomodels SELECT ?go ?label count(distinct ?x) WHERE { service <http://bioportal.bio2rdf.org/sparql> { ?go rdfs:label ?label . ?go rdfs:subClassOf ?tgo ?tgo rdfs:label ?tlabel . FILTER regex(?tlabel, "^protein catabolic process") } service <http://biomodels.bio2rdf.org/sparql> { ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go . ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> . } } 17 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 18. Bio2RDF: 1M+ SPARQL queries per month UniProt: 6.4M queries per month EBI: 3.5M queries (Oct 2013) 18 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 19. Directions • research – approaches for data publication – methods for large scale data reduction – methods for mining associations • service – coverage, quality and licensing – better user interfaces – sustainability of service 19 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 20. tactical formalization a. discovery of drug and disease pathway associations b. prediction of drug targets using phenotypes 20 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 21. ontology as a strategy to formally represent and integrate knowledge 21 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 22. Have you heard of OWL? 22 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 23. Identification of disease enriched pathways def: A biological pathway is constituted by a set of molecular components that undertake some biological transformation to achieve a stated objective glycolysis : a pathway that converts glucose to pyruvate 23 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 24. aberrant pathways which pathways are associated with disease? Which pathways can be perturbed by drugs? disease pathway drug 24 @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 25. Identification of drug and disease enriched pathways • Approach – Integrate 3 datasets: DrugBank, PharmGKB and CTD – Integrate 7 terminologies: MeSH, ATC, ChEBI, UMLS, SNOMED, ICD, DO – Identify significant pathways via enrichment analysis over the fully inferred knowledge base • Formalized as an OWL-EL ontology – 650,000+ classes, 3.2M subClassOf axioms, 75,000 equivalentClass axioms 25 Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics. Bioinformatics. 2012. @micheldumontier::AAAI-FSW:Nov 16, 2013
  • 26. [Diagram of the ontology design patterns]
      • Top Level Classes (disjointness): pathway, drug, gene, disease
      • Reciprocal Existentials
      • Class subsumption: mercaptopurine [pharmgkb:PA450379], purine-6-thiol [CHEBI:2208]
      • Class Equivalence: mercaptopurine [drugbank:DB01033], mercaptopurine [mesh:D015122], mercaptopurine [ATC:L01BB02], drug
      • Property chains: pathway, drug, disease, gene
  • 27. Benefits: Enhanced Query Capability
      – Use any mapped terminology to query a target resource.
      – Use knowledge in target ontologies to formulate more precise questions.
          • Ask for drugs that are associated with diseases of the joint: ‘Chikungunya’ (do:0050012) is defined as a viral infectious disease located in the ‘joint’ (fma:7490) and caused by a ‘Chikungunya virus’ (taxon:37124).
      – Learn relationships that are inferred by automated reasoning.
          • alcohol (ChEBI:30879) is associated with alcoholism (PA443309) since alcoholism is directly associated with ethanol (CHEBI:16236).
          • ‘parasitic infectious disease’ (do:0001398) retrieves 129 disease-associated drugs, 15 more than are directly associated.
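The last point on slide 27 (a query on a broad disease class retrieves more drugs than are directly associated with it) can be illustrated with a small sketch. This is not the OWL reasoner used in the talk: it is a plain transitive closure over subClassOf edges, and every identifier in the toy data (`ex:disease_a`, `ex:drug_1`, and so on) is hypothetical.

```python
from collections import defaultdict


def descendants(subclass_of, root):
    """Return root plus every class that reaches root through subClassOf edges."""
    children = defaultdict(set)
    for child, parent in subclass_of:
        children[parent].add(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def drugs_for(disease_class, subclass_of, direct_assoc):
    """Drugs associated with the class itself or with any of its subclasses."""
    classes = descendants(subclass_of, disease_class)
    return {drug for drug, disease in direct_assoc if disease in classes}


# Hypothetical toy data: (child, parent) subclass edges and (drug, disease) pairs.
SUBCLASS_OF = [("ex:disease_b", "ex:disease_a"), ("ex:disease_c", "ex:disease_b")]
DIRECT = [
    ("ex:drug_1", "ex:disease_a"),
    ("ex:drug_2", "ex:disease_b"),
    ("ex:drug_3", "ex:disease_c"),
    ("ex:drug_4", "ex:disease_x"),
]

direct_only = {drug for drug, disease in DIRECT if disease == "ex:disease_a"}
inferred = drugs_for("ex:disease_a", SUBCLASS_OF, DIRECT)
print(len(direct_only), len(inferred))
```

In this toy graph the inferred answer set is larger than the directly asserted one, which is the same effect the slide reports for ‘parasitic infectious disease’.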
  • 28. Knowledge Discovery through Data Integration and Enrichment Analysis
      • OntoFunc: a tool to discover significant associations between sets of objects and ontology categories. It looks for enrichment of an attribute among a selected set of input items as compared to a reference set. Enrichment can be assessed with the hypergeometric or the binomial distribution, Fisher's exact test, or a chi-square test.
      • We found 22,653 disease-pathway associations, where for each pathway we find genes that are linked to disease.
          – Mood disorder (do:3324) is associated with the Zidovudine Pathway (pharmgkb:PA165859361). Zidovudine is used to treat HIV/AIDS. Side effects include fatigue, headache, myalgia, malaise and anorexia.
      • We found 13,826 pathway-chemical associations.
          – Clopidogrel (chebi:37941) is associated with the Endothelin signaling pathway (pharmgkb:PA164728163). Endothelins are proteins that constrict blood vessels and raise blood pressure. Clopidogrel inhibits platelet aggregation and prolongs bleeding time.
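As a rough illustration of the enrichment test named on slide 28, the sketch below computes a one-sided hypergeometric p-value using only the standard library. It is not OntoFunc's implementation; the function name and parameter names are assumptions chosen for this example.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated items in a sample of n,
    drawn without replacement from N items of which K carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers (hypothetical): a reference set of 10 genes, 3 of them linked
# to a disease, and a pathway containing 4 genes, 3 of which are linked.
p = hypergeom_pvalue(k=3, n=4, K=3, N=10)
print(p)
```

A small p-value suggests the pathway contains more disease-linked genes than expected by chance. When many pathways are tested, as in the 22,653 associations above, a multiple-testing correction would normally be applied; the slide does not say which one was used.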
  • 29. PhenomeDrug: A computational approach to predict drug targets, drug effects, and drug indications using phenotypic information. Reference: Mouse model phenotypes provide information about human drug targets. Hoehndorf R, Hiebert T, Hardy NW, Schofield PN, Gkoutos GV, Dumontier M. Bioinformatics. 2013.
  • 30. Animal models provide insight into on-target effects
      • For the majority of the 100 best-selling drugs ($148B in the US alone), there is a direct correlation between knockout phenotype and drug effect.
      • Gastroesophageal reflux
          – Proton pump inhibitors (e.g. Prilosec, $5B)
          – Mice with a KO of the alpha or beta subunit of H/K ATPase are achlorhydric (normal pH) in the stomach compared to wild type (pH 3-5) when exposed to acidic solution
      • Immunological indications
          – Anti-histamines (Claritin, Allegra, Zyrtec)
          – KO of the histamine H1 receptor leads to decreased responsiveness of the immune system
          – Predicts on-target effects: drowsiness, reduced anxiety
      • Reference: Zambrowicz and Sands. Nat Rev Drug Disc. 2003.
  • 31. Identifying drug targets from mouse knock-out phenotypes. Main idea: if a drug’s phenotypes match the phenotypes of a null model, this suggests that the drug is an inhibitor of the gene. [Diagram: a model with a non-functional gene has phenotypes similar to the drug’s effects; the model gene is an ortholog of a human gene; the drug inhibits the human gene]
  • 32. Terminological Interoperability (we must compare apples with apples). [Diagram: Mouse Phenotypes (PhenomeNet, Mammalian Phenotype Ontology); PhenomeDrug; Drug effects (mappings from UMLS to DO, NBO, MP)]
  • 33. Semantic Similarity. Given a drug effect profile D and a mouse model M, we compute the semantic similarity as an information-weighted Jaccard metric. The similarity measure used is non-symmetrical and determines the amount of information about a drug effect profile D that is covered by a set of mouse model phenotypes M.
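The formula on slide 33 did not survive in the transcript, so the sketch below is one plausible reading of the description rather than the published definition: sum the information content (IC) of the phenotype terms shared by D and M, and divide by the total IC of the terms in D. It assumes both term sets are already closed under superclasses and that IC is estimated as the negative log of a term's annotation frequency. All names and toy values are hypothetical.

```python
from math import log


def information_content(annotation_counts, total):
    """IC(term) = -log(fraction of annotated entities that carry the term)."""
    return {term: -log(count / total) for term, count in annotation_counts.items()}


def coverage_similarity(drug_terms, model_terms, ic):
    """Share of the drug profile's information that the mouse model covers.

    Non-symmetrical: swapping the arguments changes the denominator.
    """
    denominator = sum(ic[t] for t in drug_terms)
    if denominator == 0:
        return 0.0
    shared = drug_terms & model_terms
    return sum(ic[t] for t in shared) / denominator


# Hypothetical IC values for three phenotype terms.
IC = {"ex:pheno_a": 1.0, "ex:pheno_b": 3.0, "ex:pheno_c": 2.0}
D = {"ex:pheno_a", "ex:pheno_b"}   # drug effect profile
M = {"ex:pheno_b", "ex:pheno_c"}   # mouse model phenotypes

print(coverage_similarity(D, M, IC), coverage_similarity(M, D, IC))
```

The two printed values differ because the measure asks how much of the first argument is explained by the second, which matches the "percent explained" figures quoted on the next slide.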
  • 34. Null models do well in predicting targets of drug inhibitors
      • 14,682 drugs; 7,255 mouse genotypes
      • Validation against known and predicted inhibitor-target pairs
          – 0.739 ROC AUC for human targets (DrugBank)
          – 0.736 ROC AUC for mouse targets (STITCH)
          – 0.705 ROC AUC for human targets (STITCH)
      • diclofenac (STITCH:000003032)
          – NSAID used to treat pain, osteoarthritis and rheumatoid arthritis
          – Drug effects include liver inflammation (hepatitis), swelling of liver (hepatomegaly), redness of skin (erythema)
          – 49% explained by PPARg knockout
              • peroxisome proliferator activated receptor gamma (PPARg) regulates metabolism, proliferation, inflammation and differentiation
              • Diclofenac is a known inhibitor
          – 46% explained by COX-2 knockout
              • Diclofenac is a known inhibitor
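For readers unfamiliar with the ROC AUC figures on slide 34, the sketch below shows one standard way to compute the statistic: the probability that a randomly chosen true drug-target pair scores higher than a randomly chosen false one, with ties counted as half. The talk does not describe how its AUC values were computed; the scores and labels here are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores for four candidate drug-target pairs;
# label 1 marks a known inhibitor-target pair.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so values around 0.7 indicate that known targets tend to rank above non-targets.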
  • 35. Finding evidence to support or dispute a hypothesis takes some digging around
  • 36. FDA Use Case: TKI non-QT Cardiotoxicity
      • Tyrosine Kinase Inhibitors (TKI)
          – Imatinib, Sorafenib, Sunitinib, Dasatinib, Nilotinib, Lapatinib
          – Used to treat cancer
          – Linked to cardiotoxicity
      • FDA launched a drug safety program to detect toxicity
          – Need to integrate data and ontologies (Abernethy, CPT 2011)
          – Abernethy (2013) suggests using public data in genetics, pharmacology, toxicology and systems biology to predict/validate adverse events
      • What evidence could we gather to lend credence to the claim that TKIs cause non-QT cardiotoxicity?
  • 37. HyQue
      • The goal of HyQue is to retrieve and evaluate evidence that supports/disputes a hypothesis
          – hypotheses are described as a set of events
              • e.g. binding, inhibition, phenotypic effect
          – events are associated with types of evidence
              • a query is written to retrieve data
              • a weight is assigned to provide significance
      • Hypotheses are written by people who seek answers
      • Data retrieval rules are written by people who know the data and how it should be interpreted
      • References
          1. HyQue: Evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
          2. Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
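The structure described on slide 37 (a hypothesis is a set of events, each event has evidence rules, each rule has a query and a weight) can be sketched as follows. This is a hypothetical model for illustration only: the class names, the weighted-fraction scoring and the toy facts are assumptions, and in HyQue itself the retrieval step is a SPARQL/SPIN rule over linked data rather than a Python function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvidenceRule:
    name: str
    retrieve: Callable[[Dict[str, bool]], bool]  # stands in for a data query
    weight: float


@dataclass
class Event:
    description: str
    rules: List[EvidenceRule]


def score_event(event: Event, data: Dict[str, bool]) -> float:
    """Weighted fraction of evidence rules that find supporting data."""
    total = sum(rule.weight for rule in event.rules)
    if total == 0:
        return 0.0
    found = sum(rule.weight for rule in event.rules if rule.retrieve(data))
    return found / total


def score_hypothesis(events: List[Event], data: Dict[str, bool]) -> float:
    """Average event score; 0.0 when the hypothesis has no events."""
    if not events:
        return 0.0
    return sum(score_event(e, data) for e in events) / len(events)


# Hypothetical facts and rules for a single "drug is cardiotoxic" event.
facts = {"assay_positive": True, "model_phenotype": False}
event = Event(
    "drug causes cardiotoxicity",
    [
        EvidenceRule("assay", lambda d: d["assay_positive"], weight=2.0),
        EvidenceRule("mouse model", lambda d: d["model_phenotype"], weight=1.0),
    ],
)
print(score_hypothesis([event], facts))
```

Separating the event description from the retrieval rules mirrors the division of labour on the slide: one group writes hypotheses, another writes the rules that know how to read the data.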
  • 38. HyQue: A Semantic Web Application. [Diagram: Hypothesis Evaluation; Data; Ontologies; Software]
  • 39. SPIN functions to evaluate TKI Cardiotoxicity
      • Effects [r3.sider, lit]
          • Does the drug have known cardiotoxic effects?
      • Models [r2.mgi]
          • Are there cardiotoxic phenotypes in null mouse models of target genes?
      • Assays [ebi.chembl]
          • Is hERG inhibited?
          • Is the drug TUNEL positive?
      • Targets [r2.drugbank, lit]
          • Does the drug inhibit cardiotoxic-related proteins?
          • RAF1, PDGFR, VEGFR, AMPK
  • 44. In Summary
      • This talk was about making sense of the structured data we already have.
      • RDF-based Linked Open Data acts as a substrate for query answering and task-based formalization in OWL.
      • Discovery through the generation of testable hypotheses in the target domain.
      • The next big challenge lies in capturing the output of computational experiments just as much as extracting/formalizing user-contributed knowledge.
  • 45. Acknowledgements
      • Bio2RDF Release 2: Alison Callahan, Jose Cruz-Toledo, Peter Ansell
      • Aberrant Pathways: Robert Hoehndorf, Georgios Gkoutos
      • PhenomeDrug: Tanya Hiebert, Robert Hoehndorf, Georgios Gkoutos, Paul Schofield
      • HyQue: Alison Callahan, Nigam Shah

Editor's Notes

  • #5: Growth of PubMed citations from 1986 to 2010. Over the past 20 years, the total number of citations in PubMed has increased at a ∼4% growth rate. There are currently over 20 million citations in PubMed.
  • #14: The Bio2RDF project transforms silos of life science data into a globally distributed network of linked data for biological knowledge discovery.
  • #23: Slick used car salesman
  • #29: Endothelins are proteins that constrict blood vessels and raise blood pressure. They are normally kept in balance by other mechanisms, but when they are over-expressed, they contribute to high blood pressure (hypertension) and heart disease.
  • #36: Can’t answer questions that require background knowledge
  • #37: Associate Director for Drug Safety: Darrell Abernethy, M.D., Ph.D