Building and mining a heterogeneous biomedical knowledge graph
June 23, 2020
Slides: slideshare.net/andrewsu
Andrew Su, Ph.D.
@andrewsu
http://sulab.org
Q19857262
https://www.scientificamerican.com/article/can-you-teach-old-drugs-new-tricks/
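Imatinib (CHEBI:45783) – inhibits – Bcr-Abl (UniProtKB:A9UF02)
Bcr-Abl (UniProtKB:A9UF02) – part of – CML Pathway (KEGG:hsa05220)
CML Pathway (KEGG:hsa05220) – causes – Chronic myelogenous leukemia (DOID:8552)
Imatinib (CHEBI:45783) – treats – Chronic myelogenous leukemia (DOID:8552)
Imatinib (CHEBI:45783) – inhibits – c-Kit (UniProtKB:P10721)
c-Kit (UniProtKB:P10721) – part of – Mast Cell Degranulation (GO:0043303)
Mast Cell Degranulation (GO:0043303) – causes – Asthma (DOID:2841)
Imatinib (CHEBI:45783) – treats? – Asthma (DOID:2841)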
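The example graph on this slide can be written as subject-predicate-object triples and searched for paths that connect a drug to a disease. The following is a minimal sketch, not the method used in the talk: the `TRIPLES` list mirrors the slide, the edge directions are an assumption based on the slide layout, and `find_paths` is a hypothetical helper.

```python
from collections import defaultdict

# Edges transcribed from the slide; directions are assumed from the layout.
TRIPLES = [
    ("CHEBI:45783", "inhibits", "UniProtKB:A9UF02"),   # Imatinib -> Bcr-Abl
    ("UniProtKB:A9UF02", "part of", "KEGG:hsa05220"),  # Bcr-Abl -> CML pathway
    ("KEGG:hsa05220", "causes", "DOID:8552"),          # CML pathway -> CML
    ("CHEBI:45783", "treats", "DOID:8552"),            # known indication
    ("CHEBI:45783", "inhibits", "UniProtKB:P10721"),   # Imatinib -> c-Kit
    ("UniProtKB:P10721", "part of", "GO:0043303"),     # c-Kit -> mast cell degranulation
    ("GO:0043303", "causes", "DOID:2841"),             # degranulation -> asthma
]


def find_paths(triples, source, target, max_hops=3):
    """Return every path (a list of triples) of at most max_hops edges."""
    out_edges = defaultdict(list)
    for subj, pred, obj in triples:
        out_edges[subj].append((pred, obj))

    paths = []
    stack = [(source, [])]
    while stack:
        node, path = stack.pop()
        if node == target and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue  # hop limit reached; this also bounds any cycles
        for pred, obj in out_edges[node]:
            stack.append((obj, path + [(node, pred, obj)]))
    return paths


# Mechanistic paths that could support the open "treats?" question for asthma.
asthma_paths = find_paths(TRIPLES, "CHEBI:45783", "DOID:2841")
for path in asthma_paths:
    print(" -> ".join(pred for _, pred, _ in path))
```

The known indication (CML) and the candidate indication (asthma) are both reachable from Imatinib through a path of the same shape, which is the intuition behind treating such paths as evidence for a new "treats" edge.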
Citation: https://doi.org/10.7554/eLife.26726
Method based on https://doi.org/10.1109/ASONAM.2011.112
“We obtained and integrated data from 29 publicly available resources to create Hetionet v1.0. The hetnet contains 47,031 nodes of 11 types and 2,250,197 relationships of 24 types.”
- Daniel Himmelstein, https://think-lab.github.io/d/102/
Algorithm: Rephetio; MINERVA, CBR, etc.
KB: Text-mining (Semantic MEDLINE DB); Crowd-curated; Distributed KG
Semantic MEDLINE (SemMedDB)
• Processed all PubMed abstracts (30 million+) every 6-12 months
• 90 million+ triples covering key biomedical entity types (genes, drugs, diseases, anatomy, physiological processes, etc.)
• All triples anchored to a publication (and publication date)
https://skr3.nlm.nih.gov/SemMedDB/
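Because every triple is anchored to a publication date, the knowledge base can be rewound to any year. A minimal sketch of that idea, assuming a simplified record layout: the `Triple` fields and the example rows are hypothetical and do not reflect SemMedDB's actual schema or content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    pmid: int       # publication the triple was extracted from (made-up values)
    pub_year: int   # publication year of that abstract


# Illustrative rows only; not real SemMedDB content.
TRIPLES = [
    Triple("drug_A", "INHIBITS", "gene_X", pmid=1, pub_year=1985),
    Triple("gene_X", "ASSOCIATED_WITH", "disease_Y", pmid=2, pub_year=1992),
    Triple("drug_A", "TREATS", "disease_Y", pmid=3, pub_year=2004),
]


def kb_as_of(triples, year):
    """Keep only triples whose supporting publication appeared in or before `year`."""
    return [t for t in triples if t.pub_year <= year]


kb_1995 = kb_as_of(TRIPLES, 1995)
print(len(kb_1995), "of", len(TRIPLES), "triples were published by 1995")
```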
Time-resolved prediction analysis
[Figure: timeline, 1950 to 2010, of approval dates for drug-disease pairs 1 to n, with a KB year marker splitting the pairs into Train and Test sets]
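The split shown in the figure can be sketched as follows. This is an illustration under stated assumptions, not the published pipeline: `approvals` holds hypothetical drug-disease pairs with made-up approval years, and pairs approved in or before the KB year are assumed to go to training while later approvals are held out for testing.

```python
def time_resolved_split(approvals, kb_year):
    """Split drug-disease pairs by approval year relative to the KB year.

    Assumption: pairs approved in or before kb_year are known at training
    time; pairs approved later are held out as future indications.
    """
    train = {pair for pair, year in approvals.items() if year <= kb_year}
    test = {pair for pair, year in approvals.items() if year > kb_year}
    return train, test


# Hypothetical pairs and approval years, for illustration only.
approvals = {
    ("drug_1", "disease_1"): 1962,
    ("drug_2", "disease_2"): 1978,
    ("drug_3", "disease_3"): 1991,
    ("drug_4", "disease_4"): 2006,
}

# Sliding the KB year forward moves pairs from the test set to the training set.
for kb_year in (1970, 1990):
    train, test = time_resolved_split(approvals, kb_year)
    print(kb_year, len(train), "train /", len(test), "test")
```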
Time-resolved prediction analysis
[Figure panels: Area Under ROC; Area Under PRC]
https://doi.org/10.1186/s12859-019-3297-0
Key findings
• Computational drug repurposing using KB reasoning is hard
• We lose almost all predictive signal at < 5 years in the future
• Identified four edge types that are sensitive to information gain/loss:
compound – TREATS – disease
compound – RELATEDTO – compound
anatomy – LOCATIONOF – disease
disease – ASSOCIATEDWITH – disease
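The two metrics named on these slides can be computed from a ranked list of predictions. The sketch below uses plain Python and made-up labels and scores. Average precision is used here as a common estimate of the area under the precision-recall curve, which is an assumption and may differ from how the cited paper computed it.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision at each true positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up example: 1 = pair later approved, 0 = not approved.
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print("AUROC:", auroc(labels, scores))
print("Average precision:", average_precision(labels, scores))
```

Both functions assume at least one positive and one negative label. When true indications are rare among all candidate pairs, the precision-recall view tends to be the stricter of the two.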
Algorithm: Rephetio; MINERVA, CBR, etc.
KB: Text-mining (Semantic MEDLINE DB); Crowd-curated; Distributed KG
Wikidata is to data as Wikipedia is to text
Provide a database of the world’s [biomedical] knowledge that anyone can edit
- Denny Vrandečić
Wikidata as a drug repositioning KB
https://doi.org/10.7554/eLife.52614
Key findings
• Wikidata is comparable in size, scope, and structure, making it suitable for computational drug repositioning
• Results improve over time as the KB content increases in size and quality
[Figure: results at 2017-01-16, 2018-02-05, and 2019-09-13]
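As a rough illustration of how drug-disease edges can be pulled from Wikidata, the sketch below builds a SPARQL query over property P2175 ("medical condition treated") and sends it to the public query service. It is not the analysis from the cited paper: the helper names and the `User-Agent` string are placeholders, and `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_treats_query(limit=10):
    """Build a SPARQL query for drug-disease pairs linked by P2175."""
    return f"""
SELECT ?drug ?drugLabel ?disease ?diseaseLabel WHERE {{
  ?drug wdt:P2175 ?disease .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query):
    """Send the query to the public endpoint and return the result rows."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WIKIDATA_SPARQL + "?" + params,
        headers={"User-Agent": "kg-example/0.1"},  # placeholder identifier
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


query = build_treats_query(limit=5)
print(query)
# rows = run_query(query)  # uncomment when network access is available
```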
Algorithm: Rephetio; MINERVA, CBR, etc.
KB: Text-mining (Semantic MEDLINE DB); Crowd-curated; Distributed KG
Data federation is common, and sometimes necessary
• Gene correlation matrices (20,000 genes × 20,000 genes)
• Health data: Columbia Open Health Data, http://cohd.io/
• Licensing restrictions: The (Re)usable Data Project, http://reusabledata.org/
https://smart-api.info/
Additional Slides
Notebooks
● GitHub
● Colab
Autonomous query planning and API call execution
Querying a distributed KG
Query template: Imatinib, [Disease], [Gene], [Pathway]
Key question
• Is this architecture amenable to KB completion?
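One way to picture executing a query template over a distributed KG is to pick one API per hop and chain the calls. The sketch below is purely illustrative: the three functions stand in for remote APIs, the registry layout is hypothetical, and the hop order Drug, Gene, Pathway, Disease is an assumption, since the slide lists the template slots without their order.

```python
# Hypothetical stand-ins for remote APIs; each answers one kind of hop.
def drug_to_gene(drug):
    return {"Imatinib": ["Bcr-Abl", "c-Kit"]}.get(drug, [])


def gene_to_pathway(gene):
    return {"Bcr-Abl": ["CML Pathway"], "c-Kit": ["Mast Cell Degranulation"]}.get(gene, [])


def pathway_to_disease(pathway):
    return {
        "CML Pathway": ["Chronic myelogenous leukemia"],
        "Mast Cell Degranulation": ["Asthma"],
    }.get(pathway, [])


# Registry: (input type, output type) -> function that performs the hop.
API_REGISTRY = {
    ("Drug", "Gene"): drug_to_gene,
    ("Gene", "Pathway"): gene_to_pathway,
    ("Pathway", "Disease"): pathway_to_disease,
}


def execute_template(start, node_types):
    """Follow the template one hop at a time, choosing an API for each hop."""
    paths = [[start]]
    for src_type, dst_type in zip(node_types, node_types[1:]):
        api = API_REGISTRY[(src_type, dst_type)]
        paths = [path + [nxt] for path in paths for nxt in api(path[-1])]
    return paths


results = execute_template("Imatinib", ["Drug", "Gene", "Pathway", "Disease"])
for path in results:
    print(" -> ".join(path))
```

In a real deployment each registry entry would wrap a network call and the registry itself would be populated from API metadata rather than written by hand.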
Algorithm: Rephetio; MINERVA, CBR, etc.
KB: Text-mining (Semantic MEDLINE DB); Crowd-curated; Distributed KG
Evidence: DrugMechDB
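Imatinib (CHEBI:45783) – inhibits – Bcr-Abl (UniProtKB:A9UF02)
Bcr-Abl (UniProtKB:A9UF02) – part of – CML Pathway (KEGG:hsa05220)
CML Pathway (KEGG:hsa05220) – causes – Chronic myelogenous leukemia (DOID:8552)
Imatinib (CHEBI:45783) – treats – Chronic myelogenous leukemia (DOID:8552)
Imatinib (CHEBI:45783) – inhibits – c-Kit (UniProtKB:P10721)
c-Kit (UniProtKB:P10721) – part of – Mast Cell Degranulation (GO:0043303)
Mast Cell Degranulation (GO:0043303) – causes – Asthma (DOID:2841)
Imatinib (CHEBI:45783) – treats? – Asthma (DOID:2841)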
DrugMechDB – a database of drug mechanisms
https://github.com/SuLab/DrugMechDB
Indication: Fomepizole – TREATS – Methanol poisoning
fomepizole (Compound, MESH:D000077604) – Inhibits – Alcohol dehydrogenase 1A (Gene, UniProt:P07327)
Alcohol dehydrogenase 1A (Gene, UniProt:P07327) – Part of – Ethanol oxidation (Pathway, REACT:R-HSA-71384)
Ethanol oxidation (Pathway, REACT:R-HSA-71384) – Produces – formic acid (Compound, MESH:C030544)
formic acid (Compound, MESH:C030544) – Causes – Methanol poisoning (Disease, MESH:D000138)
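The fomepizole example can be held as a small record and checked for being a connected path from the drug to the disease of the indication. This is a sketch only: the dictionary layout and `is_connected_path` are hypothetical and are not DrugMechDB's actual file format. The identifiers are copied from the slide.

```python
# Record layout is an assumption for illustration, not DrugMechDB's schema.
MECHANISM = {
    "indication": ("fomepizole", "TREATS", "Methanol poisoning"),
    "nodes": {
        "fomepizole": ("Compound", "MESH:D000077604"),
        "Alcohol dehydrogenase 1A": ("Gene", "UniProt:P07327"),
        "Ethanol oxidation": ("Pathway", "REACT:R-HSA-71384"),
        "formic acid": ("Compound", "MESH:C030544"),
        "Methanol poisoning": ("Disease", "MESH:D000138"),
    },
    "edges": [
        ("fomepizole", "Inhibits", "Alcohol dehydrogenase 1A"),
        ("Alcohol dehydrogenase 1A", "Part of", "Ethanol oxidation"),
        ("Ethanol oxidation", "Produces", "formic acid"),
        ("formic acid", "Causes", "Methanol poisoning"),
    ],
}


def is_connected_path(record):
    """Check that the edges chain from the indication's drug to its disease."""
    drug, _, disease = record["indication"]
    edges = record["edges"]
    if not edges or edges[0][0] != drug or edges[-1][2] != disease:
        return False
    # Each edge must start where the previous edge ended.
    return all(prev[2] == nxt[0] for prev, nxt in zip(edges, edges[1:]))


print(is_connected_path(MECHANISM))
```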
Acknowledgements
Mike Mayers
Núria Queralt-Rosinach
Toby Li
Mike Mayers
Andra Waagmeester (Micelio)
Sabah Ul-Hasan
Ginger Tsueng
Roger Tu
Greg Stupp
Ben Good
Tim Putman
Sebastian Burgstaller-Muehlbacher
Lynn Schriml (Univ. Maryland)
Kevin Hybiske (Univ. Washington)
Kevin Xin
Chunlei Wu
Marco Alvarado
Jerry Zhou
Semantic
MEDLINE DB
DrugMechDB:
Mike Mayers
Funding
