MOLIERE
Automatic Biomedical Hypothesis Generation System
Justin Sybrandt (Clemson University, School of Computing)
Michael Shtutman (University of South Carolina, College of Pharmacy)
Ilya Safro (Clemson University, School of Computing)
UNDISCOVERED PUBLIC KNOWLEDGE
• Proposed by Swanson in 1986
• The set of public knowledge is too large
• Contains implicit connections
HUMAN LIMITS
• No person can read everything
• 2,000 – 4,000 new papers daily
• People Specialize
• Limits knowledge sharing across disciplines
• Bias and Inconsistent Assumptions
WHAT IS A HYPOTHESIS?
• An idea or explanation for something that is based on known facts but has not yet been proven
• (Definition of “hypothesis” from the Cambridge Academic Content Dictionary © Cambridge University Press)
Diagram labels: DRUG, ?, DISEASE, Inhibits, Causes
AUTOMATIC HYPOTHESIS GENERATION
Diagram labels: DRUG, DISEASE
• Network of Medical Objects
• Connections Not Well Studied
RELATED APPROACHES
ARROWSMITH
• One of the first hypothesis generation systems.
• Found link between Fish Oil and Raynaud’s Disease.
• Limitation: Limited Document Set
• Reference: Neil R Smalheiser and Don R Swanson. 1998.
DiseaseConnect
• Finds connections between genes and diseases.
• Displays information in an interactive prompt.
• Limitation: Limited Document Set
• Reference: Chun-Chi Liu et al. 2014.
BioLDA
• Constructs high quality topic models aided by domain-specific information.
• Identified link between Venlafaxine and HTR1A.
• Limitation: Limited Vocabulary
• Reference: Huijun Wang et al. 2011.
NEW APPROACH
• Construct a large network
• Identify meaningful paths
• Extend paths to neighboring nodes
• Mine neighborhoods for important topics
Overview: two stages, NETWORK CONSTRUCTION and QUERY PROCESS
NETWORK CONSTRUCTION
• National Library of Medicine (NLM)
• National Center for Biotechnology Information (NCBI)
• MEDLINE
• 25 Million Documents
• Titles and Abstracts
NETWORK CONSTRUCTION
• SPECIALIST NLP Toolset
• Natural Language Toolkit (NLTK)
Raw Data: This is some example text that is hopefully more understandable than the typical medical abstract.
Cleaned Text: some example text that hopefully more understandable than typical medical abstract
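The cleaning step can be illustrated with a minimal sketch. It does not call the SPECIALIST NLP tools or NLTK; the stop-word list is hypothetical, chosen only so that the slide's example sentence comes out as shown, and the function name clean_text is likewise an assumption.

```python
import re

# Hypothetical stop-word list, chosen only to reproduce the slide's example.
# The stop list used by the actual pipeline is not given in the slides.
STOP_WORDS = {"this", "is", "the"}


def clean_text(raw: str) -> str:
    """Lowercase the text, drop punctuation, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


raw = ("This is some example text that is hopefully more understandable "
       "than the typical medical abstract.")
cleaned = clean_text(raw)
print(cleaned)
```

In practice the stop list would come from a standard resource rather than being written by hand.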
NETWORK CONSTRUCTION
• Topical Pattern Mining
• Groups together common phrases
• Creates 2,3,…,n-grams
Cleaned Text: some example text that hopefully more understandable than typical medical abstract
Phrases: example text hopefully more understandable typical medical abstract
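A rough sketch of how candidate 2,3,…,n-grams might be collected is shown here. This is plain frequency counting, not the topical pattern mining method named on the slide; the function name, the thresholds, and the two toy documents are all assumptions made for illustration.

```python
from collections import Counter


def frequent_ngrams(docs, max_n=3, min_count=2):
    """Count contiguous n-grams (2 <= n <= max_n) and keep the frequent ones."""
    counts = Counter()
    for doc in docs:
        tokens = doc.split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy corpus (hypothetical), already cleaned.
docs = [
    "typical medical abstract example text",
    "another typical medical abstract",
]
phrases = frequent_ngrams(docs)
print(phrases)
```

Only word sequences that recur across the corpus survive the threshold, which is the intuition behind grouping common phrases.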
NETWORK CONSTRUCTION
• FastText: Projects phrases into real valued vector space
• Long word embedding composed from subwords
Phrases: example text hopefully more understandable typical medical abstract
Diagram: Phrases mapped to a Point Cloud
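To make "composed from subwords" concrete, the sketch below lists the character n-grams of a token with boundary markers, in the style described for fastText. It does not use the fastText library; char_ngrams is a hypothetical helper, and the default lengths 3 and 6 are assumed to match fastText's usual settings.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return character n-grams of a token wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


trigrams = char_ngrams("where", 3, 3)
print(trigrams)
```

A token's embedding is then built from the vectors of its subwords, so rare or unseen phrases still receive a usable vector.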
NETWORK CONSTRUCTION
• Embed documents by averaging over point clouds
Diagram: Point Cloud reduced to a Centroid
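Averaging a point cloud can be sketched as follows, assuming each phrase vector is a plain list of floats of equal length. The function name centroid and the toy vectors are illustrative only.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of vectors."""
    if not points:
        raise ValueError("cannot average an empty point cloud")
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


# Toy 2-D point cloud standing in for one document's phrase embeddings.
cloud = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
doc_vector = centroid(cloud)
print(doc_vector)
```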
NETWORK CONSTRUCTION
• Construct k-nearest-neighbors (KNN) network over the document centroids
• Fast Library for Approximate Nearest Neighbors (FLANN)
Diagram: All Centroids connected into a Network
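The idea of a nearest-neighbor network can be sketched with an exact brute-force search. This is not the approximate search named on the slide, and it would not scale to millions of documents; knn_graph and the toy points are assumptions for illustration.

```python
import math


def knn_graph(points, k):
    """Map each point index to its k nearest neighbors as (index, distance)."""
    graph = {}
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        graph[i] = [(j, d) for d, j in dists[:k]]
    return graph


# Toy centroids (hypothetical).
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
graph = knn_graph(points, k=2)
print(graph[0])
```

An approximate index trades a small loss in neighbor accuracy for a large gain in speed, which is why it suits a corpus of this size.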
NETWORK CONSTRUCTION
• UMLS Metathesaurus
• Curated keyword network
• 2 Million Nodes
• Superset of MeSH
NETWORK CONSTRUCTION
• Edge weight ~ Distance
• Inverse TF-IDF cross-layer edges
• Edges normalized to [0,1]
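One simple way to bring edge weights into [0,1] is to divide by the largest weight, as sketched here. The slides do not say which normalization is used, so this scheme, the function name, and the toy edges are all assumptions.

```python
def normalize_weights(edges):
    """Scale (u, v, weight) edges so that the largest weight becomes 1.0."""
    largest = max(w for _, _, w in edges)
    if largest == 0:
        return list(edges)
    return [(u, v, w / largest) for u, v, w in edges]


# Toy distance-weighted edges (hypothetical).
edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 4.0)]
scaled = normalize_weights(edges)
print(scaled)
```

Dividing by the maximum keeps the ratios between distances, so edges from different layers become comparable without any of them collapsing to zero.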
QUERY PROCESS
• User selects two nodes
• Restricted to keywords
• Can generalize to two sets
QUERY PROCESS
• Identify shortest path between query sets
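A shortest path between two sets of nodes can be found with a multi-source Dijkstra search, sketched below. The slides do not name the algorithm the system uses, so this is one standard choice; the graph layout, node names, and weights are hypothetical.

```python
import heapq


def shortest_path(graph, sources, targets):
    """Multi-source Dijkstra. graph maps node -> list of (neighbor, weight).

    Returns (cost, path) for the cheapest path from any source to any
    target, or None if no target is reachable.
    """
    targets = set(targets)
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


# Toy two-layer network: keywords "drug" and "gene", abstracts a1..a3.
graph = {
    "drug": [("a1", 0.25), ("a2", 0.5)],
    "a1": [("drug", 0.25), ("a3", 0.25)],
    "a2": [("drug", 0.5), ("gene", 0.75)],
    "a3": [("a1", 0.25), ("gene", 0.125)],
    "gene": [("a2", 0.75), ("a3", 0.125)],
}
cost, path = shortest_path(graph, {"drug"}, {"gene"})
print(cost, path)
```

Starting the search from every source at once handles the generalization from two nodes to two sets without extra work.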
QUERY PROCESS
• N: Abstracts close to those in original path
• C: Abstracts which share path-adjacent keywords
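The two neighborhoods can be sketched as set operations over the network. The slides give only the one-line definitions of N and C, so this reading is an assumption: N is taken to be the path's abstracts plus their nearest neighbors, and C the abstracts attached to any keyword on or next to the path. All names and toy data are hypothetical.

```python
def extend_path(path, abstract_neighbors, abstract_keywords, keyword_abstracts):
    """Return (N, C): abstracts near the path, and abstracts sharing keywords."""
    path_abstracts = {n for n in path if n in abstract_neighbors}
    path_keywords = {n for n in path if n in keyword_abstracts}

    # N: path abstracts plus their nearest neighbors in the abstract layer.
    near = set(path_abstracts)
    for a in path_abstracts:
        near.update(abstract_neighbors[a])

    # Path-adjacent keywords: on the path, or attached to a path abstract.
    adjacent_keywords = set(path_keywords)
    for a in path_abstracts:
        adjacent_keywords.update(abstract_keywords.get(a, []))

    # C: abstracts attached to any path-adjacent keyword.
    shared = set()
    for k in adjacent_keywords:
        shared.update(keyword_abstracts.get(k, []))
    return near, shared


path = ["kw_drug", "abs_1", "abs_2", "kw_gene"]
abstract_neighbors = {"abs_1": ["abs_3"], "abs_2": ["abs_4"]}
abstract_keywords = {"abs_1": ["kw_drug"], "abs_2": ["kw_gene", "kw_other"]}
keyword_abstracts = {
    "kw_drug": ["abs_1"],
    "kw_gene": ["abs_2"],
    "kw_other": ["abs_5"],
}
N, C = extend_path(path, abstract_neighbors, abstract_keywords, keyword_abstracts)
print(sorted(N), sorted(C))
```

The union of both sets gives the text collection that is then passed on for topic mining.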
QUERY PROCESS
• PLDA+: Identifies topics present in a set of text.
• Topic patterns shed light on results
Diagram: Topic …, Topic …, Topic …
QUERY PROCESS
• Hypothesis represented as a topic model
• Shown: Venlafaxine vs. HTR1A
TOPIC 0: antidepressant_drugs, milnacipran, org, selected, ht
TOPIC 1: increase, reduced, treatment, dorsal_raphe_nucleus, effect
TOPIC 2: rats, sert, ht_receptor, escitalopram, potency
RESULTS
• VENLAFAXINE
• DRUG REPURPOSING
VENLAFAXINE EXAMPLE
• Venlafaxine: Treats depression / anxiety
• HTR1A and HTR2A: Linked to depression / anxiety
• No paper linked these concepts
Diagram labels: Venlafaxine, ?, HTR1A, HTR2A, Depression, Anxiety
VENLAFAXINE RESULTS
Chart: Depression Related Keywords Per Topic (x-axis: Topic Number, 1 to 20; y-axis: Keyword Count; series: HTR1A, HTR2A)
Chart: Anxiety Related Keywords Per Topic (x-axis: Topic Number, 1 to 20; y-axis: Keyword Count; series: HTR1A, HTR2A)
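The per-topic keyword counts plotted in these charts could be computed along the following lines. The three topics are the ones shown for Venlafaxine vs. HTR1A in this deck; the set of depression-related terms is hypothetical, since the slides do not list the keywords that were counted.

```python
def keyword_counts(topics, keywords):
    """Count, for each topic, how many of its terms appear in the keyword set."""
    wanted = {k.lower() for k in keywords}
    return [sum(1 for term in topic if term.lower() in wanted) for topic in topics]


topics = [
    ["antidepressant_drugs", "milnacipran", "org", "selected", "ht"],
    ["increase", "reduced", "treatment", "dorsal_raphe_nucleus", "effect"],
    ["rats", "sert", "ht_receptor", "escitalopram", "potency"],
]
# Hypothetical keyword set, for illustration only.
depression_terms = {"antidepressant_drugs", "milnacipran", "escitalopram"}
counts = keyword_counts(topics, depression_terms)
print(counts)
```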
VENLAFAXINE RESULTS
• Hypothesis represented as a topic model
• Shown: Venlafaxine vs. HTR1A
TOPIC 0: antidepressant_drugs, milnacipran, org, selected, ht
TOPIC 1: increase, reduced, treatment, dorsal_raphe_nucleus, effect
TOPIC 2: rats, sert, ht_receptor, escitalopram, potency
Callouts on the slide: antidepressant; produces serotonin; “Sertraline” antidepressant; controlled by HTR1A; antidepressant
DRUG REPURPOSING EXAMPLE
• Drugs can be modified to treat new diseases
• Decreases drug development time and costs
Diagram labels: HIV DRUG, ?, Cancer, DDX3
DRUG REPURPOSING EXPERIMENT
• Ran nearly 10,000 queries involving DDX3
Diagram labels: Signal Transduction, Transcription, Adhesion, Cancer Development, Translation, RNA, DDX3
DRUG REPURPOSING EXPERIMENT
• Ran nearly 10,000 queries involving DDX3
• Expecting: WNT Signaling Pathways; Cell-Cell Adhesion; Cell-Matrix Adhesion
DRUG REPURPOSING RESULTS
Expected: WNT Signaling Pathways; Cell-Cell Adhesion; Cell-Matrix Adhesion
Result keywords:
• substrate adhesion
• RGD cell adhesion domain
• cell adhesion factor
• focal adhesion kinase
• cell-cell adhesion
• regulation of cell-cell adhesion
• cell-adhesion molecules
• signal-transduction associated kinases
• cell adhesion kinase
APPLICATIONS
• Drug Repurposing
• Extensions to new domains
• Patents, Economics, etc.
• Coping with Deadlines
OPEN RESEARCH QUESTIONS
• Result Interpretation
• System Verification
• Automatic Network Tuning
• Streaming Network Reconstruction
• Inclusion of Additional Data Sources
THANK YOU
J. Sybrandt, M. Shtutman, I. Safro, “MOLIERE: Automatic Biomedical Hypothesis Generation System”
Code and Data: https://people.cs.clemson.edu/~isafro/software.html
Email: JSYBRAN@CLEMSON.EDU

