Disease-target associations from drug trial links
Proposed confidence metrics:
● nStudy : Study count for association.
● nStudyNewness : Study count weighted by newness of study (newer better).
● nStudyPhase : Study count weighted by phase of study (completed better).
● nPub : Study publications.
● nPubTypes : Study publications (results type better).
● nDiseaseMention : Disease mention count for disease-target association.
● nDrugMention : Drug mention count for disease-target association.
● nDrug : Drug count for disease-target association.
● nAssay : Assay count for drug-target association.
● nAssayPchembl : Assay count for drug-target association, weighted by pChembl.
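A minimal sketch of how three of these metrics (nStudy, nStudyNewness, nDrug) might be aggregated per disease-target pair. The poster does not give formulas, so the `Link` record, the linear newness weight and its year range are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One drug-trial link supporting a disease-target pair (hypothetical schema)."""
    nct_id: str
    start_year: int
    drug: str
    doid: str
    gene_symbol: str


def newness_weight(year: int, oldest: int = 2000, newest: int = 2019) -> float:
    """Assumed linear weight in [0, 1]; newer studies score higher."""
    clipped = min(max(year, oldest), newest)
    return (clipped - oldest) / (newest - oldest)


def metrics(links):
    """Aggregate nStudy, nStudyNewness and nDrug per (doid, gene_symbol)."""
    studies = defaultdict(dict)  # pair -> {nct_id: start_year}
    drugs = defaultdict(set)     # pair -> {drug}
    for link in links:
        pair = (link.doid, link.gene_symbol)
        studies[pair][link.nct_id] = link.start_year
        drugs[pair].add(link.drug)
    return {
        pair: {
            "nStudy": len(by_id),
            "nStudyNewness": round(sum(newness_weight(y) for y in by_id.values()), 3),
            "nDrug": len(drugs[pair]),
        }
        for pair, by_id in studies.items()
    }
```

Each study is counted once per pair even when it contributes several drugs, so nStudy and nDrug stay independent counts.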
1.2M associations; 164K unique disease-gene pairs. Examples:
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
Jeremy Yang¹, Roger Sayle², Lars Juhl Jensen³ and Tudor Oprea¹
¹University of New Mexico, Albuquerque, USA; ²NextMove Scientific Software, Cambridge, UK;
³Novo Nordisk Foundation Center for Protein Research, Copenhagen, DK
We mined ClinicalTrials.gov, using the CTTI AACT db from Duke University, for
drugs/chemicals and diseases/conditions. Named entity recognition (NER) requires
specialized tools and expertly curated dictionaries for comprehensive, high-quality
results, hence the use of NextMove LeadMine for chemical NER and JensenLab
Tagger for disease NER. Study designs and outcomes can offer new and unique
drug target knowledge. Target hypotheses can be inferred indirectly via drugs and
diseases, providing a valuable source of evidence to Illuminate the Druggable Genome.
Drug trials produce new and rich experimental data
AACT accessed Dec 3, 2019.
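The selection of interventional drug trials can be sketched as a single join. AACT is distributed as a PostgreSQL database; the example below uses an in-memory SQLite stand-in so it runs anywhere. The table and column names (`studies`, `interventions`, `study_type`, `intervention_type`, `name`) and the literal values are assumptions based on the AACT schema and should be checked against the AACT data dictionary for the snapshot in use. The demo rows are invented.

```python
import sqlite3

# Assumed AACT-like tables: studies(nct_id, study_type),
# interventions(nct_id, intervention_type, name).
QUERY = """
SELECT s.nct_id, i.name AS intervention_name
FROM studies s
JOIN interventions i ON i.nct_id = s.nct_id
WHERE s.study_type = 'Interventional'
  AND i.intervention_type = 'Drug'
ORDER BY s.nct_id, i.name
"""


def drug_trial_interventions(conn):
    """Return (nct_id, intervention_name) rows for interventional drug trials."""
    return conn.execute(QUERY).fetchall()


def demo_connection():
    """Build a tiny in-memory stand-in with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE studies (nct_id TEXT PRIMARY KEY, study_type TEXT);
        CREATE TABLE interventions (nct_id TEXT, intervention_type TEXT, name TEXT);
        INSERT INTO studies VALUES
            ('NCT_DEMO_1', 'Interventional'),
            ('NCT_DEMO_2', 'Observational');
        INSERT INTO interventions VALUES
            ('NCT_DEMO_1', 'Drug', 'paclitaxel'),
            ('NCT_DEMO_1', 'Procedure', 'biopsy'),
            ('NCT_DEMO_2', 'Drug', 'placebo');
    """)
    return conn
```

The intervention names returned by such a query, together with the study descriptions, would then be the input to chemical NER.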
Chemical NER with NextMove LeadMine
https://www.nextmovesoftware.com/
Intervention names and study descriptions mined by LeadMine v3.14.1. Drug trials
(interventional): 130740; drug names: 14969; SMILES: 4869. Many non-structures,
e.g. "placebo", "test product", "medication", "chemotherapy". Top drugs by total
mentions:
Compounds mapped to: PubChem via PUG REST API, SMILES exact search;
ChEMBL via REST API, InChIKey search; targets via ChEMBL bioassays.
Targets mapped to IDG-TCRD/Pharos via UniProt ID.
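The mapping chain above (SMILES to PubChem CID, InChIKey to ChEMBL molecule, molecule to bioassay activities) can be sketched as request URLs. This is an illustration only: the poster does not state which endpoints were called, and the exact paths and the `molecule_chembl_id` filter below are assumptions to verify against the PUG REST and ChEMBL web services documentation. No network request is made.

```python
from urllib.parse import quote, urlencode

PUBCHEM = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
CHEMBL = "https://www.ebi.ac.uk/chembl/api/data"


def pubchem_cid_url(smiles: str) -> str:
    """URL to look up PubChem CIDs by SMILES.

    The SMILES is passed as a URL argument because characters such as
    '#' and '/' are unsafe inside the URL path.
    """
    return f"{PUBCHEM}/compound/smiles/cids/JSON?{urlencode({'smiles': smiles})}"


def chembl_molecule_url(inchikey: str) -> str:
    """URL to fetch a ChEMBL molecule record by standard InChIKey."""
    return f"{CHEMBL}/molecule/{quote(inchikey, safe='')}.json"


def chembl_activity_url(molecule_chembl_id: str) -> str:
    """URL to list bioassay activities (and hence targets) for a molecule."""
    return f"{CHEMBL}/activity.json?{urlencode({'molecule_chembl_id': molecule_chembl_id})}"
```

For example, `pubchem_cid_url("C#N")` percent-encodes the triple bond as `%23`, and the resulting ChEMBL activity records carry target identifiers that can then be mapped to UniProt IDs.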
Compound-target mapping via PubChem & ChEMBL
Overview
IDG F2F Meeting - 11-12 Feb 2020 - Arlington, VA
This work was supported by the US National Institutes of Health (grants U54 CA189205 and U24 224370), the Illuminating the Druggable Genome Knowledge Management Center (IDG KMC), and
the Novo Nordisk Foundation (grant number NNF14CC0001).
Disease NER with JensenLab Tagger
https://github.com/larsjuhljensen/tagger
JensenLab diseases dictionary, on detailed descriptions.
Disease mention totals by merging to resolved Disease Ontology term (DOID).
Top diseases by total mentions:
doid        N_mentions  terms
DOID:162    31143       CANCER; CANcer; Cancer; Malignant Tumor; Malignant neoplasm; Malignant tumor; Primary Cancer; Primary cancer; cancer; ...
DOID:9351   18955       DIABETES; DIABETES MELLITUS; DIAbetes; DIabetes; Diabetes; Diabetes Mellitus; Diabetes mellitus; diabetes; diabetes Mellitus...
DOID:6713   18461       CVA; Cerebrovascular Accident; Cerebrovascular Disease; Cerebrovascular accident; Cerebrovascular disease; STROKE; STRokE;…
DOID:2030   13621       ANXIETY; Anxiety; Anxiety Disorder; Anxiety State; Anxiety disorder; Anxiety state; anxiety; anxiety disorder; anxiety state; ...
DOID:1612   11586       BREAST CANCER; BReast CAncer; BReast Cancer; Breast Cancer; Breast cancer; Breast tumor; Breast-cancer; Primary breast cancer;…
DOID:2841   10773       ASTHMA; Asthma; BHR; Bronchial hyper-reactivity; Bronchial hyperreactivity; EIA; Exercise-induced asthma; asthma; bronchial hyper re …
DOID:3083   10726       CHRONIC OBSTRUCTIVE PULMONARY DISEASE; COLD; COPD; COPd; Chronic Obstructive Lung Disease; Chronic Obstructive L…
DOID:9970   10193       OBESITY; OBesity; Obesity; obEsity; obe-sity; obesity
DOID:10763  9816        HBP; HTN; HYPERTENSION; High Blood Pressure; High blood pressure; High-blood pressure; Hypertension; Hypertensive disease; high…
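Merging surface terms to their resolved DOID, as in the table above, amounts to a group-by with a summed count and a collected term list. The sketch below assumes the tagger output has already been reduced to `(surface_term, doid, count)` tuples; that input shape and the function name are hypothetical.

```python
from collections import defaultdict


def merge_mentions(tagged):
    """Sum mention counts per DOID and collect the distinct surface terms.

    tagged: iterable of (surface_term, doid, count) tuples (assumed shape).
    Returns rows of (doid, n_mentions, terms) sorted by descending count.
    """
    totals = defaultdict(int)
    terms = defaultdict(set)
    for term, doid, count in tagged:
        totals[doid] += count
        terms[doid].add(term)
    rows = [(doid, totals[doid], "; ".join(sorted(terms[doid]))) for doid in totals]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

Case variants such as "OBesity" and "obesity" are kept as distinct terms but contribute to one total, which matches how the terms column is displayed.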
Aggregate Analysis of ClinicalTrials.gov (AACT) db
from the Clinical Trials Transformation Initiative (CTTI)
Why not target NER?
Clinical trial records are not designed to communicate molecular mechanisms to research
scientists; their focus is clinical efficacy and safety. As due diligence we performed
target NER on the trial descriptions and compared it with target NER on arbitrary
non-biomedical text: tweets from the Twitter API for #brexit (26 Nov 2019). We found 8.64
target entities per 1000 chars in the tweets (e.g. "TAX", "LIAR", "NHS", "DANGER",
"insulin") vs. 6.63 in the clinical trial descriptions. While not proof, this supports
our prior belief that trial text carries little direct target information, and favors
target inference via chemical NER.
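The comparison rests on a simple normalized rate. The raw entity and character counts are not reported, so the sketch below only shows the arithmetic; the function name is hypothetical and only the two densities (8.64 and 6.63) come from the text.

```python
def entities_per_1000_chars(n_entities: int, n_chars: int) -> float:
    """Tagged entities per 1000 characters of text."""
    if n_chars <= 0:
        raise ValueError("n_chars must be positive")
    return 1000.0 * n_entities / n_chars


# Densities reported in the text: tweets vs. clinical trial descriptions.
TWEET_DENSITY = 8.64
TRIAL_DENSITY = 6.63
density_ratio = TWEET_DENSITY / TRIAL_DENSITY  # roughly 1.3
```

A ratio above 1 means that the non-biomedical tweets yielded more apparent target hits per character than the trial descriptions did.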
nct_id       drug_name     cid       disease_term      doid          gene_symbol  uniprot  idgTDL
NCT02600741  Fluphenazine  3372      schizophrenia     DOID:5419     MC5R         P33032   Tchem
NCT00008190  melphalan     460612    acute leukemia    DOID:12603    MMP2         P08253   Tchem
NCT00003012  methotrexate  126941    breast cancer     DOID:1612     KDM4E        B2RXH2   Tchem
NCT01445522  ABT-888       11960529  lymphoma          DOID:0060058  PARP12       Q9H0J9   Tdark
NCT03201250  Cabozantinib  25102847  multiple myeloma  DOID:9538     ANKK1        Q8NFD2   Tbio
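Rows like these result from joining three link sets: trial-to-drug (chemical NER), trial-to-disease (disease NER) and drug-to-target (bioassays). A minimal sketch of that indirect inference follows; the tuple shapes and the function name are assumptions, and the identifiers used in any call are placeholders rather than real data.

```python
from collections import defaultdict


def infer_disease_targets(trial_drug, trial_disease, drug_target):
    """Join trial-drug, trial-disease and drug-target links.

    trial_drug:    iterable of (nct_id, cid)
    trial_disease: iterable of (nct_id, doid)
    drug_target:   iterable of (cid, uniprot)
    Returns a sorted list of (nct_id, cid, doid, uniprot) associations.
    """
    diseases_by_trial = defaultdict(set)
    for nct_id, doid in trial_disease:
        diseases_by_trial[nct_id].add(doid)

    targets_by_drug = defaultdict(set)
    for cid, uniprot in drug_target:
        targets_by_drug[cid].add(uniprot)

    associations = set()
    for nct_id, cid in trial_drug:
        # .get avoids creating empty defaultdict entries for unmatched keys.
        for doid in diseases_by_trial.get(nct_id, ()):
            for uniprot in targets_by_drug.get(cid, ()):
                associations.add((nct_id, cid, doid, uniprot))
    return sorted(associations)
```

Because every drug in a trial is paired with every disease mentioned and every assayed target, the join is deliberately permissive, which is why confidence metrics are needed to rank the resulting associations.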
https://aact.ctti-clinicaltrials.org/
N_mentions  names
2787        Abraxane; PACLITAXEL; Paclitaxel; Taxol; abraxane; paclitaxel; taxol
2654        CYCLOPHOSPHAMIDE; Ciclophosphamide; Cyclophosphamid; Cyclophosphamide; ciclophosphamide; cyclophosphamide
2552        CISPLATIN; Cis Platinum; Cis-platinum; Cisplatin; Cisplatine; Cisplatinum; cis Platinum; cis-platinum; cisplatin; cisplatine; cisplatinum
