Open Targets: integrating genetics,
omics and chemical data for drug
discovery
Denise Carvalho-Silva, PhD
EMBL-EBI | Open Targets
Wellcome Genome Campus
United Kingdom
C4X Discovery
June 18th 2019
Workshop materials
Presentation slides: https://tinyurl.com/slides-c4x
Coursebook with exercises: https://tinyurl.com/exercises-c4x
• Introduction to Open Targets
• Open Targets Platform: presentation + live demos
• Hands-on exercises
Lunch break → 12:00-13:00
• Open Targets Genetics: presentation + live demo
• Feedback survey, wrap up, further discussion
This session 10:00-15:00
Drug discovery R&D
Lengthy, costly, high attrition
https://www.genengnews.com/insights/how-crispr-is-accelerating-drug-discovery/
DOI: 10.1016/j.molonc.2012.02.004
Open Targets
A partnership to transform drug discovery
Founded in 2014
Founding partners 2016 2017 2018 2018
Systematic identification and prioritisation of targets
Selecting the right target
DATA
GENERATION
Therapeutic
hypothesis
Public data
DATA
INTEGRATION
Experimental projects Bioinformatics projects
Knowledge cycle
https://www.targetvalidation.org
https://genetics.opentargets.org
Data generation: Open Targets
www.opentargets.org/science/
• Organoid knockouts (CRISPR-Cas9) - gut epithelium
• IL22 pathway to treat IBD?
• Wellcome Sanger Institute, GSK, University of Cambridge
• Alzheimer’s and Parkinson’s
• CRISPR/Cas9 screens, iPS cells
• Wellcome Sanger Institute, Biogen, Gurdon Institute
Some examples
• > 1,000 cancer cell lines + biomarkers + tractability
• RNASeq, CRISPR/Cas9 screens
• Wellcome Sanger Institute, GSK and EMBL-EBI
Data generation: Oncology
Behan et al (2019)
WRN sustains in vivo growth in:
• colorectal
• ovarian
• endometrial
• gastric
• New candidate target for tumours
with MSI (WRN antagonists)
Data now available in the Open Targets Platform, e.g.
https://www.targetvalidation.org/evidence/ENSG00000100941/EFO_0000305?view=sec:affected_pathway
Open Targets preprints
Data integration: Open Targets
Data integration: web resources
https://www.opentargets.org/resources/#open-targets-platform
Target Disease
Evidence
https://www.targetvalidation.org
What is a target?
https://www.targetvalidation.org/target/ENSG00000255248
https://www.targetvalidation.org/target/ENSG00000175482?view=sec:genome_browser
28K
targets
Examples
• Modified version of Experimental Factor Ontology (EFO)
• Controlled vocabulary (Coeliac versus Celiac)
• Hierarchy (relationships)
How do we describe our diseases?
• Promotes consistency
• Increases the richness of annotation
• Allows for easier and automatic integration
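The idea of a controlled vocabulary with a hierarchy can be sketched in a few lines. This is only an illustration: the IDs (DEMO_0001 etc.), the parent terms and the helper names are made up for the example and are not real EFO entries or Open Targets code.

```python
# Hypothetical miniature ontology: one ID per disease, with synonyms and parents.
# The DEMO_* identifiers are placeholders, not real EFO IDs.
ONTOLOGY = {
    "DEMO_0001": {"label": "disease", "synonyms": set(), "parents": []},
    "DEMO_0002": {"label": "autoimmune disease", "synonyms": set(), "parents": ["DEMO_0001"]},
    "DEMO_0003": {
        "label": "celiac disease",
        "synonyms": {"coeliac disease"},
        "parents": ["DEMO_0002"],
    },
}


def resolve(term):
    """Return the ID whose label or synonym matches the term, ignoring case."""
    needle = term.strip().lower()
    for term_id, entry in ONTOLOGY.items():
        if needle == entry["label"] or needle in entry["synonyms"]:
            return term_id
    return None


def ancestors(term_id):
    """Follow the parent links and return every ancestor ID, nearest first."""
    found = []
    queue = list(ONTOLOGY[term_id]["parents"])
    while queue:
        parent = queue.pop(0)
        if parent not in found:
            found.append(parent)
            queue.extend(ONTOLOGY[parent]["parents"])
    return found


# Both spellings land on the same term, so their annotations can be merged.
same_term = resolve("Coeliac disease") == resolve("Celiac disease")
```

Because both spellings resolve to one ID, evidence annotated with either spelling can be integrated automatically, and the parent links let evidence be grouped under broader disease terms.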
10K
diseases
Evidence for our T-D associations
https://docs.targetvalidation.org/data-sources/data-sources
Data sources grouped into data types
Data type
Data source
https://docs.targetvalidation.org/data-sources/data-sources
What can you do with
the Open Targets
Platform?
• Target annotations
• Target-disease associations (+ evidence + score)
• Disease annotations
http://www.targetvalidation.org/target/ENSG00000141510
http://www.targetvalidation.org/disease/EFO_0000228
https://www.targetvalidation.org/evidence/ENSG00000141510/EFO_0000228
Demo 1
Searching for a disease → targets
What is the evidence for
the association?
Which targets are
associated with my disease?
Is there any data to help me
with target prioritization e.g.
tractability?
Open Targets workshop at C4X in 2019
Association score → confidence
Which targets have more evidence for an association?
What is the relative weight of the evidence for different targets?
Columns: Overall score | Genetic | Somatic | Drugs | Pathways | Expression | Animal models | Text mining
Four-tier framework
Score calculated at four levels:
• Evidence
• Data source
• Data type
• Overall
Score: 0 to 1 (max)
Aggregation with the harmonic sum (ΣH) at each level:
$$\Sigma_H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \frac{S_4}{4^2} + \dots + \frac{S_i}{i^2}$$
Note: Each data set has its own scoring and ranking scheme
Data types, with their data sources and weight factors:
• Genetic associations: European Variation Archive (germline) x 1.0, UniProt x 1.0, Gene2Phenotype x 1.0, GWAS catalog x 1.0, Genomics England x 1.0, PheWAS catalog x 1.0
• Somatic mutations: Cancer Gene Census x 1.0, European Variation Archive (somatic) x 1.0, IntOGen x 1.0
• Drugs: ChEMBL x 1.0
• Pathways & systems biology: Reactome x 1.0, SLAPenrich x 0.5, PROGENy x 0.5, SysBio x 0.5
• RNA expression: Expression Atlas x 0.2
• Text mining: Europe PMC x 0.2
• Animal models: PhenoDigm x 0.2
Statistical integration, aggregation and scoring
From evidence to overall score
https://docs.targetvalidation.org/getting-started/scoring
1) Evidence score (e.g. one SNP from a GWAS paper)
2) Data source score (e.g. all SNPs from the GWAS catalog)
3) Data type score (e.g. all sources of Genetic associations)
4) Overall association score
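The four levels listed above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the Platform's implementation: the function names, the source-to-type mapping keys and the capping at 1 are illustrative, and whether the overall score is aggregated from data-type scores (as here) or directly from data-source scores is an assumption. The weights are the ones shown on the slides.

```python
def harmonic_sum(scores):
    """Sort scores in descending order and sum S_i / i^2."""
    ordered = sorted(scores, reverse=True)
    return sum(s / i**2 for i, s in enumerate(ordered, start=1))


def capped(value):
    """Keep a score inside the 0 to 1 range stated on the slides (assumed cap)."""
    return min(value, 1.0)


# Illustrative subset of data sources; keys are hypothetical labels.
SOURCE_TO_TYPE = {
    "gwas_catalog": "genetic_association",
    "uniprot": "genetic_association",
    "europepmc": "text_mining",
}
DEFAULT_WEIGHTS = {"gwas_catalog": 1.0, "uniprot": 1.0, "europepmc": 0.2}


def association_score(evidence, weights=DEFAULT_WEIGHTS):
    """evidence maps a data source to its list of evidence scores (level 1)."""
    by_type = {}
    for source, scores in evidence.items():
        # Level 2: one weighted score per data source.
        source_score = capped(harmonic_sum(scores)) * weights.get(source, 1.0)
        by_type.setdefault(SOURCE_TO_TYPE[source], []).append(source_score)
    # Level 3: one score per data type.
    type_scores = {t: capped(harmonic_sum(v)) for t, v in by_type.items()}
    # Level 4: overall association score.
    overall = capped(harmonic_sum(type_scores.values()))
    return type_scores, overall


evidence = {
    "gwas_catalog": [0.4, 0.2],
    "uniprot": [0.3],
    "europepmc": [0.5, 0.5, 0.5],
}
type_scores, overall = association_score(evidence)

# Re-scoring with a heavier text-mining weight changes the overall score.
reweighted = dict(DEFAULT_WEIGHTS, europepmc=1.0)
_, overall_reweighted = association_score(evidence, reweighted)
```

Passing a different `weights` dictionary is one way to explore how much a single data source drives an association.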
f = sample size (cases and controls)
s = predicted functional consequence (VEP)
c = p value reported in the paper
Computing the score for one evidence
score = f * s * c
f, relative occurrence of a target-disease evidence
s, strength of the effect of the variant
c, confidence of the observation for the target-disease evidence
https://docs.targetvalidation.org/getting-started/scoring
(Factors affecting the relative strength of GWAS Catalog evidence)
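As a worked illustration of score = f * s * c, the sketch below multiplies three factors. It assumes each factor has already been scaled to the 0 to 1 range; how sample size, VEP consequence and p value are mapped onto that range is not described on the slide, so the input numbers are made up.

```python
def evidence_score(f, s, c):
    """Multiply the three factors; each is assumed to be pre-scaled to 0..1."""
    for name, value in (("f", f), ("s", s), ("c", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return f * s * c


# Hypothetical GWAS evidence: large study, moderate consequence, strong p value.
example = evidence_score(f=0.8, s=0.5, c=1.0)
```

Because the factors are multiplied, a weak value in any one of them lowers the whole evidence score.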
https://docs.targetvalidation.org/getting-started/scoring
Factors affecting the relative strength of the evidence
Aggregating scores across the data
• Using a mathematical function, the harmonic sum*
$$\Sigma_H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \dots + \frac{S_i}{i^2}$$
where $S_1, S_2, \dots, S_i$ are the individual evidence scores sorted in descending order
* PMID: 19107201, PMID: 20118918
• Advantages:
A) account for replication
B) deflate the effect of large amounts of data e.g. text mining
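The two advantages can be checked numerically. The sketch below is an illustration with made-up scores; `harmonic_sum` is a hypothetical helper name.

```python
import math


def harmonic_sum(scores):
    """Sort scores in descending order and sum S_i / i^2."""
    ordered = sorted(scores, reverse=True)
    return sum(s / i**2 for i, s in enumerate(ordered, start=1))


# A) Replication: a second, equally strong piece of evidence raises the score.
single = harmonic_sum([0.6])
replicated = harmonic_sum([0.6, 0.6])

# B) Deflation: a thousand weak hits (e.g. text mining) stay bounded, because
# the sum of 1 / i^2 converges to pi^2 / 6.
flood = harmonic_sum([0.1] * 1000)
upper_bound = 0.1 * math.pi**2 / 6
```

Here `replicated` is larger than `single`, while `flood` remains below `upper_bound` (about 0.164) however many weak scores are added.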
Target-Disease Association Score
EuropePMC
(Text Mining)
UniProt
(Manual Curation)
ChEMBL
(Manual Curation)
Overall
association
score
VERY simplified diagram By Andrea Pierleoni
What can you do with
the Open Targets
Platform?
• Target annotations
• Target-disease associations (+ evidence + score)
• Disease annotations
https://www.targetvalidation.org/target/ENSG00000141510
https://www.targetvalidation.org/disease/EFO_0000228
https://www.targetvalidation.org/evidence/ENSG00000141510/EFO_0000228
Target annotations
Target profile page
https://docs.targetvalidation.org/getting-started/getting-started/target-profile
Is my target tractable?
Target tractability buckets
Buckets are numbered 1 to 8 (small molecules) or 1 to 9 (antibodies), e.g.
• Clinical phases e.g. phase IV (bucket 1)
• Cellular localization e.g. plasma membrane (bucket 4)
• DrugEBIlity – ensemble score e.g. > 0.7 (bucket 5)
• GO cell component (plasma membrane)
Target tractability – small molecules
• Clinical precedence: buckets 1, 2, 3
• Discovery precedence: buckets 4, 7
• Predicted tractable: buckets 5, 6, 8
Target tractability – antibodies
• Clinical precedence: buckets 1, 2, 3
• Predicted tractable – high confidence: buckets 4, 5
• Predicted tractable – medium-low confidence: buckets 6, 7, 8, 9
Is my target safe?
Target profile page
https://docs.targetvalidation.org/getting-started/getting-started/target-profile
Disease annotations
Disease profile page
https://docs.targetvalidation.org/getting-started/getting-started/disease-profile
Demo 2
Target and disease annotations
Target
profile page
Disease
profile page
Which disease associations are available for my target?
Demo 3
Searching for a target → diseases
Hands-on exercises 1-4
Pages 25-28
Or feel free to explore your target (gene/protein)
and disease of interest
https://tinyurl.com/exercises-c4x
Modes of access → data volume
Batch search
https://www.youtube.com/watch?v=CPkAxnVrt_s
We have a list of 20 possible
targets for multiple myeloma.
https://www.targetvalidation.org/batch-search
Several targets at once
We would like to know if these
targets are represented in
other diseases.
Are there any pathways over-represented
in my set of targets?
Server: https://platform-api.opentargets.io/v3/platform/
Endpoint: public/association/filter
Parameters: ?target=ENSG00000163914&size=10000&fields=target.id&fields=disease.id
Open Targets Platform REST API *
* https://docs.targetvalidation.org/tutorials/rest-api
Open Targets Python client **
** https://docs.targetvalidation.org/tutorials/python-client
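The server / endpoint / parameters anatomy of a REST call can be reproduced with the Python standard library. This is a minimal sketch that only assembles the URL shown on the slide and does not send a request; `build_url` is a hypothetical helper name, not part of the Open Targets Python client.

```python
from urllib.parse import urlencode

SERVER = "https://platform-api.opentargets.io/v3/platform/"


def build_url(endpoint, params):
    """Join server, endpoint and query parameters into one URL."""
    # doseq=True repeats a key for list values, e.g. fields=a&fields=b.
    query = urlencode(params, doseq=True)
    return f"{SERVER}{endpoint}?{query}"


url = build_url(
    "public/association/filter",
    {
        "target": "ENSG00000163914",
        "size": 10000,
        "fields": ["target.id", "disease.id"],
    },
)
```

Keeping the three parts separate makes it easy to reuse the same server with other endpoints or to change a single parameter such as `size`.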
User interface or REST API?
• Scale and volume of your search or results
• Flexibility in the search
• Visualisation of the results
How to search
https://platform-api.opentargets.io/v3/platform/public/search?q=PTEN
How to get all diseases associated with a target
http://platform-api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000171862&direct=true
How to get the score for an association
http://platform-api.opentargets.io/v3/platform/public/association?id=ENSG00000171862-EFO_0000616
How to get the evidence for an association
http://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000198947&disease=Orphanet_98896&datatype=genetic_association
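The calls above follow a pattern: an association is identified by the target ID and the disease ID joined with a hyphen, while the evidence endpoint takes them as separate parameters. The sketch below builds those URLs without sending any request; the helper names are hypothetical and only the endpoints and parameters shown on the slides are used.

```python
from urllib.parse import urlencode

BASE = "https://platform-api.opentargets.io/v3/platform/public"


def association_id(target_id, disease_id):
    """An association ID is '<target>-<disease>', as in the slide's example."""
    return f"{target_id}-{disease_id}"


def search_url(term):
    return f"{BASE}/search?{urlencode({'q': term})}"


def association_url(target_id, disease_id):
    query = urlencode({"id": association_id(target_id, disease_id)})
    return f"{BASE}/association?{query}"


def evidence_url(target_id, disease_id, datatype):
    query = urlencode(
        {"target": target_id, "disease": disease_id, "datatype": datatype}
    )
    return f"{BASE}/evidence/filter?{query}"


score_call = association_url("ENSG00000171862", "EFO_0000616")
evidence_call = evidence_url(
    "ENSG00000198947", "Orphanet_98896", "genetic_association"
)
```

`urlencode` also takes care of escaping, so a search term such as alzheimer's becomes `q=alzheimer%27s`.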
Data downloads
REST API
https://docs.targetvalidation.org/programmatic-access/rest-api
https://youtu.be/KQbfhwpeEvc
Extra hands-on exercises E1-E4
Pages 29-31
Or feel free to explore your target (gene/protein)
and disease of interest
https://tinyurl.com/exercises-c4x
Data integration: Open Targets
Data integration: web resources
https://www.opentargets.org/resources/#open-targets-platform
Open Targets Genetics. Why?
• Refine genetic associations in the Open Targets Platform
• Make sense of GWAS
• Glimpses of the biology behind the association
• Guide target ID?
Lee et al (2017): PMID:29288389
Genetics increases success
Both common and rare variants
https://genetics.opentargets.org
Which targets are implicated in disease from common variant
genetic analysis?
Key question
Open Targets Genetics: data sources
https://genetics-docs.opentargets.org/our-approach/data-sources
Functional Genomics
Variant – Gene
Full summary statistics
Neale and colleagues v1
337,199 individuals
2,419 traits
SNP-trait
associations
Human genetics
Variant (GnomAD) – Trait (study)
Sun et al. 2018
Ensembl
VEP
* Javierre et al. 2016
** Andersson et al. 2014
*** Thurman et al. 2012
Open Targets Genetics: data model
S → VL → VT → G
S Study (traits from UK Biobank and GWAS catalog)
VL Lead variant (associated with traits from GWAS catalog)
VT Tag variant (possible causal, expanded from VL)
G Gene
• S → VL: p-value (GWAS Catalog, UK Biobank)
• VL → VT: LD expansion (r²) or fine mapping (posterior probability)
• VT → G: aggregated functional score from TSS distance, eQTL (GTEx V7), pQTL, PCHi-C, FANTOM5 and VEP
https://genetics-docs.opentargets.org/our-approach/pipeline-overview
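The study, lead variant, tag variant and gene model can be pictured as nested records. The sketch below is an illustration only: the class names, field names and IDs are hypothetical and do not reflect the actual Open Targets Genetics schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeneLink:
    gene_id: str
    functional_score: float  # aggregated functional score (VT to G)


@dataclass
class TagVariant:
    variant_id: str
    r2: Optional[float] = None              # set when found by LD expansion
    posterior_prob: Optional[float] = None  # set when found by fine mapping
    genes: List[GeneLink] = field(default_factory=list)


@dataclass
class LeadVariant:
    variant_id: str
    p_value: float                          # study to lead variant association
    tags: List[TagVariant] = field(default_factory=list)


@dataclass
class Study:
    study_id: str
    trait: str
    leads: List[LeadVariant] = field(default_factory=list)


def genes_for_study(study):
    """Return gene IDs reachable from a study, best functional score first."""
    best = {}
    for lead in study.leads:
        for tag in lead.tags:
            for link in tag.genes:
                best[link.gene_id] = max(best.get(link.gene_id, 0.0), link.functional_score)
    return sorted(best, key=best.get, reverse=True)


# Made-up example data with placeholder identifiers.
study = Study("STUDY_1", "example trait", [
    LeadVariant("lead_1", 1e-10, [
        TagVariant("tag_a", r2=0.9, genes=[GeneLink("GENE_1", 0.8), GeneLink("GENE_2", 0.3)]),
        TagVariant("tag_b", posterior_prob=0.6, genes=[GeneLink("GENE_2", 0.5)]),
    ]),
])
ranked = genes_for_study(study)
```

Walking the chain from study to gene in this way mirrors the key question of which genes are implicated by the variants associated with a trait.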
What can you do
with Open Targets
Genetics?
Variant Trait Gene
• Genes functionally implicated
• Variant association across traits (PheWAS)
• Variants tagged
• Associated traits
• Links to drugs, expression,
pathway, mouse phenotype,
etc in Open Targets Platform
• Independently associated loci
• Overlapping susceptibility loci
V S G
Locus plot
Visualising the associations between traits, variants, and genes
Fine mapping analysis
How to access all this data?
http://genetics-api.opentargets.io/
GraphQL API
Bulk download*
* http://blog.opentargets.org/2019/04/09/open-targets-genetics-release-is-out/
https://genetics.opentargets.io/
User interface
The SNP rs12916 has been
associated with cholesterol LDL
(Teslovich et al 2010)
https://genetics.opentargets.org
Demo: searching for a SNP
Can we use Open Targets Genetics
to find which genes are functionally
implicated by this variant?
Which is the nearest protein
coding gene to this variant?
Can we compare Teslovich
et al with other cholesterol
LDL studies?
DATA
GENERATION
Therapeutic
hypothesis
Public data
DATA
INTEGRATION
Experimental projects Bioinformatics
projects
Wrap up
https://www.targetvalidation.org
https://genetics.opentargets.org
Upcoming data
50K exome sequence data
Community
ML model: learn the weight
to give to the different data
sources e.g. VEP
Our latest publication
http://bit.ly/cite-us
blog.opentargets.org/
@targetvalidate
http://tinyurl.com/opentargets-in
Get in touch
https://tinyurl.com/opentargets-youtube
Open Targets Platform Open Targets Genetics
docs.targetvalidation.org genetics-docs.opentargets.org
support@targetvalidation.org geneticsportal@opentargets.org
Acknowledgements
Your feedback is important
https://tinyurl.com/london-180619
Details on data sources to associate
targets and diseases
Other Open Targets resources
Extra extra extra
Target safety
• Clinical phases e.g. phase IV (bucket 1)
• Cellular localization e.g. plasma membrane (bucket 4)
• DrugEBIlity – ensemble score e.g. > 0.7 (bucket 5)
https://docs.targetvalidation.org/getting-started/getting-started/target-profile
Data source: GWAS catalog
• Genome Wide Association Studies
• Array-based chips → genotyping 100,000 SNPs genome-wide
Data type
Data source: UniProt
• Protein: sequence, annotation, function
• Manual curation of coding variants in patients
EMBL-EBI train online
Data types
• Variants, genes, phenotypes in rare diseases
• Literature curation → consultant clinical geneticists in the UK
Data source: Gene2Phenotype
https://www.nature.com/articles/s41467-019-10016-3
Data type
Data source: UniProt
• Protein: sequence, annotation, function
• Manual curation of coding variants in patients
EMBL-EBI train online
Data type
Data source: PheWAS
• Phenome Wide Association Studies
• A variant associated with multiple phenotypes
• Clinical phenotypes derived from EMR-linked biobank BioVU
• ICD9 codes mapped to EFO
Data type
Data source: GE PanelApp
• Aid clinical interpretation of genomes for the 100K project
• We include ‘green genes’ from version 1+ and phenotypes
Data type
Data source: EVA
• With ClinVar information for rare diseases
• Clinical significance: pathogenic, protective
EMBL-EBI train online
Data types
Data source: The Cancer Gene Census
• Genes with mutations causally implicated in cancer
• Gene associated with a cancer plus other cancers associated
with that gene
Data type
Data source: IntOGen
• Genes and somatic (driver) mutations, 28 cancer types
• Involvement in cancer biology
• TCGA data
• Rubio-Perez et al. 2015
Data type
Data source: ChEMBL
• Known drugs linked to a disease and a known target
• FDA approved for clinical trials or marketing
EMBL-EBI train online
Data type
Data source: Reactome
• Biochemical reactions and pathways
• Manual curation of pathways affected by mutations
EMBL-EBI train online
Data type
Data source: SLAPenrich
• 374 pathways curated and mapped to cancer hallmarks
• Divergence of the total number of (TCGA) cancer samples with
genomic alterations
• Mutational burden and total exonic block length of genes
• Down-weighted (x 0.5)
Data type
Data source: PROGENy
• Comparison of pathway activities between normal and primary
samples from TCGA
• Inferred from RNA-seq: 9,250 tumour and 741 normal samples
• EGFR, hypoxia, JAK.STAT, MAPK, NFkB, PI3K, TGFb, TNFa,
Trail, VEGF, and p53
• Down-weighted (x 0.5)
Data type
Data source: SysBio
• Curation of four systems biology papers
• Available for six gene lists: ~ 400 genes
• Late-onset Alzheimer's, cognitive decline, CHD, and IBD
• Score: p-values or rank-based scores if available; otherwise s = 0.5
• Down-weighted (x 0.5)
Data type
https://docs.targetvalidation.org/data-sources/affected-pathways#sysbio
Data source: CRISPR
• papers
• Available for ~ 400 genes
Data type
Data source: Expression Atlas
• Baseline expression for human genes
- target profile page
• Differential mRNA expression (healthy versus diseased):
- target-disease associations
• No propagation in the disease ontology
• Down-weighted (x 0.2)
EMBL-EBI train online
Data type
Data source: Europe PMC
• Mining titles, abstracts, full text in research articles
• Target and disease co-occurrence in the same sentence
• Dictionary (not NLP)
• Down-weighted (x 0.2)
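Dictionary-based co-occurrence in the same sentence can be sketched as follows. This is a toy illustration, not the Europe PMC pipeline: the two dictionaries, the naive sentence splitter and the example text are all made up for the example.

```python
import re

# Hypothetical dictionaries of target and disease names (lower case).
TARGETS = {"pten", "tp53"}
DISEASES = {"glioma", "breast cancer"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(names, sentence):
    """Return the dictionary names that occur as whole words in the sentence."""
    lowered = sentence.lower()
    return sorted(n for n in names if re.search(rf"\b{re.escape(n)}\b", lowered))


def cooccurrences(text):
    """List (target, disease) pairs that appear together in one sentence."""
    pairs = []
    for sentence in split_sentences(text):
        targets = mentions(TARGETS, sentence)
        diseases = mentions(DISEASES, sentence)
        pairs.extend((t, d) for t in targets for d in diseases)
    return pairs


text = (
    "PTEN is frequently mutated in glioma. "
    "TP53 was also sequenced. "
    "Breast cancer samples were collected."
)
pairs = cooccurrences(text)
```

Only the first sentence contains both a target and a disease, so TP53 and breast cancer, which appear in separate sentences, are not paired. A plain dictionary match like this cannot tell whether the sentence asserts or denies a link, which is one reason text-mining evidence is down-weighted.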
EMBL-EBI train online
Data type
Data sources: PhenoDigm
• Semantic approach to associate mouse models with diseases
• Similarity score between a mouse model and a human disease
(Smedley et al 2013)
• Down-weighted (x 0.2)
Data type
How to run our REST endpoints (option 1)
Paste the URL in the location bar of your browser
How to run our REST endpoints (option 2)
Command line e.g. curl -X GET
How to run our REST endpoints (option 3)
Use our free Python client *
* https://docs.targetvalidation.org/tutorials/python-client
Advantage: you can change the way the associations are scored e.g. increase the weight given to text mining data
REST API calls: some examples*
https://platform-api.opentargets.io/v3/platform/public/search?q=alzheimer%27s
https://platform-api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000142192
https://platform-api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000142192&direct=true
https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000142192&disease=EFO_0000249&datasource=uniprot&direct=true
Open Targets REST API
Private: methods used by the UI to serve external data. Subject to change without notice
https://platform-api.opentargets.io/v3/platform/docs/swagger-ui
LINK
• LINK: Literature coNcept Knowledgebase
• Subject / predicate / object structured relations
From PubMed abstracts
Proof of Concept
Further development
http://link.opentargets.io/
Addressing text mining shortcomings
• Entities: genes, diseases, drugs
• Concepts extracted via NLP
(Natural Language Processing)
• 28 M documents, 500 M relations
• http://blog.opentargets.org/link/
DoRothEA
• Candidate TF-drug interactions in cancer
• 1000 cancer cell lines
• 265 anti-cancer compounds
• 127 transcription factors
http://cancerres.aacrjournals.org/content/early/2017/12/09/0008-5472.CAN-17-1679
dorothea.opentargets.io
Example: Rapamycin
• ~ 1000 cancer cell lines
• 265 anti-cancer compounds
• 127 transcription factors
CELLector
http://blog.opentargets.org/2019/02/04/looking-for-cell-line-models-to-predict-anticancer-drug-response-in-patients-cellector-can-help-you/

More Related Content

PDF
De-siloing data and building knowledge graphs outside of drug discovery: Oppo...
PPTX
Do Open data badges influence author behaviour? A case study at Springer Nature
PPTX
Research in the time of Covid: Surveying impacts on Early Career Researchers
PDF
cBioPortal Webinar Slides (3/3)
PDF
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...
PDF
operationalizing asthma analytic plan using omop cdm brandt
PDF
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
PPTX
Access Lab 2020: Context aware unified institutional knowledge services
De-siloing data and building knowledge graphs outside of drug discovery: Oppo...
Do Open data badges influence author behaviour? A case study at Springer Nature
Research in the time of Covid: Surveying impacts on Early Career Researchers
cBioPortal Webinar Slides (3/3)
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...
operationalizing asthma analytic plan using omop cdm brandt
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Access Lab 2020: Context aware unified institutional knowledge services

What's hot (14)

PPTX
How to Create a Big Data Culture in Pharma
PPTX
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
PPTX
Generating Biomedical Hypotheses Using Semantic Web Technologies
PPTX
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller
PPT
P4 c2011 slides ekins
PPTX
Trial Promoter: A Web-Based Tool to Test Stakeholder Engagement in Research o...
PPTX
A brief history of reaction analytics (CINF 144, ACS National Meeting 2018-08...
PPT
Big data supporting drug discovery - cautionary tales from the world of chemi...
PPTX
Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...
PPTX
Exploring Chemical and Biological Knowledge Spaces with PubChem
PDF
PEDSnet DQA CHOP Symposium
PDF
Automate your literature monitoring for more effective pharmacovigilance
PPT
2011-10-11 Open PHACTS at BioIT World Europe
PPTX
Fore FAIR ISMB 2019
How to Create a Big Data Culture in Pharma
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
Generating Biomedical Hypotheses Using Semantic Web Technologies
Pistoia Alliance US Conference 2015 - 1.1.2 Innovation in Pharma - Chris Waller
P4 c2011 slides ekins
Trial Promoter: A Web-Based Tool to Test Stakeholder Engagement in Research o...
A brief history of reaction analytics (CINF 144, ACS National Meeting 2018-08...
Big data supporting drug discovery - cautionary tales from the world of chemi...
Finding novel lead compounds in pesticide discovery inspired by pharmaceutica...
Exploring Chemical and Biological Knowledge Spaces with PubChem
PEDSnet DQA CHOP Symposium
Automate your literature monitoring for more effective pharmacovigilance
2011-10-11 Open PHACTS at BioIT World Europe
Fore FAIR ISMB 2019
Ad

Similar to Open Targets workshop at C4X in 2019 (20)

PPT
Open Targets, identifying targets for drug development in the treatment of di...
PPTX
Computational prediction of novel drug targets using gene disease association...
PDF
Drug Repositioning Conference Washington DC 20190923
PPTX
Target Identification - Gene Disease and Protein Target Prediction
PDF
In silico prediction of novel therapeutic targets using gene - disease associ...
PPTX
TargetInsights: A New Method to Rapidly Access 'Specificity” of Selected Prot...
PDF
Amia tb-review-13
PDF
Zen and the Art of Data Science Maintenance
PDF
High-Dimensional Machine Learning for Medicine
PPTX
Prediction of therapeutic targets using the Open Targets data
PDF
Amia tb-review-12
PDF
Amia tb-review-15
PDF
Prediction of novel targets using disease association data from Open Targets
PDF
Amia tb-review-10
PPTX
Focus on the Evidence: a knowledge graph approach to profiling drug targets
PDF
Prediction of novel targets using disease association data from Open Targets
PDF
Krithara meetup may18_final (1)
PDF
2014 07 ismb personalized medicine
PDF
Friend NIEHS 2013-03-01
PDF
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Open Targets, identifying targets for drug development in the treatment of di...
Computational prediction of novel drug targets using gene disease association...
Drug Repositioning Conference Washington DC 20190923
Target Identification - Gene Disease and Protein Target Prediction
In silico prediction of novel therapeutic targets using gene - disease associ...
TargetInsights: A New Method to Rapidly Access 'Specificity” of Selected Prot...
Amia tb-review-13
Zen and the Art of Data Science Maintenance
High-Dimensional Machine Learning for Medicine
Prediction of therapeutic targets using the Open Targets data
Amia tb-review-12
Amia tb-review-15
Prediction of novel targets using disease association data from Open Targets
Amia tb-review-10
Focus on the Evidence: a knowledge graph approach to profiling drug targets
Prediction of novel targets using disease association data from Open Targets
Krithara meetup may18_final (1)
2014 07 ismb personalized medicine
Friend NIEHS 2013-03-01
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Ad

More from Denise Carvalho-Silva, PhD (6)

PPTX
Open Targets talk at the BSPR 2019 meeting
PDF
Ensembl Browser Workshop
PDF
EMBL-EBI at Plant and Animal Genome conference
PDF
Genes and Transcripts: Ensembl Online Webinar series
PDF
Variation and the VEP: Ensembl Online Webinar series
PDF
Browsing Genes, Variation and Regulation data with Ensembl
Open Targets talk at the BSPR 2019 meeting
Ensembl Browser Workshop
EMBL-EBI at Plant and Animal Genome conference
Genes and Transcripts: Ensembl Online Webinar series
Variation and the VEP: Ensembl Online Webinar series
Browsing Genes, Variation and Regulation data with Ensembl

Recently uploaded (20)

PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PPTX
Introduction to Cardiovascular system_structure and functions-1
PDF
The scientific heritage No 166 (166) (2025)
PPTX
BIOMOLECULES PPT........................
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PPTX
Microbiology with diagram medical studies .pptx
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PDF
bbec55_b34400a7914c42429908233dbd381773.pdf
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PDF
An interstellar mission to test astrophysical black holes
PDF
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
PPTX
Taita Taveta Laboratory Technician Workshop Presentation.pptx
PPTX
2. Earth - The Living Planet Module 2ELS
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
POSITIONING IN OPERATION THEATRE ROOM.ppt
Classification Systems_TAXONOMY_SCIENCE8.pptx
Introduction to Cardiovascular system_structure and functions-1
The scientific heritage No 166 (166) (2025)
BIOMOLECULES PPT........................
ECG_Course_Presentation د.محمد صقران ppt
Microbiology with diagram medical studies .pptx
Comparative Structure of Integument in Vertebrates.pptx
TOTAL hIP ARTHROPLASTY Presentation.pptx
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
bbec55_b34400a7914c42429908233dbd381773.pdf
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
The KM-GBF monitoring framework – status & key messages.pptx
An interstellar mission to test astrophysical black holes
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
Taita Taveta Laboratory Technician Workshop Presentation.pptx
2. Earth - The Living Planet Module 2ELS
7. General Toxicologyfor clinical phrmacy.pptx

Open Targets workshop at C4X in 2019

  • 1. Open Targets: integrating genetics, omics and chemical data for drug discovery Denise Carvalho-Silva, PhD EMBL-EBI | Open Targets Wellcome Genome Campus United Kingdom C4X Discovery June 18th 2019
  • 2. Workshop materials Presentation slides: Coursebook with exercises https://tinyurl.com/slides-c4x https://tinyurl.com/exercises-c4x
  • 3. • Introduction to Open Targets • Open Targets Platform: presentation + live demos • Hands-on exercises Lunch break à 12:00-13:00 • Open Targets Genetics: presentation + live demo • Feedback survey, wrap up, further discussion This session 10:00-15:00
  • 4. Drug discovery R&D Lengthy, costly, high attrition https://www.genengnews.com/insights/how-crispr-is-accelerating-drug-discovery/ DOI: 10.1016/j.molonc.2012.02.004
  • 5. Open Targets A partnership to transform drug discovery Founded in 2014 Founding partners 2016 2017 20182018 Systematic identification and prioritisation of targets
  • 7. DATA GENERATION Therapeutic hypothesis Public data DATA INTEGRATION Experimental projects Bioinformatics projects Knowledge cycle https://www.targetvalidation.org https://genetics.opentargets.org
  • 8. Data generation: Open Targets www.opentargets.org/science/ • Organoid knockouts (CRISPR-Cas9) - gut epithelium • IL22 pathway to treat IBD? • Wellcome Sanger Institute, GSK, University of Cambridge • Alzheimer’s and Parkinson’s • CRISPR/Cas9 screens, iPS cells • Wellcome Sanger Institute, Biogen, Gurdon Institute Some examples • > 1,000 cancer cell lines + biomarkers + tractability • RNASeq, CRISPR/Cas9 screens • Wellcome Sanger Institute, GSK and EMBL-EBI
  • 10. Behan et al (2019) WRN sustains in vivo growth in: • colorectal • ovarian • endometrial • gastric • New candidate target for tumours with MSI (WRN antagonists) Data now available in the Open Targets Platform e.g. https://www.targetvalidation.org/evidence/ENSG000001 00941/EFO_0000305?view=sec:affected_pathway
  • 12. Data integration: Open Targets Data integration: web resources https://www.opentargets.org/resources/#open-targets-platform
  • 14. What is a target? https://www.targetvalidation.org/target/ENSG00000255248 https://www.targetvalidation.org/target/ENSG00000175482?view=sec:genome_browser 28K targets Examples
  • 15. • Modified version of Experimental Factor Ontology (EFO) • Controlled vocabulary (Coeliac versus Celiac) • Hierarchy (relationships) How do we describe our diseases? • Promotes consistency • Increases the richness of annotation • Allow for easier and automatic integration 10K diseases
  • 16. Evidence for our T-D associations https://docs.targetvalidation.org/data-sources/data-sources
  • 17. Data sources grouped into data types Data type Data source https://docs.targetvalidation.org/data-sources/data-sources
  • 18. What can you do with the Open Targets Platform? • Target annotations • Target-disease associations (+ evidence + score) • Disease annotations http://www.targetvalidation.org/target/ENSG00000141510 http://www.targetvalidation.org/diseaset/EFO_0000228 https://www.targetvalidation.org/evidence/ENSG00000141510/EFO_0000228
  • 19. Demo 1 Searching for a disease à targets What is the evidence for the association? Which targets are associated with my disease? Is there any data to help me with target prioritization e.g. tractability?
  • 21. Association score à confidence Which targets have more evidence for an association? What is the relative weight of the evidence for different targets? Overall score Genetic Somatic Drugs Pathways Expression Animal modText mining
  • 22. ΣH Calculated at four levels: • Evidence • Data source • Data type • Overall Score: 0 to 1 (max) Aggregation with (harmonic sum) ΣH Note: Each data set has its own scoring and ranking scheme S1 + S2/22 + S3/32 + S4/42 + Si/i2 European Variation Archive (germline) UniProt Gene2Phenotype GWAS catalog Cancer Gene Census European Variation Archive (somatic) IntOGen ChEMBL Reactome Expression Atlas Europe PMC PhenoDigm Genetic associations Somatic mutations RNA expression Animal models Pathways & systems biology Text mining Drugs *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *1.0 *0.2 *0.2 *0.2 Association ΣH ΣH ΣH ΣH ΣH ΣH ΣH ΣH ΣH ΣH ΣH ΣH ΣH ΣH ΣH Genomics England PheWAS catalog*1.0 *1.0 ΣH ΣH weight factor SLAPenrich PROGENy ΣH *0.5 ΣH ΣH SysbioΣH *0.5 *0.5 Four-tier framework
  • 23. Statistical integration, aggregation and scoring From evidence to overall score https://docs.targetvalidation.org/getting-started/scoring 1) Evidence score (e.g. one SNP from a GWAS paper) 2) Data source score (e.g. all SNPs from the GWAS catalog) 3) Data type score (e.g. all sources of Genetic associations) 4) Overall association score
  • 24. Computing the score for one evidence: score = f * s * c • f, relative occurrence of a target-disease evidence • s, strength of the effect of the variant • c, confidence of the observation for the target-disease evidence Factors affecting the relative strength of GWAS Catalog evidence: • f = sample size (cases and controls) • s = predicted functional consequence (VEP) • c = p value reported in the paper https://docs.targetvalidation.org/getting-started/scoring
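The product score = f * s * c can be illustrated with a minimal Python sketch. It assumes each factor has already been scaled to the range 0 to 1; the slide does not say how sample size, VEP consequence or p value are scaled, so the function name `evidence_score` and the input values are hypothetical.

```python
def evidence_score(f: float, s: float, c: float) -> float:
    """Combine the three factors from the slide into one evidence score.

    Assumption: f, s and c are already normalised to [0, 1]. How the
    Platform derives them from sample size, VEP consequence and p value
    is not shown here.
    """
    for name, value in (("f", f), ("s", s), ("c", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return f * s * c


# Hypothetical GWAS evidence: medium sample size, severe consequence,
# fairly strong p value.
example = evidence_score(f=0.5, s=1.0, c=0.8)
print(round(example, 3))
```

Because the factors are multiplied, a low value in any one of them pulls the whole evidence score down.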
  • 26. Aggregating scores across the data • Using a mathematical function, the harmonic sum*: S1 + S2/2² + S3/3² + … + Si/i², where S1, S2, ..., Si are the individual evidence scores sorted in descending order * PMID: 19107201, PMID: 20118918 • Advantages: A) accounts for replication B) deflates the effect of large amounts of data e.g. text mining
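A short Python sketch of the harmonic sum described on this slide. Capping the result at 1 is an assumption made here to keep the score in the 0 to 1 range mentioned earlier in the deck; the function name `harmonic_sum` and the example scores are illustrative only.

```python
from typing import Iterable


def harmonic_sum(scores: Iterable[float], cap: float = 1.0) -> float:
    """Return S1 + S2/2^2 + S3/3^2 + ... with scores sorted descending.

    Assumption: the total is capped at `cap` (1.0 by default).
    """
    ordered = sorted(scores, reverse=True)
    total = sum(s / (rank ** 2) for rank, s in enumerate(ordered, start=1))
    return min(total, cap)


# Replication: a second, equally strong piece of evidence raises the score,
# but by less than the first one did.
single = harmonic_sum([0.6])
replicated = harmonic_sum([0.6, 0.6])

# Deflation: many weak pieces of evidence (e.g. text mining hits)
# cannot add up to a high score.
many_weak = harmonic_sum([0.1] * 100)

print(single, replicated, round(many_weak, 4))
```

The 1/i² weights are what give both advantages listed on the slide: each extra piece of evidence counts, but with quickly diminishing returns.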
  • 27. Target-Disease Association Score EuropePMC (Text Mining) UniProt (Manual Curation) ChEMBL (Manual Curation) Overall association score VERY simplified diagram By Andrea Pierleoni
  • 29. What can you do with the Open Targets Platform? • Target annotations • Target-disease associations (+ evidence + score) • Disease annotations https://www.targetvalidation.org/target/ENSG00000141510 https://www.targetvalidation.org/disease/EFO_0000228 https://www.targetvalidation.org/evidence/ENSG00000141510/EFO_0000228
  • 31. Is my target tractable?
  • 32. Target tractability buckets • Clinical phases e.g. phase IV (bucket 1) • Cellular localization e.g. plasma membrane (bucket 4) • DrugEBIlity – ensemble score e.g. > 0.7 (bucket 5) • GO cell component (plasma membrane)
  • 33. Target tractability – small molecules (buckets 1 to 8) • Clinical precedence: buckets 1, 2, 3 • Discovery precedence: buckets 4, 7 • Predicted tractable: buckets 5, 6, 8
  • 34. Target tractability – antibodies (buckets 1 to 9) • Clinical precedence • Predicted tractable – high confidence • Predicted tractable – med-low confidence
  • 35. Is my target safe? Target profile page https://docs.targetvalidation.org/getting-started/getting-started/target-profile
  • 37. Demo 2 Target and disease annotations Target profile page Disease profile page
  • 38. Demo 3 Searching for a target → diseases Which disease associations are available for my target?
  • 41. Hands-on exercises 1-4 Pages 25-28 Or feel free to explore your target (gene/protein) and disease of interest https://tinyurl.com/exercises-c4x
  • 42. Modes of access → data volume
  • 44. Several targets at once https://www.targetvalidation.org/batch-search We have a list of 20 possible targets for multiple myeloma. We would like to know if these targets are represented in other diseases. Are there any pathways over-represented in my set of targets?
  • 46. Open Targets Platform REST API* • Server: https://platform-api.opentargets.io/v3/platform/ • Endpoint: public/association/filter • Parameters: ?target=ENSG00000163914&size=10000&fields=target.id&fields=disease.id * https://docs.targetvalidation.org/tutorials/rest-api Open Targets Python client** ** https://docs.targetvalidation.org/tutorials/python-client
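The server / endpoint / parameters split on this slide can be reproduced with Python's standard library. This is a sketch that only builds the URL and makes no network call; the helper name `build_url` is hypothetical, the host is written as `platform-api.opentargets.io` (as it appears elsewhere in the deck), and the v3 API shown in these 2019 slides may no longer be served.

```python
from urllib.parse import urlencode

# Server part of the call, as shown on the slide (assumed host spelling).
SERVER = "https://platform-api.opentargets.io/v3/platform"


def build_url(endpoint: str, params: dict) -> str:
    """Join server, endpoint and query parameters into one REST URL.

    doseq=True expands list values into repeated parameters,
    e.g. fields=target.id&fields=disease.id.
    """
    return f"{SERVER}/{endpoint}?{urlencode(params, doseq=True)}"


# The association query from the slide: all diseases for one target,
# returning only the target and disease identifiers.
url = build_url(
    "public/association/filter",
    {
        "target": "ENSG00000163914",
        "size": 10000,
        "fields": ["target.id", "disease.id"],
    },
)
print(url)
```

The resulting string can be pasted into a browser or passed to any HTTP client; restricting `fields` keeps the response small when `size` is large.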
  • 47. User interface or REST API? • Scale and volume of your search or results • Flexibility in the search • Visualisation of the results
  • 51. How to get the evidence for an association http://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000198947&disease=Orphanet_98896&datatype=genetic_association
  • 54. Extra hands-on exercises E1-E4 Pages 29-31 Or feel free to explore your target (gene/protein) and disease of interest https://tinyurl.com/exercises-c4x
  • 55. Data integration: Open Targets Data integration: web resources https://www.opentargets.org/resources/#open-targets-platform
  • 56. Open Targets Genetics. Why? • Refine genetic associations in the Open Targets Platform • Make sense of GWAS • Glimpses of the biology behind the association • Guide target ID? Lee et al (2017): PMID:29288389
  • 57. Genetics increases success Both common and rare variants
  • 58. https://genetics.opentargets.org Which targets are implicated in disease from common variant genetic analysis? Key question
  • 59. Open Targets Genetics: data sources https://genetics-docs.opentargets.org/our-approach/data-sources • Human genetics, Variant (gnomAD) – Trait (study): SNP-trait associations; full summary statistics (Neale and colleagues v1, 337,199 individuals, 2,419 traits) • Functional genomics, Variant – Gene: Sun et al. 2018; Ensembl VEP; Javierre et al. 2016; Andersson et al. 2014; Thurman et al. 2012
  • 60. Open Targets Genetics: data model S → VL → VT → G • S: Study (traits from UK Biobank and GWAS catalog) • VL: Lead variant (associated with traits from GWAS catalog) • VT: Tag variant (possible causal, expanded from VL) • G: Gene • S to VL: p-value (GWAS Catalog, UK Biobank) • VL to VT: fine mapping (posterior prob.), LD expansion (r²) • VT to G: TSS distance, eQTL (GTEx V7), pQTL, PCHi-C, FANTOM5, VEP, aggregated functional score https://genetics-docs.opentargets.org/our-approach/pipeline-overview
  • 61. What can you do with Open Targets Genetics? Visualising the associations between traits, variants, and genes • Variant (V): genes functionally implicated; variant association across traits (PheWAS); variants tagged • Gene (G): associated traits; links to drugs, expression, pathway, mouse phenotype, etc. in Open Targets Platform • Trait (S): independently associated loci; overlapping susceptibility loci • Locus plot • Fine mapping analysis
  • 62. How to access all this data? http://genetics-api.opentargets.io/ GraphQL API Bulk download* * http://blog.opentargets.org/2019/04/09/open-targets-genetics-release-is-out/ https://genetics.opentargets.io/ User interface
  • 63. Demo: searching for a SNP https://genetics.opentargets.org The SNP rs12916 has been associated with LDL cholesterol (Teslovich et al 2010). Can we use Open Targets Genetics to find which genes are functionally implicated by this variant? Which is the nearest protein coding gene to this variant? Can we compare Teslovich et al with other LDL cholesterol studies?
  • 64. DATA GENERATION Therapeutic hypothesis Public data DATA INTEGRATION Experimental projects Bioinformatics projects Wrap up https://www.targetvalidation.org https://genetics.opentargets.org
  • 65. Upcoming data 50K exome sequence data Community ML model: learn the weight to give to the different data sources e.g. VEP
  • 67. blog.opentargets.org/ @targetvalidate http://tinyurl.com/opentargets-in Get in touch https://tinyurl.com/opentargets-youtube Open Targets Platform Open Targets Genetics docs.targetvalidation.org genetics-docs.opentargets.org support@targetvalidation.org geneticsportal@opentargets.org
  • 69. Your feedback is important https://tinyurl.com/london-180619
  • 70. Details on data sources to associate targets and diseases Other Open Targets resources Extra extra extra
  • 71. Target safety • Clinical phases e.g. phase IV (bucket 1) • Cellular localization e.g. plasma membrane (bucket 4) • DrugEBIlity – ensemble score e.g. > 0.7 (bucket 5) https://docs.targetvalidation.org/getting-started/getting-started/target-profile
  • 72. Data source: GWAS catalog • Genome Wide Association Studies • Array-based chips → genotyping 100,000 SNPs genome wide Data type
  • 73. Data source: UniProt • Protein: sequence, annotation, function • Manual curation of coding variants in patients EMBL-EBI train online Data types
  • 74. Data source: Gene2Phenotype • Variants, genes, phenotypes in rare diseases • Literature curation → consultant clinical geneticists in the UK https://www.nature.com/articles/s41467-019-10016-3 Data type
  • 76. Data source: PheWAS • Phenome Wide Association Studies • A variant associated with multiple phenotypes • Clinical phenotypes derived from EMR-linked biobank BioVU • ICD9 codes mapped to EFO Data type
  • 77. Data source: GE PanelApp • Aid clinical interpretation of genomes for the 100K project • We include ‘green genes’ from version 1+ and phenotypes Data type
  • 78. Data source: EVA • With ClinVar information for rare diseases • Clinical significance: pathogenic, protective EMBL-EBI train online Data types
  • 79. Data source: The Cancer Gene Census • Genes with mutations causally implicated in cancer • Gene associated with a cancer plus other cancers associated with that gene Data type
  • 80. Data source: IntOGen • Genes and somatic (driver) mutations, 28 cancer types • Involvement in cancer biology • TCGA data • Rubio-Perez et al. 2015 Data type
  • 81. Data source: ChEMBL • Known drugs linked to a disease and a known target • FDA approved for clinical trials or marketing EMBL-EBI train online Data type
  • 82. Data source: Reactome • Biochemical reactions and pathways • Manual curation of pathways affected by mutations EMBL-EBI train online Data type
  • 83. Data source: SLAPenrich • 374 pathways curated and mapped to cancer hallmarks • Divergence of the total number of (TCGA) cancer samples with genomic alterations • Mutational burden and total exonic block length of genes • Downweighted (x 0.5) Data type
  • 84. Data source: PROGENy • Comparison of pathway activities between normal and primary samples from TCGA • Inferred from RNA-seq: 9,250 tumour and 741 normal samples • EGFR, hypoxia, JAK.STAT, MAPK, NFkB, PI3K, TGFb, TNFa, Trail, VEGF, and p53 • Downweighted (x 0.5) Data type
  • 85. Data source: SysBio • Curation of four systems biology papers • Available for six gene lists: ~ 400 genes • Late onset Alzheimer's, cognitive decline, CHD, and IBD • Score: p-values or rank-based scores if available, if not s=0.5 • Downweighted (x 0.5) Data type https://docs.targetvalidation.org/data-sources/affected-pathways#sysbio
  • 86. Data source: CRISPR • papers • Available for ~ 400 genes Data type
  • 87. Data source: Expression Atlas • Baseline expression for human genes: target profile page • Differential mRNA expression (healthy versus diseased): target-disease associations • No propagation in the disease ontology • Downweighted (x 0.2) EMBL-EBI train online Data type
  • 88. Data source: Europe PMC • Mining titles, abstracts, full text in research articles • Target and disease co-occurrence in the same sentence • Dictionary (not NLP) • Downweighted (x 0.2) EMBL-EBI train online Data type
  • 89. Data source: PhenoDigm • Semantic approach to associate mouse models with diseases • Similarity score between a mouse model and a human disease (Smedley et al 2013) • Downweighted (x 0.2) Data type
  • 90. How to run our REST endpoints (option 1) Paste the URL in the location bar of your browser
  • 92. How to run our REST endpoints (option 2) Command line e.g. curl -X GET
  • 93. How to run our REST endpoints (option 3) Use our free Python client * * https://docs.targetvalidation.org/tutorials/python-client Advantage: you can change the way the associations are scored e.g. increase the weight given to text mining data
  • 94. REST API calls: some examples* https://platform-api.opentargets.io/v3/platform/public/search?q=alzheimer%27s https://platform-api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000142192 https://platform-api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000142192&direct=true https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000142192&disease=EFO_0000249&datasource=uniprot&direct=true
  • 95. Open Targets REST API Private: methods used by the UI to serve external data. Subject to change without notice https://platform-api.opentargets.io/v3/platform/docs/swagger-ui
  • 96. LINK • LINK: Literature coNcept Knowledgebase • Subject / predicate / object structured relations from PubMed abstracts • Proof of concept • Further development http://link.opentargets.io/
  • 97. Addressing text mining shortcomings • Entities: genes, diseases, drugs • Concepts extracted via NLP (Natural Language Processing) • 28 M documents, 500 M relations • http://blog.opentargets.org/link/
  • 98. DoRothEA • Candidate TF-drug interactions in cancer • 1000 cancer cell lines • 265 anti-cancer compounds • 127 transcription factors http://cancerres.aacrjournals.org/content/early/2017/12/09/0008-5472.CAN-17-1679 dorothea.opentargets.io
  • 99. Example: Rapamycin • ~ 1000 cancer cell lines • 265 anti-cancer compounds • 127 transcription factors