Incorporating Commercial and Private
Data into an Open Linked Data Platform
for Drug Discovery
Carole Goble, Alasdair J G Gray, Lee Harland,
Karen Karapetyan, Antonis Loizou, Ivan Mikhailov,
Yrjänä Rankka, Stefan Senger, Valery Tkachenko,
Antony J Williams, and Egon L Willighagen
www.openphacts.org
@open_phacts

A.J.G.Gray@hw.ac.uk
@gray_alasdair
Pre-competitive Informatics
Pharmaceutical companies are all accessing, processing,
storing & re-processing external research data

[Figure labels: Literature, Genbank, Patents, PubChem, Databases, Downloads; Data Integration; Data Analysis; Firewalled Databases; × repeat at each company]

Lowering industry firewalls: pre-competitive informatics in drug discovery
Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944

25/10/2013
ISWC 2013

1
Open PHACTS objective
Drug Discovery Platform
• Apps
• Interactive responses
• Open Standards
• Domain API
• Provenance of data
• Production quality

2
Drug Discovery Data
[Figure: concept map: Pathways, Proteins, Interactions, Genes, Pharmacological Activities, Transcripts, Clinical Drug Applications, Biological Processes, Pathological Processes, Drugs, Indications, Diseases, Compounds]

3
Public Data
[Figure: concept map: Pathways, Proteins, Interactions, Genes, Pharmacological Activities, Transcripts, Clinical Drug Applications, Biological Processes, Pathological Processes, Drugs, Indications, Diseases, Compounds]

4
Real Business Questions
“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”

“What is the selectivity profile of known p38 inhibitors?”

“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”

[Figure: concept map: Pathways, Proteins, Interactions, Genes, Pharmacological Activities, Transcripts, Clinical Drug Applications, Biological Processes, Pathological Processes, Drugs, Indications, Diseases, Compounds]

5
OPS Discovery Platform

Apps

Linked Data API (RDF/XML, TTL, JSON)

Core Platform
• Identity Resolution Service, e.g. “Adenosine receptor 2a”
• Identifier Management Service, e.g. P12374, EC2.43.4, CS4532
• Domain Specific Services
• Semantic Workflow Engine
• Chemistry Registration, Normalisation & Q/C
• Data Cache (Virtuoso Triple Store)
• Indexing

Data sources (Db), with VoID and Nanopub
• Public Ontologies
• Public Content
• Commercial
• User Annotations

6
Present Content: Public Data
Source         Initial Records   Triples        Properties
ChEMBL         1,247,403         305,419,649    77
DrugBank       19,628            517,584        74
UniProt        ?                 533,394,147    82
ENZYME         6,187             73,838         2
ChEBI          40,575            40,575         2
GeneOntology   38,137            1,265,273      26
GOA            ?                 23,489,501     15
ChemSpider     1,194,437         161,336,857    26
ConceptWiki    2,828,966         3,739,884      1
WikiPathways   946               1,449,981      34

7
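As a quick sanity check, the triple counts on this slide can be totalled. The sketch below simply copies the figures from the Triples column in slide order; the sum agrees with the 1,030,727,289 triples quoted in the speaker notes for this slide.

```python
# Triple counts copied from the "Triples" column of the slide, in slide order.
triple_counts = [
    305_419_649,
    517_584,
    533_394_147,
    73_838,
    40_575,
    1_265_273,
    23_489_501,
    161_336_857,
    3_739_884,
    1_449_981,
]

total_triples = sum(triple_counts)
print(f"{total_triples:,}")  # prints 1,030,727,289
```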
Semantic Integration Methodology
1. Define use cases
2. Identify Data
– Create RDF
– VoID dataset descriptions
3. Create mappings
– between data set and known data sets (instance level)
– index for text to URL conversion

8
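Step 3 can be pictured with a small sketch. It assumes the text index is a lookup from a normalised label to a concept URI and that instance-level mappings are pairs of equivalent URIs; every URI, name and function below is hypothetical and is not taken from the Open PHACTS services.

```python
from collections import defaultdict

# Hypothetical text index: free-text label -> concept URI (all URIs are made up).
TEXT_INDEX = {
    "adenosine receptor 2a": "http://example.org/concept/adora2a",
}

# Hypothetical instance-level mappings between records in different data sets.
MAPPINGS = [
    ("http://example.org/concept/adora2a", "http://example.org/dataset_a/target/12"),
    ("http://example.org/dataset_a/target/12", "http://example.org/dataset_b/protein/P1"),
]


def resolve_text(text):
    """Return the concept URI for a free-text label, or None if unknown."""
    return TEXT_INDEX.get(" ".join(text.lower().split()))


def equivalent_uris(uri):
    """Return every URI reachable from `uri` through the mappings (both directions)."""
    neighbours = defaultdict(set)
    for left, right in MAPPINGS:
        neighbours[left].add(right)
        neighbours[right].add(left)
    seen, stack = {uri}, [uri]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

Following mappings transitively is a design choice of this sketch; a real deployment might restrict which mappings are followed.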
Semantic Integration Methodology
4. Ingest RDF into data cache
(i.e. triple store)
5. Define access paths to core concepts in data
6. Extend or create SPARQL queries for API calls
7. Publish API calls

9
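Steps 6 and 7 amount to API calls that fill in SPARQL queries. A minimal sketch of that idea follows, assuming a query template with placeholders; the template, the function name and the safety check are illustrative assumptions, not the platform's actual queries.

```python
from string import Template

# Hypothetical query template; the graph pattern is illustrative only.
COMPOUND_QUERY = Template("""\
SELECT ?property ?value
WHERE {
  GRAPH ?g { <$compound_uri> ?property ?value }
}
LIMIT $limit
""")

# Characters that must not appear inside an IRI written between angle brackets.
UNSAFE = set('<>" {}|\\^`')


def build_compound_query(compound_uri, limit=100):
    """Fill the template for one API call, rejecting input that could alter the query."""
    if any(ch in UNSAFE for ch in compound_uri):
        raise ValueError("compound_uri contains characters not allowed in an IRI")
    return COMPOUND_QUERY.substitute(compound_uri=compound_uri, limit=int(limit))


print(build_compound_query("http://example.org/compound/1", limit=10))
```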
Commercial Data Use Case
“What is the selectivity profile of known p38 inhibitors?”

“There is relevant data in various commercial datasets.”

“My company X has its own private dataset on this topic.”

• Comprehensive data coverage
– Commercial data collections
– Extensive private collections

• Control data responses
– Only authorised data

10
Commercial Data Sets Pilot
[Figure: concept map: Pathways, Proteins, Interactions, Genes, Pharmacological Activities, Transcripts, Clinical Drug Applications, Biological Processes, Pathological Processes, Drugs, Indications, Diseases, Compounds]

11
Linked Open Data
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g. CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
http://5stardata.info/

12
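The star scheme can be read as a cumulative checklist. The sketch below assumes that reading, where each star requires all the earlier ones; the function and parameter names are hypothetical.

```python
def star_rating(open_license_on_web, structured, non_proprietary, uses_uris, linked):
    """Count how many consecutive levels of the 5-star scheme are met."""
    stars = 0
    for met in (open_license_on_web, structured, non_proprietary, uses_uris, linked):
        if not met:
            break  # later levels do not count once an earlier one is missing
        stars += 1
    return stars
```

Under this strict reading, data without an open license scores no stars even when it is published as linked RDF.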
Commercial Linked Data
• Same conversion challenges as Open Data!
– Goal to have 5★ linked data
– www.openphacts.org/specs/rdfguide/

• Pilot (sample) data provided as data dumps
– XML
– CSV
– RDF

• Structurally similar to ChEMBL
• Converted to interoperable RDF

13
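Converting a CSV dump to RDF can be sketched as below. The column names, the example.org namespace and the vocabulary terms are all assumptions made for illustration; the real conversion follows the RDF guide referenced on the slide.

```python
import csv
import io

BASE = "http://example.org/pilot/"  # hypothetical namespace, not an Open PHACTS URI
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def csv_to_ntriples(csv_text):
    """Turn each CSV row (hypothetical columns) into three N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        activity = f"<{BASE}activity/{row['activity_id']}>"
        compound = f"<{BASE}compound/{row['compound_id']}>"
        lines.append(f"{activity} <{BASE}vocab#compound> {compound} .")
        lines.append(f"{activity} <{BASE}vocab#type> {literal(row['type'])} .")
        lines.append(f"{activity} <{BASE}vocab#value> {literal(row['value'])}^^<{XSD_DOUBLE}> .")
    return lines


sample = "activity_id,compound_id,type,value\nA1,C7,IC50,250\n"
for line in csv_to_ntriples(sample):
    print(line)
```

The sketch does not validate identifiers or escape newlines in literals, which a production converter would need to do.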
Data Modelling Challenges
• Contain private terminologies
– Mapped to public equivalents
– Ongoing work

• Units represented as strings
– Not always consistent, e.g. IC50, IC_50, IC-50
– QUDT extended, e.g. IC50
– www.openphacts.org/specs/units/

14
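The inconsistent strings on this slide can be reconciled with a simple normalisation step. This sketch assumes that stripping separators is enough for the three variants shown; the target URI is a placeholder, since the real term would come from the extended QUDT vocabulary.

```python
import re

# Hypothetical URI for the canonical term; the real one would come from the extended QUDT vocabulary.
CANONICAL_TERMS = {"IC50": "http://example.org/units#IC50"}


def normalise_label(raw):
    """Drop whitespace, underscores and hyphens; keep case, since it can be significant (mM vs MM)."""
    return re.sub(r"[\s_-]+", "", raw)


def term_uri(raw):
    """Return the canonical URI for a raw label, or None if it is not recognised."""
    return CANONICAL_TERMS.get(normalise_label(raw))


for variant in ["IC50", "IC_50", "IC-50"]:
    print(variant, "->", term_uri(variant))
```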
Dataset Descriptions
www.openphacts.org/specs/datadesc/

Enable
• Discovery
– Name
– Description
– Coverage

• Access control
– License
– File locations

• Answer Provenance
– Returned data links to description

Commercial Data Description
• Publicly discoverable
– Advertisement for data
– Bring in more customers
• Restricted access by license

Private Data Description
• Hidden to all but authorised
• Restricted access

15
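The visibility rules on this slide can be written down as two small filters. The sketch assumes a description record with a few of the listed fields plus a hypothetical `kind` flag; the class and function names are illustrative and do not follow the actual dataset description specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDescription:
    name: str
    description: str
    license: str
    kind: str  # "public", "commercial" or "private" (hypothetical flag)


def discoverable(descriptions, authorised=frozenset()):
    """Public and commercial descriptions are visible to all; private ones only if authorised."""
    return [d.name for d in descriptions if d.kind != "private" or d.name in authorised]


def data_accessible(descriptions, authorised=frozenset()):
    """Open data is available to all; commercial and private data only if authorised."""
    return [d.name for d in descriptions if d.kind == "public" or d.name in authorised]


catalogue = [
    DatasetDescription("open-set", "Example open data", "open licence", "public"),
    DatasetDescription("vendor-set", "Example commercial data", "commercial licence", "commercial"),
    DatasetDescription("company-x-set", "Example in-house data", "internal", "private"),
]

print(discoverable(catalogue))     # ['open-set', 'vendor-set']
print(data_accessible(catalogue))  # ['open-set']
```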
Chemical mappings
• Data is messy!
• Identify common problems:
– Charge imbalance
– Stereochemistry

• Link based on structure

16
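Linking on structure can be sketched as grouping records by a structure key. This assumes each record already carries a key computed from its standardised structure (an InChIKey would be one option); the data set names, record ids and keys below are placeholders.

```python
from collections import defaultdict

# (data set, record id, structure key); all values are placeholders.
RECORDS = [
    ("dataset_a", "A-001", "KEY-1"),
    ("dataset_b", "B-17", "KEY-1"),
    ("dataset_b", "B-42", "KEY-2"),
]


def link_by_structure(records):
    """Group records by structure key and keep keys seen in more than one data set."""
    groups = defaultdict(list)
    for dataset, record_id, key in records:
        groups[key].append((dataset, record_id))
    return {
        key: members
        for key, members in groups.items()
        if len({dataset for dataset, _ in members}) > 1
    }


print(link_by_structure(RECORDS))
```

The quality of such links depends on the standardisation step: records affected by the problems listed above would need to be cleaned before their keys agree.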
Chemistry Registration
ChemSpider Service
• Validates and standardizes chemical representations
• Manual curation by RSC staff
• Data loaded in ChemSpider
• Open data: unsuitable for
– Commercial data
– Private data

Chemical Registration Service
• Utilizes ChemSpider Validation and Standardization platform
• Utilizes FDA rule set as basis for standardization
• Generates OPSID for chemicals
• Computes properties

17
Access Requirements

18
Data Access
• Each data set loaded into separate graph in cache
• Pilot data same form as open ChEMBL data
– Extend queries with sub-queries for each set

• Restricted access
– Virtuoso offers graph-based access restriction
– Commercial data sets turned on/off

19
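The idea of one graph per data set with access switched on or off per graph can be illustrated with a toy in-memory store. This is not Virtuoso's API: the class, the graph names and the activity values are hypothetical, and the sketch only shows how restricting a query to permitted graphs changes the answer.

```python
class DataCache:
    """Toy quad store: every triple is kept together with the name of its graph."""

    def __init__(self):
        self._quads = []

    def load(self, graph, triples):
        """Load a data set into its own named graph."""
        self._quads.extend((graph, s, p, o) for s, p, o in triples)

    def query(self, subject, allowed_graphs):
        """Return (graph, predicate, object) rows for `subject` from permitted graphs only."""
        return [(g, p, o) for g, s, p, o in self._quads
                if s == subject and g in allowed_graphs]


cache = DataCache()
cache.load("graph:open", [("compound:1", "activity", "IC50 250 nM")])        # made-up value
cache.load("graph:commercial", [("compound:1", "activity", "IC50 40 nM")])   # made-up value

open_only = cache.query("compound:1", {"graph:open"})
with_commercial = cache.query("compound:1", {"graph:open", "graph:commercial"})
print(len(open_only), len(with_commercial))  # prints 1 2
```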
Conclusions
• Drug discovery requires full data coverage
– Public/open data
• Open description
• Open data

– Commercial data
• Open description
• Restricted data

– Private data
• Restricted description
• Restricted data

• Pilot study with three commercial datasets

20
Conclusions
• Data Modelling
– Similar challenges as public data

• Access restriction
– Provided by standard mechanisms
– Graph-based access

• Open PHACTS Discovery Platform
– Releasing version 1.3 (late 2013)
– Version 1.4 will contain commercial data (2014)

21
Acknowledgements
• GVK Bio
GOSTAR
gostardb.com
• Thomson Reuters
Integrity
integrity.thomson-pharma.com
• Aureus Sciences Elsevier
AurSCOPE
www.aureus-sciences.com

22
Questions
A.J.G.Gray@hw.ac.uk
www.macs.hw.ac.uk/~ajg33
@gray_alasdair

Open PHACTS Project

pmu@openphacts.org
www.openphacts.org
@open_phacts

Editor's Notes

  • #2: Big waste of resources. No competitive advantage.
  • #3: A platform for integrated pharmacology data. Relied upon by pharma companies. Public domain, commercial, and private data sources. Based on open standards. Provides domain specific API. Making it easy to build multiple drug discovery applications: examples developed in the project.
  • #4: Comprehensive picture required. Drug discovery requires data from lots of domains: Chemistry (compound properties); Genes; Proteins; Drug interactions; Diseases; Biological processes/interactions; Pharmacology activities.
  • #5: Already exist many public open datasets
  • #6: Driven by real business questions and use cases
  • #7: Cache copies of data: (1) Performance (2) Social (1000s hits a day). Chemistry data normalisation/alignment through ChemSpider. Domain specific API. API calls populate SPARQL queries.
  • #8: Statistics to be added. 1,030,727,289 triples. Hosted on beefy hardware; data in memory (aim).
  • #11: More data! Driven by users.
  • #12: Sample of three commercial datasets. Information on handful of targets only.
  • #13: Commercial data does not have open license
  • #14: Same challenges as open data: data about the same things
  • #16: Statistics may affect the perception of the commercial dataset
  • #17: Data is messy; even chemicals. Require linking between datasets. Common backbone to integrate data.
  • #18: FDA: US Food and Drug Administration Substance Registration System. OPSID: Open PHACTS identifier for chemicals.
  • #19: Data sets with license requirements