Including Co-referent URIs
in a SPARQL Query
Christian Y A Brenninkmeijer,
Carole Goble, Alasdair J G Gray, Paul Groth,
Antonis Loizou, and Steve Pettifer

www.openphacts.org
@open_phacts

A.J.G.Gray@hw.ac.uk
@gray_alasdair
Multiple Identities
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is
never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws.html

GB:29384

P12047

Are these the
same thing?

X31045

22/10/2013

COLD 2013

1
Gleevec® = Imatinib Mesylate
Imatinib

Imatinib Mesylate Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N

ChemSpider
Drugbank
PubChem

Multiple Links: Different Reasons

Link: skos:closeMatch
Reason: non-salt form

Link: skos:exactMatch
Reason: drug name

Dynamic Equality
Strict ↔ Relaxed
Analysing ↔ Browsing

skos:exactMatch (InChI)

Dynamic Equality
Strict ↔ Relaxed
Analysing ↔ Browsing

skos:closeMatch (Drug Name)
skos:exactMatch (InChI)
skos:closeMatch (Drug Name)
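
One way to picture dynamic equality is as a filter over links: each link carries a predicate and a reason, and the chosen lens decides which reasons count as "the same". The Python sketch below is only an illustration under stated assumptions. The `Link` class, the `LENSES` table, the `equivalents` function, and the example links are all hypothetical and are not taken from the Open PHACTS code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One mapping between two URIs, with the justification for it."""
    source: str
    target: str
    predicate: str
    reason: str


# Hypothetical links; the identifiers and the reasons attached to them are
# illustrative assumptions, not data from the platform.
LINKS = [
    Link("cs:2157", "chembl:1280", "skos:exactMatch", "InChI"),
    Link("cs:2157", "db:db00945", "skos:closeMatch", "Drug Name"),
]

# Assumption: a lens is modelled as the set of link reasons it accepts.
LENSES = {
    "strict": {"InChI"},
    "relaxed": {"InChI", "Drug Name"},
}


def equivalents(uri, lens):
    """Return `uri` plus every URI directly linked to it under the given lens."""
    accepted = LENSES[lens]
    found = {uri}
    for link in LINKS:
        if link.reason not in accepted:
            continue
        # Links are treated as symmetric; no transitive closure is computed.
        if link.source == uri:
            found.add(link.target)
        elif link.target == uri:
            found.add(link.source)
    return found
```

Under the strict lens only the InChI-based link is followed, so `equivalents("cs:2157", "strict")` yields two URIs; the relaxed lens also follows the drug-name link and yields three. Following direct links only is a deliberate simplification, since chaining close matches can drift away from the original concept.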

Open PHACTS Discovery Platform
Apps
Interactive responses
Method calls

Domain API

Drug Discovery Platform
Production-quality integration platform

Integration Approach
• Data kept in original model
• Data cached in central triple store
• API call translated to SPARQL query
• Query expressed in terms of original data

OPS Discovery Platform

[Architecture diagram; labels recovered from the figure]

Apps
Linked Data API (RDF/XML, TTL, JSON)

Core Platform:
– Identity Resolution Service
– Identifier Management Service
– Domain Specific Services
– Semantic Workflow Engine
– Chemistry Registration, Normalisation & Q/C
– Indexing
– Data Cache (Virtuoso Triple Store)

Example labels: “Adenosine receptor 2a”; P12374, EC2.43.4, CS4532

Sources (Db, with VoID and Nanopub labels):
– Public Ontologies
– Public Content
– Commercial
– User Annotations

Platform Interaction
1. Resolve user input:
– User enters search text
– Resolve to a URI for concept

2. Request data for URI
– Expand URI to equivalent for each dataset
– Run resulting SPARQL query

Query Expansion
Input query Q:
GRAPH <http://rdf.chemspider.com> {
  cw:979b545d-f9a9 cheminf:logd ?logd .
}

Expanded query Q’:
GRAPH <http://rdf.chemspider.com> {
  ?iri cheminf:logd ?logd .
  FILTER (?iri = cw:979b545d-f9a9 ||
          ?iri = cs:2157 ||
          ?iri = chembl:1280 ||
          ?iri = db:db00945 )
}

Q, L1 → Query Expander Service → Q’
Query Expander Service → Identity Mapping Service (BridgeDB): cw:979b545d-f9a9, L1
Identity Mapping Service (BridgeDB) → Query Expander Service: [cw:979b545d-f9a9, cs:2157, chembl:1280, db:db00945]
Identity Mapping Service (BridgeDB): Mappings, Profiles

Can also be achieved through UNION
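
The FILTER rewrite on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Query Expander Service itself. The `MAPPINGS` table stands in for a call to the Identity Mapping Service, and `expand_with_filter` is a hypothetical helper name. The textual replacement is deliberately naive and assumes the URI occurs in the pattern only as a whole term.

```python
# Stand-in for the Identity Mapping Service: a static lookup table
# (hypothetical; the real service is an external call).
MAPPINGS = {
    "cw:979b545d-f9a9": [
        "cw:979b545d-f9a9",
        "cs:2157",
        "chembl:1280",
        "db:db00945",
    ],
}


def expand_with_filter(pattern, uri, var="?iri"):
    """Swap `uri` for a variable and constrain it to the equivalent URIs."""
    # Unknown URIs fall back to themselves, so the query meaning is unchanged.
    equivalent_uris = MAPPINGS.get(uri, [uri])
    # Naive replacement: assumes `uri` only appears as a complete term.
    rewritten = pattern.replace(uri, var)
    conditions = " ||\n        ".join(
        "{} = {}".format(var, candidate) for candidate in equivalent_uris
    )
    return "{}\nFILTER ({})".format(rewritten, conditions)


if __name__ == "__main__":
    print(expand_with_filter(
        "cw:979b545d-f9a9 cheminf:logd ?logd .", "cw:979b545d-f9a9"
    ))
```

Keeping the mapping lookup separate from the rewriting step mirrors the stand-off design on the slide: the mappings can change, or a different lens can be applied, without touching the query template.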

Experiment
Is it feasible to use a stand-off mapping service?
• Baselines (no external call):
– “Perfect” URIs
– Linked data querying

• Expansion approaches (external service call):
– FILTER by Graph
– UNION by Graph
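
For comparison with the FILTER form, the UNION approach can be sketched as one triple pattern per equivalent URI, combined inside a single GRAPH clause. The slides do not spell out exactly what "by Graph" involves, so the Python below is one plausible reading rather than the evaluated implementation; `expand_with_union` and its parameters are hypothetical names.

```python
def expand_with_union(graph, predicate, obj, equivalent_uris):
    """Build a GRAPH clause with one UNION branch per equivalent URI."""
    if not equivalent_uris:
        raise ValueError("at least one URI is required")
    # Each branch is a complete group pattern: { <uri> <predicate> <object> . }
    branches = "\n    UNION\n".join(
        "    {{ {} {} {} . }}".format(uri, predicate, obj)
        for uri in equivalent_uris
    )
    return "GRAPH <{}> {{\n{}\n}}".format(graph, branches)


if __name__ == "__main__":
    print(expand_with_union(
        "http://rdf.chemspider.com",
        "cheminf:logd",
        "?logd",
        ["cs:2157", "chembl:1280"],
    ))
```

With a single URI the result degenerates to one group pattern and no UNION keyword, which is still a valid graph pattern.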

“Perfect” URI Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    chembl_mol:m1280 cheminf:mw ?mw .
  }
}

Linked Data Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    ?chemblid cheminf:mw ?mw .
  }
  cs:2157 skos:exactMatch ?chemblid .
}

Queries
Drawn from Open PHACTS API:
1. Simple compound information (1)
2. Compound information (1)
3. Compound pharmacology (M)
4. Simple target information (1)
5. Target information (1)
6. Target pharmacology (M)

Datasets and Links
Data: 167,783,592 triples
Mappings: 2,114,584 triples
Lenses: 1

Average execution times
[Charts of average execution times; annotated value: 0.018]

Conclusions
• Query expansion slower in general
– Due to separate service call
– Difference below human perception
– UNION faster than FILTER on Virtuoso

• Stand-off mappings feasible
• Infrastructure can support lenses
Strict ↔ Relaxed
Analysing ↔ Browsing

Questions
A.J.G.Gray@hw.ac.uk
www.macs.hw.ac.uk/~ajg33
@gray_alasdair

Open PHACTS Project

pmu@openphacts.org
www.openphacts.org
@open_phacts

Editor's Notes

  • #2: Each captures a subtly different view of the world. Are they the same? … Depends on your point of view.
  • #3: Example drug: Gleevec, a cancer drug for leukemia. Lookup in three popular public chemical databases gives different results. Data is messy!
  • #4: Enter with ChemSpider URI for Imatinib. This is not Gleevec.
  • #7: sameAs != sameAs; it depends on your point of view. Links relate individual data instances: source, target, predicate, reason. Links are grouped into Linksets, which have a VoID header providing provenance and justification for the link.
  • #10: A platform for integrated pharmacology data. Relied upon by pharma companies. Public domain, commercial, and private data sources. Provides a domain-specific API, making it easy to build multiple drug discovery applications: examples developed in the project.
  • #11: Step 2 requires expansion of URI to cover those used in data sets. Performed by query expansion service and IMS.
  • #12: Import data into cache. Domain-specific API. API calls populate SPARQL queries. Queries expanded by IMS to cover URIs of original datasets.
  • #13: Step 2 requires expansion of URI to cover those used in data sets. Performed by query expansion service and IMS.
  • #14: Query with URIs. Extract URIs. Find equivalents. Expand query. Optimise based on context.
  • #18: Result size in brackets.
  • #19: Result size in brackets.
  • #20: Subset of the OPS data.
  • #21: Linked data approach performs badly with query 6 due to the query construction. Name being bound to the chemical structure returned.
  • #22: Focus on other queries. In general, expansion is slower than the baselines. Worst case delta: 0.01842 s (under 20 ms). Human perception is 0.050 to 0.2 s.
  • #29: Focus on query 6. No linked data, as it performed very poorly on this query. Size of result obliterates external call cost.