SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes

10-12-2014
Dr. Ratnesh Sahay
Semantics in eHealth & Life Sciences (SeLS)
Insight Centre for Data Analytics
NUI Galway, Ireland
SWAT4LS 2014, Berlin, Germany

Linked2Safety - Showcases
1. Showcase #1 – Phase III Clinical Trial: Subject Selection Criteria:
   • The unbiased randomised selection of subjects in phase III clinical trials
   • e.g. return subjects with diabetesValue > 4 and weight > 80 and hasCancer
2. Showcase #2 – Phase IV Post-Marketing Surveillance Trial:
   • The pharmacovigilance of a drug after it receives permission to be sold
   • e.g. test DrugX association with headaches
3. Showcase #3 – Chemoinformatics:
   • Identification of relations between molecular fragments and specific adverse side-effect categories
   • e.g. test chemicalFragmentX (of DrugX) with rash

The Problem
Return the number of patients who have been administered the drug Insulin and exhibit BMI > 25, Hypertension and Diabetes as adverse events.
[Figure labels: Switzerland, Cyprus, Greece]

Safety First – Ethical & Legal Aspects
Patients’ anonymity; Data Ownership & Privacy
• Anonymised clinical data cubes
• Non-sensitive clinical parameters without personal information
• Access-control-based query federation

SAFE - Secure SPARQL Query Federation
[Figure: SAFE architecture. Components: SPARQL Query, Source Selection, Access Policy Filter, Query Re-writer, Results, RDF Data Cubes Index, Access Policy Model. Data providers, each holding RDF Data Cubes: CING, CHUV, ZEINCRO.]

SAFE
Pipeline: SPARQL Query + User Info → Source Selection → Access Policy Filtering → Query Re-writing
User: Oya, Clinical Researcher, Expertise – Diabetes
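PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>
# l2s-dim: Linked2Safety dimension namespace (IRI not given on the slide)
SELECT ?diabetes ?bmi ?hypertension ?cases
WHERE {
  ?observation a qb:Observation .
  ?observation l2s-dim:Diabetes ?diabetes .
  ?observation l2s-dim:BMI ?bmi .
  ?observation l2s-dim:Hypertension ?hypertension .
  ?observation sdmx-measure:Cases ?cases .
}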
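
A minimal Python sketch of how rows returned by this query could answer a count question such as the one in The Problem. The rows are copied from the result table shown later in this deck. Reading 1 as "condition present" and 0 as "absent" is an assumption, and `count_cases` is a hypothetical helper, not part of SAFE.

```python
# Toy aggregated observations: (Diabetes, BMI, Hypertension, Cases).
# Values come from the deck's result table; the 0/1 encoding is assumed
# to mean absent/present.
observations = [
    (0, 0, 0, 40),
    (1, 0, 1, 50),
    (0, 1, 1, 120),
    (1, 1, 1, 90),
]


def count_cases(rows, diabetes, bmi, hypertension):
    """Sum the Cases measure over observations matching all three dimension values."""
    wanted = (diabetes, bmi, hypertension)
    return sum(cases for d, b, h, cases in rows if (d, b, h) == wanted)


# Patients with Diabetes, elevated BMI and Hypertension together.
print(count_cases(observations, 1, 1, 1))  # prints 90
```

Because each observation already carries an aggregated Cases count, the answer is a sum over matching observations and no patient-level record is touched.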

SAFE – Source Selection
Triple Patterns                                      Capable Sources
?observation l2s-dim:Diabetes ?diabetes.             {S1, S2, S3}
?observation l2s-dim:BMI ?bmi.                       {S1, S2, S3}
?observation l2s-dim:Hypertension ?hypertension.     {S1, S2, S3}
?observation sdmx-measure:Cases ?cases.              {S1, S2, S3, S4}
INDEX
S1 = { Diabetes, BMI, Hypertension, Cases }
S2 = { Diabetes, BMI, Hypertension, Cases }
S3 = { Diabetes, BMI, Hypertension, HIV, Cases }
S4 = { Smoking, Gender, Cases }
Join Awareness: S4 matches only the Cases pattern, so it is excluded
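
The index lookup and join-aware pruning on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not SAFE's actual implementation: the index is modelled as a plain dictionary, all triple patterns are assumed to join on `?observation`, and the function names are hypothetical.

```python
# Hypothetical in-memory index: source -> predicates (dimensions/measures) it holds.
INDEX = {
    "S1": {"Diabetes", "BMI", "Hypertension", "Cases"},
    "S2": {"Diabetes", "BMI", "Hypertension", "Cases"},
    "S3": {"Diabetes", "BMI", "Hypertension", "HIV", "Cases"},
    "S4": {"Smoking", "Gender", "Cases"},
}


def capable_sources(predicates, index):
    """Map each queried predicate to the sources whose index entry contains it."""
    return {p: {s for s, held in index.items() if p in held} for p in predicates}


def join_aware_selection(predicates, index):
    """Keep only sources able to answer every pattern joined on ?observation.

    Assumption: the query is a star join on one subject variable, so a source
    that misses any pattern cannot contribute a complete result row.
    """
    per_pattern = capable_sources(predicates, index)
    if not per_pattern:
        return {}
    eligible = set.intersection(*per_pattern.values())
    return {p: sources & eligible for p, sources in per_pattern.items()}


query_predicates = ["Diabetes", "BMI", "Hypertension", "Cases"]
print(sorted(capable_sources(query_predicates, INDEX)["Cases"]))
print(sorted(join_aware_selection(query_predicates, INDEX)["Cases"]))
```

The first print lists S4 among the sources for the Cases pattern; the second drops it, since S4 cannot answer the other three patterns.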

SAFE – Access Policy
[Figure: Access Policy Framework. Inputs: the user profile (Oya, Clinical Researcher, Expertise – Diabetes) and the requested data (S1, S2, S3) from triple pattern-based source selection. Output per source: Grants Access or Denies Access.]

SAFE – Access Policy
• Example Access Policy
AP1 type Access_Policy
AP1 applies_to {S1, S2}
AP1 grants_access Read
AP1 assigned_to Oya
Oya type User
Oya hasLocation Galway
Oya hasPurpose Perform p-value analysis
Oya hasRole Clinical Researcher
Oya hasDomain Diabetes
• SPARQL Query
ASK WHERE {
  ?accessPolicy a AccessPolicy .
  ?accessPolicy appliesToNamedGraph S1 .
  ?accessPolicy :grantsAccess acl:Read_l2s .
  ?accessPolicy hasUser Oya .
}
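
The access policy filtering step can be sketched in Python, using AP1 from this slide. The `AccessPolicy` class and `filter_sources` function are hypothetical names introduced for illustration; SAFE itself checks policies with ASK queries like the one above, which this sketch replaces with an in-memory lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """In-memory stand-in for an access policy such as AP1 (hypothetical model)."""
    name: str
    applies_to: frozenset
    grants_access: str
    assigned_to: str


AP1 = AccessPolicy("AP1", frozenset({"S1", "S2"}), "Read", "Oya")


def filter_sources(user, sources, policies, mode="Read"):
    """Split the requested sources into granted and denied lists for this user."""
    granted, denied = [], []
    for source in sources:
        allowed = any(
            p.assigned_to == user and p.grants_access == mode and source in p.applies_to
            for p in policies
        )
        (granted if allowed else denied).append(source)
    return granted, denied


granted, denied = filter_sources("Oya", ["S1", "S2", "S3"], [AP1])
print(granted, denied)  # prints ['S1', 'S2'] ['S3']
```

A source with no matching policy is denied by default, which is a design assumption of this sketch.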

SAFE – Query Rewriting
• Graph information is added to the query triple patterns
  • SELECT …. WHERE { GRAPH <S1> { …. } }
  • SELECT …. WHERE { GRAPH <S2> { …. } }
• Sub-queries are sent to the relevant sources
  • S1
  • S2
• Results obtained from each source are integrated

Integrated results (S1, S2):
Diabetes  BMI  Hypertension  Cases
0         0    0             40
1         0    1             50
0         1    1             120
1         1    1             90
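
The rewriting and integration steps can be sketched in Python as below. The function names are hypothetical, integration is assumed to be a plain concatenation of per-source rows, and the assignment of result rows to S1 and S2 is invented for illustration (the slide shows only the integrated table).

```python
def rewrite_for_graph(select_vars, triple_patterns, graph):
    """Wrap the query's triple patterns in a GRAPH clause for one source."""
    body = " ".join(triple_patterns)
    return f"SELECT {' '.join(select_vars)} WHERE {{ GRAPH <{graph}> {{ {body} }} }}"


def integrate(result_sets):
    """Concatenate rows returned by each source (assumed: a simple union)."""
    return [row for rows in result_sets for row in rows]


select_vars = ["?diabetes", "?bmi", "?hypertension", "?cases"]
patterns = [
    "?observation l2s-dim:Diabetes ?diabetes .",
    "?observation l2s-dim:BMI ?bmi .",
    "?observation l2s-dim:Hypertension ?hypertension .",
    "?observation sdmx-measure:Cases ?cases .",
]

# One sub-query per source that passed access policy filtering.
sub_queries = {g: rewrite_for_graph(select_vars, patterns, g) for g in ["S1", "S2"]}
print(sub_queries["S1"])

# Hypothetical per-source answers; which source returned which row is made up.
rows_s1 = [(0, 0, 0, 40), (1, 0, 1, 50)]
rows_s2 = [(0, 1, 1, 120), (1, 1, 1, 90)]
print(integrate([rows_s1, rows_s2]))
```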

Evaluation - DataSets
Dataset       # triples  # obs   # sub   # pred  # obj  Size     Index size  Index generation time
Internal datasets
CHUV          0.8 M      96 K    96 K    36      88     31 MB    -           -
CING          0.1 M      17 K    17 K    21      51     5 MB     -           -
ZEINCRO       0.4 M      49 K    49 K    24      59     15 MB    -           -
Total         1.3 M      162 K   162 K   81      198    51 MB    8 KB        10 sec
External datasets
World Bank    77 M       10 M    10 M    58      40 K   19 GB    -           -
IMF           18 M       1.8 M   1.8 M   30      3151   3.51 GB  -           -
Eurostat      0.3 M      38 K    44 K    31      5717   205 MB   -           -
Trans. Int.   43 K       3939    4286    64      5290   9.2 MB   -           -
Total         95 M       12 M    2 M     183     54 K   23 GB    12 KB       571 sec

Evaluation - Berlin SPARQL Benchmark
Characteristics       Q1  Q2  Q3   Q4  Q5  Q6    Q7  Q8  Q9    Q10    Q11  Q12
# of Triple Patterns  9   7   9    16  7   8     11  10  7     7      3    7
# of Sources          3   4   4    3   4   3     3   4   3     3      3    3
# of Results          41  50  348  41  62  1983  5   10  1701  19656  570  41

Query features (one ✓ per query using the feature):
Filters ✓
> 9 Patterns ✓ ✓ ✓ ✓ ✓ ✓ ✓
Negation ✓
LIMIT Modifier ✓ ✓ ✓ ✓
Order By Modifier ✓ ✓ ✓
DISTINCT Modifier ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
REGEX Operator ✓
UNION Operator ✓

Evaluation – Source Selection
• Sum of triple-pattern-wise sources selected for each query
Systems  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10  Q11  Q12  Avg
SAFE     8   10  13  16  15  13  15  16  7   7    9    7    11
FedX     9   13  16  24  20  14  16  19  15  17   9    16   16
• Number of SPARQL ASK requests used for source selection
Systems  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10  Q11  Q12  Avg
SAFE     0   0   0   0   0   0   0   0   0   0    0    0    0
FedX     36  28  40  64  48  40  44  40  21  21   9    21   35

Evaluation - Source Selection Time
• Source Selection Time

Evaluation - Query Execution Time
• Query Execution Time
Query times out for FedX

SAFE – Highlights
• Source Selection
  • SPARQL SERVICE
  • Using SPARQL ASK queries
  • Using a catalog/index
  • SAFE - Hybrid (catalog/index + ASK)
• Lightweight Cache
  – RDF Cube Data Structure
  – AccessPolicy
• Join Aware
  • Excludes ineligible sources before actual query join
• Provenance – via RDF Named Graphs
• Self-contained Data Cubes
  • Creator
  • Location
  • Date
  • Access rights

Conclusion & Future Work
• Efficient source selection with lightweight indexing
• Policy-aware query execution
• Evaluated against internal and external datasets
• Performance is significantly improved compared to FedX (on average 11 vs. 16 triple-pattern-wise sources selected and 0 vs. 35 ASK requests per query)
• Cooking at the moment!
  • Evaluation extended to other federation engines (ANAPSID, HiBISCuS)
  • Benchmarking for query federation over statistical data cubes
  • SAFE extension for normal RDF data

http://linked2safety.hcls.deri.org:8080/SAFE-Demo/
SAFE - Team
• Yasar Khan
• Muhammad Saleem
• Aftab Iqbal
• Muntazir Mehdi
• Aidan Hogan
• Panagiotis Hasapis
• Axel-Cyrille Ngonga Ngomo
• Stefan Decker
• Ratnesh Sahay
Thank You
