Towards Query Generation for PROV-O Data
Jun Zhao¹, Honghan Wu² and Jeff Z. Pan²
¹Lancaster University
@junszhao | j.zhao5 at lancaster.ac.uk
²University of Aberdeen
honghan.wu | jeff.z.pan at abdn.ac.uk
Outline
• Motivation
• Profile-driven query generation
– K-Drive
– ProvQ
• Result discussion
• Future work
The Big Picture of PROV: A Motivation Scenario
http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
The Big Picture of PROV: A Motivation Scenario
Adapted from:
http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
Provenance information
The Big Picture of PROV: A Motivation Scenario
http://www.w3.org/2005/Incubator/prov/wiki/images/b/b8/Use-b.png
Provenance in the Wild vs. ProvBench
Datasets: Taverna-PROV, Vistrails PROV, Wings PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation)
Domains: workflow / scientific domain, Web resources, social domain
• 11 repositories so far
• Various representations
• Across different domains
• Openly accessible under different open licenses
https://github.com/provbench
https://sites.google.com/site/provbench/home
Next Step: Access PROV Datasets
Datasets: Taverna-PROV, Vistrails PROV, Wings PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation)
• Can we query across them?
• Can we learn something by querying across them?
• What can we do with them?
• ……
Query Generation: A Bottom-up Approach
Datasets: Taverna-PROV, Wings PROV, Wikipedia-PROV, OBIAMA (social simulation)
Provenance Data Profile Generator
Provenance Query Builder
SPARQL queries for PROV-O datasets
Example profiles:
• Class associations
• Property associations
Query Generation: A First Step
A PROV Dataset
Provenance Data Profile Generator
Provenance Query Builder
SPARQL queries for the PROV-O dataset
Example profiles:
• Class associations
• Property associations
K-Drive Query Generation
• University of Aberdeen
• A generic query generation tool for semantic web data
• Find key sub-graphs in the RDF data
– Big City: The most instantiated concepts in the data
– Big Road: The most frequent relations connecting those big cities
Slide credit: Dr Wu at Scottish Linked Data Workshop 2014
http://www.kdrive-project.eu EU FP7 Marie-Curie 286348
Pan et al. Query generation for semantic datasets. K-CAP 2013. pp. 113-116
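The Big City / Big Road idea can be illustrated with a small sketch. This is not the K-Drive implementation: the function names `big_cities` and `big_roads`, the in-memory triple list, and the simple frequency ranking are all assumptions made for illustration.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def big_cities(triples, top_n=2):
    """Return the top_n classes ranked by how many instances they have."""
    counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    return [cls for cls, _ in counts.most_common(top_n)]


def big_roads(triples, cities, top_n=3):
    """Rank (class, property, class) links between instances of the big cities."""
    # Map each instance to the big-city classes it belongs to.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE and o in cities:
            types.setdefault(s, set()).add(o)
    roads = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # Count the property once per pair of big-city classes it connects.
        for s_cls in types.get(s, ()):
            for o_cls in types.get(o, ()):
                roads[(s_cls, p, o_cls)] += 1
    return roads.most_common(top_n)


# Hypothetical toy data, written with prefixed names for brevity.
TRIPLES = [
    ("e1", "rdf:type", "prov:Entity"),
    ("e2", "rdf:type", "prov:Entity"),
    ("e3", "rdf:type", "prov:Entity"),
    ("a1", "rdf:type", "prov:Activity"),
    ("a2", "rdf:type", "prov:Activity"),
    ("ag1", "rdf:type", "prov:Agent"),
    ("e1", "prov:wasGeneratedBy", "a1"),
    ("e2", "prov:wasGeneratedBy", "a1"),
    ("e3", "prov:wasGeneratedBy", "a2"),
    ("a2", "prov:used", "e1"),
    ("a1", "prov:wasAssociatedWith", "ag1"),
]

cities = big_cities(TRIPLES)
roads = big_roads(TRIPLES, cities)
```

In this toy data the two most populated classes become the big cities, and `prov:wasGeneratedBy` is the busiest road between them; a query generator could then turn each road into a triple pattern typed by its two classes.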
K-Drive Generator
Live demo:
http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?Generation ?x4_1 ?x3_1 ?x0_1
WHERE {
  ?Generation rdf:type <http://www.w3.org/ns/prov#Generation> .
  ?Generation <http://www.w3.org/ns/prov#activity> ?x4_1 .
  ?Generation <http://www.w3.org/ns/prov#hadRole> ?x3_1 .
  ?x0_1 <http://www.w3.org/ns/prov#qualifiedGeneration> ?Generation .
}
ProvQ: Property Association Mining
A PROV Dataset
Provenance Data Profile Generator
Provenance Query Builder
SPARQL queries for the PROV-O dataset
• Discover properties that are used together with each PROV-O property
• Expand a set of “seed” PROV-O queries using the discovered associated properties
https://github.com/junszhao/ProvQ
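A rough sketch of the property association step follows. It is not taken from the ProvQ code base: the function name `property_associations` is hypothetical, and the score used here (the share of subjects carrying the seed property that also carry the other property) is an assumed definition that the slides do not spell out.

```python
from collections import defaultdict


def property_associations(triples, seed):
    """Score each property by how often it shares a subject with the seed.

    The score is (subjects using both seed and property) / (subjects using seed).
    """
    props_by_subject = defaultdict(set)
    for s, p, _ in triples:
        props_by_subject[s].add(p)

    # Only subjects that use the seed property are inspected, which keeps
    # the search far narrower than mining every property pair.
    seed_subjects = [s for s, props in props_by_subject.items() if seed in props]
    if not seed_subjects:
        return {}

    counts = defaultdict(int)
    for s in seed_subjects:
        for p in props_by_subject[s] - {seed}:
            counts[p] += 1
    return {p: n / len(seed_subjects) for p, n in counts.items()}


# Hypothetical toy data, written with prefixed names for brevity.
TOY_TRIPLES = [
    ("a1", "prov:used", "e1"),
    ("a1", "prov:startedAtTime", "t1"),
    ("a1", "wfprov:describedByProcess", "p1"),
    ("a2", "prov:used", "e2"),
    ("a2", "prov:startedAtTime", "t2"),
    ("e3", "prov:wasGeneratedBy", "a1"),
]

scores = property_associations(TOY_TRIPLES, "prov:used")
```

Here `prov:startedAtTime` scores 1.0 because every subject with `prov:used` also has it, while `wfprov:describedByProcess` scores 0.5.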
ProvQ: Property Association Mining
• Advantages
– Reduce the performance challenge usually faced in association rule mining
– Produce provenance-centric queries
• Disadvantages
– Could miss queries that are not related to PROV-O terms at all
Expanding Starting Queries
Approach Walk-Through
• Given a seed atomic query, we have a seed property: prov:wasGeneratedBy
• We find all properties used together with it:
– http://purl.org/wf4ever/wfprov#describedByParameter
– http://purl.org/wf4ever/wfprov#wasOutputFrom
– http://www.w3.org/ns/prov#qualifiedGeneration
• Return the resulting conjunctive SPARQL query
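The walk-through above can be sketched as a small query builder. This is an illustration only: `build_query` is a hypothetical helper, and the rule that a score of 1 yields a required pattern while a lower score yields an OPTIONAL pattern is an assumption inferred from the example queries in the notes, not a documented ProvQ rule.

```python
def build_query(seed, associations, limit=100):
    """Expand a seed property into a SPARQL query string.

    associations is a list of (property IRI, score) pairs. A score of 1.0
    adds a required triple pattern; anything lower is wrapped in OPTIONAL
    (assumed threshold rule).
    """
    lines = ["SELECT DISTINCT * WHERE {", f"  ?s {seed} ?o ."]
    for i, (prop, score) in enumerate(associations, start=1):
        pattern = f"?s {prop} ?o{i} ."
        if score >= 1.0:
            lines.append(f"  {pattern}")
        else:
            lines.append(f"  OPTIONAL {{ {pattern} }}")
    lines.append(f"}} LIMIT {limit}")
    return "\n".join(lines)


query = build_query(
    "<http://www.w3.org/ns/prov#used>",
    [
        ("<http://www.w3.org/ns/prov#startedAtTime>", 1.0),
        ("<http://purl.org/wf4ever/wfprov#describedByProcess>", 0.98),
    ],
)
print(query)
```

Full IRIs in angle brackets are used so the generated query needs no PREFIX declarations.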
Results Comparison
• K-Drive Generator
– 7 queries
– 3 of them are not exactly provenance queries
– Probably easier to understand because classes are included in the queries
– But queries can be complex
• ProvQ
– 7 queries
– 1 not returned by K-Drive (prov:wasDerivedFrom)
– Only provenance queries are returned
– Queries are simple, based on property associations starting from “seed” PROV-O properties
https://github.com/junszhao/ProvQ/blob/master/results/query-analysis.txt
Future Work
• Define and evaluate usefulness
• Test against more datasets
• Experiment with reasoning
• Query generation across multiple datasets
Thank you!
These slides have been created by Jun Zhao
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
http://creativecommons.org/licenses/by-nc-sa/3.0/


Editor's Notes

  • #16: wasGeneratedBy, startedAtTime, endedAtTime, wasAssociatedWith, wasAttributedTo, actedOnBehalfOf, wasInformedBy
  • #17:
    1. From prov:wasGeneratedBy:
    Select distinct * where {
      ?s prov:wasGeneratedBy ?o .
      optional {?s <http://purl.org/wf4ever/wfprov#describedByParameter> ?o1 .}
      optional {?s <http://purl.org/wf4ever/wfprov#wasOutputFrom> ?o3 .}
      optional {?s <http://www.w3.org/ns/prov#qualifiedGeneration> ?o4 .}
    } limit 100
    2. From prov:used
    <http://purl.org/wf4ever/wfprov#usedInput>; 1
    rdfs:label; 1
    prov:endedAtTime; 1
    prov:startedAtTime; 1
    prov:qualifiedAssociation; 1
    prov:qualifiedUsage; 1
    <http://purl.org/wf4ever/wfprov#describedByProcess>; 0.98
    <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun>; 0.98
    Select distinct * where {
      ?s prov:used ?o .
      ?s <http://purl.org/wf4ever/wfprov#usedInput> ?o1 .
      ?s rdfs:label ?o2 .
      ?s prov:endedAtTime ?o3 .
      ?s prov:startedAtTime ?o4 .
      ?s prov:qualifiedAssociation ?o5 .
      ?s prov:qualifiedUsage ?o6 .
      optional {?s <http://purl.org/wf4ever/wfprov#describedByProcess> ?o7 .}
      optional {?s <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun> ?o8 .}
    } limit 100
    3. From prov:wasDerivedFrom
    <http://ns.taverna.org.uk/2012/tavernaprov/errorMessage>; 1
    <http://ns.taverna.org.uk/2012/tavernaprov/stackTrace>; 1
    Select distinct * where {
      ?s prov:wasDerivedFrom ?o .
      ?s <http://ns.taverna.org.uk/2012/tavernaprov/errorMessage> ?o1 .
      ?s <http://ns.taverna.org.uk/2012/tavernaprov/stackTrace> ?o2 .
    } limit 100
    4. From prov:startedAtTime and prov:endedAtTime, will produce similar result as query 2
    rdfs:label; 1
    prov:endedAtTime; 1
    prov:qualifiedAssociation; 1
    <http://purl.org/wf4ever/wfprov#describedByProcess>; 0.97
    <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun>; 0.97
    prov:qualifiedUsage; 0.90
    prov:used; 0.90
    <http://purl.org/wf4ever/wfprov#usedInput>; 0.90
    Select distinct * where {
      ?s prov:startedAtTime ?o .
      ?s rdfs:label ?o1 .
      ?s prov:endedAtTime ?o2 .
      ?s prov:qualifiedAssociation ?o3 .
      optional {?s <http://purl.org/wf4ever/wfprov#describedByProcess> ?o4 .}
      optional {?s <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun> ?o5 .}
      optional {?s <http://purl.org/wf4ever/wfprov#usedInput> ?o6 .}
      optional {?s prov:qualifiedUsage ?o7 .}
      optional {?s prov:used ?o8 .}
    } limit 100
  • #18: 3 queries were largely the same, 3 queries were only returned by K-Drive, and the rest had different degrees of overlap. 1 query not returned