Brief Introduction to Provenance
"As data becomes plentiful, verifiable truth becomes scarce"
http://go-to-hellman.blogspot.com/2010/02/named-graphs-argleton-and-truth-economy.html
For JISC KeepIt course on Digital Preservation Tools for Repository Managers
Module 3, Primer on preservation workflow, formats and characterisation
Westminster-Kingsway College, London, 2 March 2010
Provenance: example
The following excerpt and slides are taken with permission from Moreau, L.
The Open Provenance Model: Towards inter-operability of Provenance
Systems http://users.ecs.soton.ac.uk/lavm/talks/iam09.pdf
Example: the provenance of a bottle of wine includes:
• Grapes from which it is made
• Where those grapes grew
• Processes in the wine’s preparation
• How the wine was stored
• Between which parties the wine was transported,
e.g. producer to distributor to retailer
• Where it was auctioned
Provenance Definition
• Oxford English Dictionary:
– the fact of coming from some particular source or quarter;
origin, derivation
– the history or pedigree of a work of art, manuscript, rare
book, etc.;
– concretely, a record of the passage of an item through its
various owners.
• The provenance of a piece of data is the
process that led to that piece of data
The Science Lifecycle
[Figure: lifecycle diagram. Labels: scientists; experimentation; Local Web Repositories; Digital Libraries; Virtual Learning Environment; Graduate Students; Undergraduate Students; Next Generation Researchers; Technical Reports; Preprints & Metadata; Peer-Reviewed Journal & Conference Papers; Reprints; Certified Experimental Results & Analyses; Data, Metadata, Provenance, Scripts, Workflows, Services, Ontologies, Blogs, ...]
Adapted from David De Roure’s slides
[Figure: the science lifecycle diagram repeated, with the annotation below]
Finding the provenance of research outputs across all the systems data transited through
Open Provenance Model (OPM)
• Allows us to express all the causes of an item
• Allows for process-oriented and dataflow-oriented views
• Based on the notion of an annotated causality graph
Moreau, L., et al. OPM v1.00 (Dec 2007), OPM v1.01
(Jul 2008), OPM v1.1 (Dec 2009)
OPM Requirements
• To allow provenance information to be
exchanged between systems, by means of a
compatibility layer based on a shared provenance
model.
• To allow developers to build and share tools that
operate on such a provenance model.
• To define the model in a precise, technology-agnostic
manner.
• To define bindings to XML/RDF separately.
• To support a digital representation of provenance
for any “thing”, whether produced by computer
systems or not.
OPM Serialisation
• OPM is an abstract data model to represent past
executions and what caused data and processes to occur
• OPM can be serialised in different formats, referred to
as “technology bindings” or serialisations
• OPM XML schema
(http://openprovenance.org/model/v1.01.a)
• OPM RDF schema
• OPM OWL ontology
• Effort underway to ensure full equivalence of
representations
Nodes
• Artifact: Immutable piece of state, which
may have a physical embodiment in a
physical object, or a digital
representation in a computer system.
• Process: Action or series of actions
performed on or caused by artifacts, and
resulting in new artifacts.
• Agent: Contextual entity acting as a
catalyst of a process, enabling,
facilitating, controlling, affecting its
execution.
[Figure: graphical notation for an artifact (A), a process (P) and an agent (Ag)]
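The three node kinds can be sketched as small record types. This is a minimal illustration, not part of OPM or any OPM library: the class fields (`id`, `value`, `type`, `name`) and the sample nodes are assumptions chosen for the example. Artifacts are described above as immutable, so the sketch uses frozen dataclasses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Immutable piece of state (physical object or digital representation)."""
    id: str
    value: str = ""  # hypothetical field, for illustration only


@dataclass(frozen=True)
class Process:
    """Action or series of actions performed on or caused by artifacts."""
    id: str
    type: str = ""  # hypothetical field, for illustration only


@dataclass(frozen=True)
class Agent:
    """Contextual entity enabling, facilitating or controlling a process."""
    id: str
    name: str = ""  # hypothetical field, for illustration only


# Sample nodes borrowed from the wine example; the ids are made up.
grapes = Artifact("a1", "grapes")
pressing = Process("p1", "pressing")
winemaker = Agent("ag1", "winemaker")
```

Because the classes are frozen, an attempt such as `grapes.value = "juice"` raises `dataclasses.FrozenInstanceError`; a changed state would be modelled as a new artifact instead.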
Edges
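• used(R): process P used artifact A in role R
• wasGeneratedBy(R): artifact A was generated by process P in role R
• wasControlledBy(R): process P was controlled by agent Ag in role R
• wasTriggeredBy: process P2 was triggered by process P1
• wasDerivedFrom: artifact A2 was derived from artifact A1
Edge labels are in the past tense to express that they describe past executions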
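The five edge kinds can be captured in a small lookup table that records which node kinds each label connects. The sketch below assumes that every edge points from an effect to its cause; the names `EDGE_KINDS`, `Edge` and `make_edge` are hypothetical and do not come from an OPM toolkit.

```python
from typing import NamedTuple, Optional

# label -> (kind of the effect node, kind of the cause node)
# Assumption: each edge is read "effect <label> cause".
EDGE_KINDS = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


class Edge(NamedTuple):
    label: str
    effect: str
    cause: str
    role: Optional[str] = None


def make_edge(label, effect, cause, kinds, role=None):
    """Build an edge after checking that both node kinds fit the label.

    `kinds` maps a node id to "artifact", "process" or "agent".
    """
    expected = EDGE_KINDS[label]
    actual = (kinds[effect], kinds[cause])
    if actual != expected:
        raise ValueError(f"{label} expects {expected}, got {actual}")
    return Edge(label, effect, cause, role)


kinds = {"A": "artifact", "P": "process", "Ag": "agent"}
used_edge = make_edge("used", "P", "A", kinds, role="R")
```

A call with the nodes swapped, such as `make_edge("used", "A", "P", kinds)`, raises `ValueError`, which is a cheap way to catch edges drawn in the wrong direction.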
Illustration
• A process “used” artifacts and
“generated” artifacts
• Edge “roles” indicate the
function of the artifact with
respect to the process (akin
to function parameters)
• Edges and nodes can be
typed
Causation chain:
• P was caused by A1 and A2
• A3 and A4 were caused by P
• Does it mean that A3 and A4
were caused by A1 and A2?
[Figure: process P (type=division) used artifacts A1 and A2 in the roles dividend and divisor; artifacts A3 and A4 were generated by P in the roles quotient and rest]
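The division example can be written down as a list of edges and walked backwards to collect everything a node may causally depend on. This is a sketch under assumptions: the pairing of A1 with dividend and A2 with divisor is a guess from the figure, and `reachable_causes` is a hypothetical helper. Reaching A1 and A2 from A3 shows only that they are candidate causes; the walk does not by itself settle the question posed above, and OPM provides the separate wasDerivedFrom edge for stating a derivation explicitly.

```python
# Each tuple is (effect, label, role, cause).
# Assumption: A1 is the dividend and A2 the divisor.
edges = [
    ("P", "used", "dividend", "A1"),
    ("P", "used", "divisor", "A2"),
    ("A3", "wasGeneratedBy", "quotient", "P"),
    ("A4", "wasGeneratedBy", "rest", "P"),
]


def reachable_causes(node, edges):
    """Return every node reachable from `node` by following edges to causes."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for effect, _label, _role, cause in edges:
            if effect == current and cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


candidates = reachable_causes("A3", edges)
```

Here `candidates` contains P, A1 and A2, while `reachable_causes("A1", edges)` is empty because nothing in this graph records where A1 came from.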
Time Constraints
[Figure: an artifact is generated (wasGeneratedBy(R)) at T1 and used by process P (used(R)) at T3; P starts at T2, ends at T5 and is controlled by agent Ag (wasControlledBy(R)); P generates an artifact (wasGeneratedBy(R)) at T4, which is used (used(R)) at T6]
T1<T3 (artifact must exist before being used)
T2<T3 (process must have started before using artifacts)
T3<T5 (process uses artifacts before it ends)
T2<T4 (process must have started before generating artifacts)
T4<T5 (process generates artifacts before it ends)
T4<T6 (artifact must exist before being used)
T2<T5 (process must have started before ending)
No constraint between T3 and T4
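The seven constraints above can be checked mechanically. The sketch below is one possible way to do it; the function name `violated_constraints` and the dictionary layout (keys "T1" to "T6" mapped to comparable timestamps) are assumptions made for the example.

```python
# (earlier, later, reason) for each constraint listed above.
CONSTRAINTS = [
    ("T1", "T3", "artifact must exist before being used"),
    ("T2", "T3", "process must have started before using artifacts"),
    ("T3", "T5", "process uses artifacts before it ends"),
    ("T2", "T4", "process must have started before generating artifacts"),
    ("T4", "T5", "process generates artifacts before it ends"),
    ("T4", "T6", "artifact must exist before being used"),
    ("T2", "T5", "process must have started before ending"),
]


def violated_constraints(times):
    """Return a description of each constraint that `times` breaks.

    `times` maps "T1".."T6" to numbers (or any comparable timestamps).
    """
    return [
        f"{earlier}<{later} ({reason})"
        for earlier, later, reason in CONSTRAINTS
        if not times[earlier] < times[later]
    ]


in_order = {"T1": 1, "T2": 2, "T3": 3, "T4": 4, "T5": 5, "T6": 6}
generated_before_use = {"T1": 1, "T2": 2, "T3": 4, "T4": 3, "T5": 5, "T6": 6}
used_before_start = {"T1": 0, "T2": 2, "T3": 1, "T4": 4, "T5": 5, "T6": 6}
```

Both `in_order` and `generated_before_use` pass, which reflects that T3 and T4 are not ordered relative to each other; `used_before_start` breaks only T2<T3.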
Dublin Core Profile (draft)
• To many people, provenance is primarily
about attribution, citation, bibliographic
information
• DC provides terms to relate resources to such
information
• DC profile aims to map Dublin Core terms to
OPM concepts and graph patterns
With Simon Miles and Joe Futrelle
DC to OPM example: dc:publisher
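[Figure: graph pattern for dc:publisher. Artifacts A1 and A2 carry state=unpublished and state=published and are linked by wasSameResourceAs; process P (publish) is linked to an artifact by wasGeneratedBy and to agent Ag (person, name=Luc) by wasActionOf]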
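The pattern in this example can be sketched as a function that expands one dc:publisher statement into nodes and edges. Everything concrete here is an assumption for illustration: the function name `publisher_pattern`, the node id scheme, and the direction of each edge (the published artifact is taken to be the one generated by the publish process and the subject of wasSameResourceAs). It is not the draft profile's definition.

```python
def publisher_pattern(resource_id, publisher_name):
    """Expand a dc:publisher statement into OPM-style nodes and edges.

    Node ids and edge directions are assumptions, not taken from the profile.
    """
    unpublished = f"{resource_id}#unpublished"
    published = f"{resource_id}#published"
    process = f"{resource_id}#publish"
    agent = f"agent:{publisher_name}"

    nodes = {
        unpublished: {"kind": "artifact", "state": "unpublished"},
        published: {"kind": "artifact", "state": "published"},
        process: {"kind": "process", "type": "publish"},
        agent: {"kind": "agent", "type": "person", "name": publisher_name},
    }
    # Each edge is (subject, label, object).
    edges = [
        (published, "wasGeneratedBy", process),
        (published, "wasSameResourceAs", unpublished),
        (process, "wasActionOf", agent),
    ]
    return nodes, edges


# "report-42" is a made-up resource id; "Luc" is the name shown in the figure.
nodes, edges = publisher_pattern("report-42", "Luc")
```

Keeping the expansion in one function means a repository could, in principle, derive the graph pattern from existing Dublin Core records rather than asking depositors for provenance separately.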
What have we learned about
provenance?
• Provenance describes and records the results of
processes on objects over time
• OPM is an abstract model; provenance can be represented as XML
• OPM can be serialised in different formats
• RDF, Semantic Web
• OPM is a work in progress
By working with an open standard model that can
pass information as XML and in standard serialisation
formats (e.g. RDF), it should be possible to build
provenance services into repository environments

Keepit Course 3: Provenance (and OPM), based on slides by Luc Moreau
