www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Modeling Data Life Cycles with PROV
Yann Le Franc, PhD
e-Science Data Factory, France
EUDAT Conference
Semantic Services in EOSC
Porto, January 22-25 2018
What is a Data Life Cycle?
[Figure: the UK Data Archive data life cycle: Creating data → Processing data → Analysing data → Preserving data → Giving access to data → Re-using data]
Ref: UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle
About Data Life Cycles
A lifecycle approach ensures that the necessary data management stages are identified and planned (Higgins, 2008)
Lifecycle models provide a structure for considering the many operations that will need to be performed on a data record throughout its life (Ball, 2012)
A large diversity of DLCs:
Review by the Committee on Earth Observation Satellites (2012): 51 different DLCs
Review by Ball (2012): 7 DLCs
Pennock M. 2007. Digital curation: a life cycle approach to managing and preserving usable digital information. Library and Archives Journal, Issue 1.
Higgins S. 2008. The DCC Curation Lifecycle Model. International Journal of Digital Curation, Volume 3, Issue 1.
Ball A. 2012. Review of Data Management Lifecycle Models. University of Bath (unpublished).
A proposed definition
DataONE definition:
“The data life cycle provides a high level overview of the stages involved in successful management and preservation of data for use and reuse. Multiple versions of a data life cycle exist with differences attributable to variation in practices across domains or communities.”
UK Data Archive DLC
[Figure: the six stages of the UK Data Archive data life cycle, each annotated with its activities]
Ref: UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle
CREATING DATA: designing research, data management plans (DMPs), planning consent, locating existing data, data collection and management, capturing and creating metadata
PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, managing and storing data
ANALYSING DATA: interpreting and deriving data, producing outputs, authoring publications, preparing for sharing
PRESERVING DATA: data storage, backup and archiving, migrating to the best format and medium, creating metadata and documentation
GIVING ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data
RE-USING DATA: follow-up research, new research, undertaking research reviews, scrutinising findings, teaching and learning
The DataONE DLC
https://www.dataone.org/data-life-cycle
Digital Curation Centre DLC
http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf
Data Documentation Initiative DLC
http://www.ddialliance.org/training/why-use-ddi
University of Virginia Library DLC
http://data.library.virginia.edu/data-management/lifecycle/
U.S. Geological Survey DLC
https://my.usgs.gov/confluence/download/attachments/82935852/SDLC%20Level%20Two%20Roles%20-%20FINAL.jpg?version=1&modificationDate=1347556961533&api=v2
Linking EUDAT services to DLC
[Figure: the UK Data Archive life cycle (creating, processing, analysing, preserving, giving access to and re-using data) annotated with EUDAT service functions:]
PIDs: referencing data
Finding data and making data findable
Data transfer from public data servers
Storing mutable data
Accessing services
Moving data to HPC
Going beyond the classical DLC view
Aim: modeling DLCs and their relations with EUDAT services
Rethinking the DLC definition: a more operational definition
« Data Life Cycle can be considered as the ensemble of all activities, actions, and steps that describe the stages through which data passes, from the time it has been created until its obsolescence. »
A DLC can therefore be considered a data management workflow
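Reading a DLC as a workflow means that its stages and the allowed transitions between them can be written down as data. A minimal sketch, assuming the six UK Data Archive stages run in sequence and that re-use feeds back into creation; `UKDA_DLC`, `reachable` and `is_valid_run` are hypothetical names, not part of any EUDAT service:

```python
from collections import deque

# Hypothetical encoding of the UK Data Archive cycle as a directed graph:
# each stage maps to the stages that may follow it (assumed strictly sequential).
UKDA_DLC = {
    "creating": ["processing"],
    "processing": ["analysing"],
    "analysing": ["preserving"],
    "preserving": ["giving_access"],
    "giving_access": ["re_using"],
    "re_using": ["creating"],  # re-use feeds new research, closing the cycle
}


def reachable(workflow, start):
    """Return every stage reachable from `start` (breadth-first, cycle-safe)."""
    seen = {start}
    queue = deque([start])
    while queue:
        stage = queue.popleft()
        for nxt in workflow.get(stage, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_valid_run(workflow, run):
    """Check that each consecutive pair of stages in `run` is an allowed transition."""
    return all(b in workflow.get(a, []) for a, b in zip(run, run[1:]))


if __name__ == "__main__":
    print(is_valid_run(UKDA_DLC, ["creating", "processing", "analysing"]))
    print(is_valid_run(UKDA_DLC, ["creating", "analysing"]))  # skips a stage
```

Because the graph is cyclic, the traversal keeps a `seen` set rather than assuming a fixed start and end stage.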
How to describe workflows?
Declarative languages (before execution):
Workflow Description Language (WDL)
SCUFL2 (Apache Taverna)
Wf4ever models
Workflow-engine-specific languages
Provenance trail (after execution)
W3C PROV: tracking the past
[Figure from L. Moreau and P. Groth, Provenance: An Introduction to PROV. Synthesis Lectures on the Semantic Web: Theory and Technology, vol. 3, no. 4. Morgan & Claypool Publishers, 2013, pp. 1–129.]
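The three PROV core types (entity, activity, agent) and their main relations can be illustrated with a small PROV-N serializer. This is a stdlib-only sketch, not the W3C reference tooling or the `prov` Python package; the class `MiniProvDoc`, the `ex:` prefix, its namespace and the example identifiers are assumptions for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class MiniProvDoc:
    """Minimal in-memory PROV document that serializes to PROV-N text."""

    prefix: str = "ex"
    namespace: str = "http://example.org/"  # placeholder namespace (assumption)
    statements: list = field(default_factory=list)

    def _q(self, name):
        # Qualify a local name with the document prefix: "raw" -> "ex:raw".
        return f"{self.prefix}:{name}"

    def _add(self, relation, *args):
        # Record one PROV-N statement such as "used(ex:a, ex:e, -)".
        self.statements.append(f"{relation}({', '.join(args)})")

    def entity(self, e):
        self._add("entity", self._q(e))

    def activity(self, a):
        self._add("activity", self._q(a))

    def agent(self, ag):
        self._add("agent", self._q(ag))

    def used(self, a, e):
        self._add("used", self._q(a), self._q(e), "-")  # "-": time not recorded

    def was_generated_by(self, e, a):
        self._add("wasGeneratedBy", self._q(e), self._q(a), "-")

    def was_associated_with(self, a, ag):
        self._add("wasAssociatedWith", self._q(a), self._q(ag), "-")  # "-": no plan

    def was_derived_from(self, derived, source):
        self._add("wasDerivedFrom", self._q(derived), self._q(source))

    def to_provn(self):
        lines = ["document", f"  prefix {self.prefix} <{self.namespace}>"]
        lines += [f"  {s}" for s in self.statements]
        lines.append("endDocument")
        return "\n".join(lines)


if __name__ == "__main__":
    doc = MiniProvDoc()
    doc.entity("raw_data")
    doc.entity("clean_data")
    doc.activity("cleaning")
    doc.agent("curator")
    doc.used("cleaning", "raw_data")
    doc.was_generated_by("clean_data", "cleaning")
    doc.was_associated_with("cleaning", "curator")
    doc.was_derived_from("clean_data", "raw_data")
    print(doc.to_provn())
```

Optional arguments are written as `-`; attributes and qualified relations are left out for brevity.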
Our model
Modeling activities and agents: DataONE use case
[Figure panels: DLC plan; provenance trail]
Modeling activities and agents: DataONE use case
Data life cycles are constrained by service implementations
Activities can recur throughout the DLC
High-level activities (data publication, data sharing, …) vs. low-level activities (data curation, data documentation, …)
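In PROV, a plan (`prov:Plan`) is an entity that an agent follows when it is associated with an activity, which gives one way to relate a DLC plan, written before execution, to the provenance trail recorded afterwards. The sketch below uses hypothetical names (`PLAN`, `check_trail`, `occurrences`) and an assumed decomposition of high-level into low-level activities; it flags recorded activities that the plan does not mention, while allowing the same activity to recur:

```python
from collections import Counter

# Hypothetical DLC plan: each high-level activity is decomposed into the
# low-level activity types it may trigger (assumed mapping, for illustration).
PLAN = {
    "data_publication": ["data_curation", "data_documentation"],
    "data_sharing": ["data_documentation", "access_control"],
}


def allowed_low_level(plan):
    """Return the set of low-level activity types the plan permits."""
    return {low for steps in plan.values() for low in steps}


def check_trail(plan, trail):
    """Return the trail records whose activity type the plan does not cover.

    `trail` is a list of (activity_id, activity_type) pairs recorded after
    execution. The same type may appear several times, since activities
    can recur throughout the DLC; recurrence is not treated as a violation.
    """
    allowed = allowed_low_level(plan)
    return [(aid, atype) for aid, atype in trail if atype not in allowed]


def occurrences(trail):
    """Count how many times each activity type was executed."""
    return Counter(atype for _, atype in trail)


if __name__ == "__main__":
    trail = [
        ("a1", "data_curation"),
        ("a2", "data_documentation"),
        ("a3", "data_documentation"),  # recurrent activity
        ("a4", "data_deletion"),       # not foreseen by the plan
    ]
    print(check_trail(PLAN, trail))                    # [('a4', 'data_deletion')]
    print(occurrences(trail)["data_documentation"])    # 2
```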
Integrating data entities: the EPOS use case
How should we deal with entities, given that they can be created, transformed or made obsolete?
Should we consider that a DLC is associated with each data entity?
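One way to explore the per-entity question is to read each entity's own life cycle off the provenance graph: the activity that generated it, the activities that used it, the activity that invalidated it (PROV's `wasInvalidatedBy`, which can mark obsolescence) and the entities derived from it. A sketch over plain triples; the trail, entity names and the function `entity_life_cycle` are invented for illustration and do not come from EPOS:

```python
from collections import defaultdict

# Hypothetical provenance trail as (relation, subject, object) triples,
# using the argument order of the corresponding PROV relations.
TRAIL = [
    ("wasGeneratedBy", "raw_data", "acquisition"),
    ("used", "filtering", "raw_data"),
    ("wasGeneratedBy", "filtered_data", "filtering"),
    ("wasDerivedFrom", "filtered_data", "raw_data"),
    ("wasInvalidatedBy", "raw_data", "cleanup"),
]


def entity_life_cycle(trail, entity):
    """Collect the events that make up the life cycle of one entity."""
    events = defaultdict(list)
    for rel, subj, obj in trail:
        if rel == "wasGeneratedBy" and subj == entity:
            events["generated_by"].append(obj)      # creation
        elif rel == "used" and obj == entity:
            events["used_by"].append(subj)          # consumption
        elif rel == "wasInvalidatedBy" and subj == entity:
            events["invalidated_by"].append(obj)    # obsolescence
        elif rel == "wasDerivedFrom" and obj == entity:
            events["derived_entities"].append(subj) # transformation
    return dict(events)


if __name__ == "__main__":
    print(entity_life_cycle(TRAIL, "raw_data"))
    print(entity_life_cycle(TRAIL, "filtered_data"))
```

Under this reading, each entity gets its own life cycle, and derivation links connect those cycles into a larger graph.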
Building a proof-of-concept service
User interface to create graphical representations of DLCs
Extended library to create DLC plans and provenance templates
Storage of plans and templates
API to access plans and templates
API to fill in provenance templates during execution
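Filling in a provenance template during execution amounts to binding placeholder identifiers to the concrete entities, activities and agents of a run. The `var:` convention below mirrors PROV-Template, but the `instantiate` function is a simplified, hypothetical stand-in for its expansion algorithm, not that tool's API; the template statements and the `ex:dlc_plan` identifier are assumptions:

```python
import re

# Hypothetical provenance template derived from a DLC plan: identifiers in
# the "var:" namespace are placeholders bound at execution time.
TEMPLATE = [
    "activity(var:curation)",
    "used(var:curation, var:input, -)",
    "wasGeneratedBy(var:output, var:curation, -)",
    "wasAssociatedWith(var:curation, var:curator, ex:dlc_plan)",
]

VAR = re.compile(r"var:(\w+)")


def instantiate(template, bindings):
    """Replace every var:<name> with its bound identifier.

    Raises KeyError if a variable in the template has no binding, so that
    incomplete provenance is not silently produced.
    """
    return [VAR.sub(lambda m: bindings[m.group(1)], stmt) for stmt in template]


if __name__ == "__main__":
    run = {
        "curation": "ex:run42",
        "input": "ex:raw",
        "output": "ex:clean",
        "curator": "ex:alice",
    }
    for statement in instantiate(TEMPLATE, run):
        print(statement)
```

Keeping the plan identifier (`ex:dlc_plan`) fixed in the template ties every instantiated trail back to the DLC plan it was derived from.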
Conclusion
We can create a declarative description of DLCs using PROV
This description does not directly support logical transitions between the DLC steps
Logic can be added to the PROV graph using a graph-based rule language such as SWRL (Semantic Web Rule Language); this approach is currently being tested
These descriptions could be used to orchestrate the various EUDAT services into a user-defined workflow
We can derive provenance templates directly from the declarative description
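SWRL rules are evaluated by OWL reasoners over an ontology; the sketch below only imitates the idea in plain Python, with a naive forward-chaining loop over triples, so that a DLC transition such as "a validated dataset moves on to preservation" becomes an inferred fact. The predicates `hasStatus` and `nextStep` and the rule itself are hypothetical:

```python
# Facts are (subject, predicate, object) triples, as in an RDF/PROV graph.
FACTS = {
    ("ds1", "rdf:type", "Dataset"),
    ("ds1", "hasStatus", "validated"),
    ("ds2", "rdf:type", "Dataset"),
    ("ds2", "hasStatus", "raw"),
}


def rule_ready_for_preservation(kb):
    """Dataset(?d) ^ hasStatus(?d, "validated") -> nextStep(?d, "preserving")."""
    return {
        (d, "nextStep", "preserving")
        for (d, p, o) in kb
        if p == "rdf:type" and o == "Dataset" and (d, "hasStatus", "validated") in kb
    }


def forward_chain(kb, rules):
    """Apply all rules repeatedly until no new triple is inferred (naive fixpoint)."""
    kb = set(kb)
    while True:
        new = set().union(*(rule(kb) for rule in rules)) - kb
        if not new:
            return kb
        kb |= new


if __name__ == "__main__":
    inferred = forward_chain(FACTS, [rule_ready_for_preservation]) - FACTS
    print(inferred)
```

A rule engine over the PROV graph would play the same role: it adds the transition logic that the declarative PROV description alone does not express.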
Acknowledgements
Johann Ezelin, e-Science Data Factory
WP8 collaborators: Emanuel Dima, Asela Rajapakse,
Toni Cortes, Christian Pagé, Anna Queralt, Xavier Pivan
