The document examines how provenance supports reproducibility in scientific research. It begins with an overview of the data lifecycle and the challenges to reproducibility that arise as experiments and their components evolve. It then distinguishes four levels of reproducibility (rerun, repeat, replicate, reproduce) and describes approaches to analyzing differences between workflow provenance traces in order to understand how changes affect results. The remainder describes specific systems and tools, developed by the author and collaborators, that use provenance to improve reproducibility: data packaging with Research Objects, provenance recording and analysis workflows with YesWorkflow, process virtualization using TOSCA, and provenance differencing with Pdiff.