Agile Data Curation:
A Conceptual Framework and
Approach for Practitioner
Data Management
Presenting Author: Josh Young1
Co-Authors: Karl Benedict2 and Christopher Lenhardt3
1. University Corporation for Atmospheric Research (UCAR) Unidata Program Center, Boulder, USA
2. University of New Mexico, Albuquerque, USA
3. Renaissance Computing Institute (RENCI), University of North Carolina at Chapel Hill, Chapel Hill, USA
Scope
Imagine a project:
• that includes a well-thought-out and documented
data management plan,
• and robust implementation of that plan throughout
the project and beyond.
• This talk is not for that project; it is for the rest of
us.
So why do we care about data
management?
• Internal reasons: do good research, write
papers, get tenure, win more grants.
• External reasons: public access &
reproducibility
  ◦ Risk of becoming dark data (Heidorn, 2008)
Why care about external access?
• Intangibles for an Investigator
• Maybe someday I’ll benefit from someone else’s data
• Maybe I’ll learn something through informal dialogue
• Most science funding comes from public resources and should (or could) be
considered a public-trust resource
• Peer pressure
• Tangibles for an Investigator
• Increased efficiency
• My funders require it.
So why do we care about data
management?
• Internal reasons: do good research, write
papers, get tenure, win more grants.
• External reasons: greater impact
Agile Curation
Internal Workflows
Public-Access Workflows
Agile Curation:
• Means taking implementable steps to
improve data management for external
access.
• Philosophically, it attempts to apply
lessons from agile software development
to data management.
Agile Curation Principles,
2nd Generation
1) Delivery, access, use and citation of research
data are the primary measures of success.
2) Maximize the impact of research data through the
continuous integration of curation activities.
3) Support unanticipated needs for and uses of
research data (and documentation) and develop
flexible systems to capture new uses.
Agile Curation Principles,
2nd Generation
4) Make data open and accessible as early in the process as
possible.
5) Encourage crowd-sourced / community feedback to improve
and enhance the data. Provide basic metadata for data
available early in the process even if the data are not
finalized.
6) Identify key individuals in a research project who have the
requisite motivation, knowledge, or ability to learn, and then get
out of their way.
Agile Curation Principles,
2nd Generation continued
7) Data creators and data curators should work closely
throughout the data life story to ensure the most efficient and
streamlined process.
8) Identify the most effective method(s) for maintaining close
communication between the data creators and curators
involved and use them.
9) Target the steady delivery of incremental improvements to
research data discovery, access and use that is consistent
with a sustainable level of effort and available funding.
Agile Curation Principles,
2nd Generation continued
10) Start with the basics and only make systems more
complex as needed, while maintaining a low bar to
entry.
11) Continuous attention to technical excellence and
good design enhances agility.
12) Continuously develop a community of data providers,
curators and users that participate in the evolution of
the research data systems.
What happens next?
• Case study documentation:
  ◦ To clarify and/or verify these principles
  ◦ To provide workflow examples that can
be adopted or revised for reuse
• Nascent community of interest within the
Research Data Alliance
Scope
Imagine a project:
• that includes a well-thought-out data management
plan,
• and robust implementation of that plan throughout
the project.
• This talk is not for that project; it is for the rest of
us.
Unidata is one of the University Corporation for
Atmospheric Research (UCAR)'s Community
Programs (UCP), and is funded primarily by
the National Science Foundation
(Grant NSF-1344155).
Questions?
Contact me at: jwyoung@ucar.edu @unidata_josh 303-497-8646
Background
Agile Curation Principles,
1st Generation
1) Access to data is the first goal;
2) Generative value is supported (Zittrain, 2006)
3) Researcher involvement through a participatory framework that
aligns data management with scientific research processes
(Yarmey and Baker, 2013)
4) Projects will utilize free open-source resources to the greatest
extent practical;
5) Community participation increases project capacity;
Agile Curation Principles,
1st Generation part 2
6) Data management requirements and practices evolve as the
research project proceeds;
7) Bright and dedicated individuals can learn appropriate skills and
respond to the demands of their particular project, as they
proceed;
8) Approaches apply across scales
9) Consider technical debt
10) Data evaluation can be conducted through use and feedback;
How we got here
• Idea formulated during discussion of Data
Management Lifecycles at GeoData 2014
• Principles drafted for AGU 2014
• Two Research Data Alliance (RDA) Birds of a
Feather sessions to explore community
experiences
Agile Curation: 2015 AGU Presentation

Editor's Notes

  • #2: This work is a joint effort of all authors.
  • #3: This talk and effort are inspired by the desire to move projects currently at risk of becoming dark data to at least become long-tail data. However, the concepts described may be useful to projects currently in the long tail or even the big head of the spectrum.
  • #4: We need to recognize that there are at least two motivations for data management: internal reasons and external reasons. As researchers, we focus on our internal research needs, but from a societal perspective the potentially greater value comes from external access.
  • #6: Agile curation is not focused on assisting you with the workflow for your internal goals (though there may be benefits there too). Instead, the focus is on helping researchers meet external data management challenges.
  • #7: Internal workflows tend to be optimized, at least according to the preferences of the individual researcher.
  • #8: From the perspective of most researchers, public (external) access is at best a secondary purpose, so these workflows are not optimized in the same way. These photos are analogous examples. A sign may notify the public that something is freely available, but its quality claim may be questionable (the sign says "good free stuff", but it is upholstered furniture sitting in the snow); a sign may offer no quality descriptor at all; or there may be no sign notifying free access, relying instead on awareness of social conventions. Does this sound like our current public access approach?
  • #10: Principles of agile curation