Data Selection & Triage

JISC/DCC Progress Workshop
Managing Research Data & Institutional Engagement
Nottingham, 25 October 2012

This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License

Introduction

How can researchers and support staff
effectively decide what data is worth holding
on to, agree what to do with it, and arrange
for its handover?
What challenges does this present?
How can they be addressed?
Outline

• What guidelines are there and why do we need
  more? - Angus Whyte, DCC, and Marie Therese
  Gramstadt, KAPTUR
• UK Data Archive's Data Review Process - Veerle
  van Eynden, UKDA
• Applying NERC's Data Value Checklist - Sam
  Pepler, British Atmospheric Data Centre
• Discussion
Guidelines clarify expectations
…adapted by:
• Archaeology Data Service
• NERC
• KAPTUR
• University of Leicester

What criteria will be used to judge what’s handed over?
Basic model

1. Define a policy, i.e. criteria
   and range of decisions
2. Archive manager applies
   criteria, involving researchers
3. Select the significant,
   dispose of the rest

[Figure: all data, split into 10% and 90%]

For records, yes, but for
   research data?
Characterising research data…
• Research process more uncertain and open-ended
  than admin processes
• Research data purpose may change before complete
• More effort to make reusable: complex
  inter-relationships and richer contexts to document
• Originators should be engaged but may not have
  capacity, e.g. if project funding has ceased
• Others with a broader view of potential in other
  disciplines may need to be involved
• More than a keep/dispose choice: need to prioritise
  attention and effort to make data fit for reuse
Triage analogy
First characterise research data
Criteria:
• Duty of care
• Reuse value
• Quality and condition
• Accessibility
• Costs associated
Potential to automate?

Prioritise
• High reuse value + needs attention, affordable
• Other permutations
• More permutations
• Low reuse value, unaffordable

Deposit location
• Institutional Data Repository
• Data Centre
• Subject Repository etc.

Tiered approach to deploying resources
• Discoverability
• Access management
• Storage performance
• Preservation actions
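
On the question of automation, the prioritisation step in the triage analogy could be sketched roughly as below. This is only an illustration: the `Dataset` fields, the 0 to 5 scoring scale, the threshold and the tier names are all assumptions made for the example, not part of any DCC, NERC or UKDA checklist.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    reuse_value: int       # hypothetical score, 0 (none) to 5 (high)
    needs_attention: bool  # e.g. poor condition or thin documentation
    affordable: bool       # can the curation work and storage be resourced?


def triage(ds: Dataset, high_value_threshold: int = 4) -> str:
    """Return a hypothetical priority tier for one dataset."""
    high_value = ds.reuse_value >= high_value_threshold
    if high_value and ds.needs_attention and ds.affordable:
        return "priority"  # high reuse value, needs attention, affordable
    if not high_value and not ds.affordable:
        return "low"       # low reuse value, unaffordable
    return "review"        # the other permutations: left to a human decision


# Made-up example datasets, for illustration only.
datasets = [
    Dataset("survey-2011", 5, True, True),
    Dataset("scratch-runs", 1, False, False),
    Dataset("field-notes", 4, False, True),
]
tiers = {ds.name: triage(ds) for ds in datasets}
```

Only the two extreme cases are decided automatically here. Everything else falls into a "review" tier, reflecting the point that selection is more than a keep/dispose choice.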
Clarify expectations



What kinds of “data” are wanted?
For what kinds of reuse?
e.g. Data Centre Collection Policies

“The ADS expects to collect all of the following archaeological data types…”

http://archaeologydataservice.ac.uk/advice/collectionsPolicy

Costs should persuade us
IDC Digital Universe Study: increasing volumes outpace declining storage
hardware costs

Source: John Gantz and David Reinsel (2011), Extracting Value from Chaos,
http://www.emc.com/digital_universe

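
The claim that growing volumes outpace falling hardware costs can be checked with simple arithmetic. The sketch below assumes constant annual rates; the function name and the rates used are hypothetical and are not figures from the IDC study.

```python
def relative_storage_cost(years: int, volume_growth: float, price_decline: float) -> float:
    """Cost of storing that year's data, relative to year 0 (= 1.0).

    volume_growth and price_decline are annual fractions, e.g. 0.5 = 50%.
    """
    volume = (1 + volume_growth) ** years      # data volume relative to year 0
    unit_price = (1 - price_decline) ** years  # price per unit relative to year 0
    return volume * unit_price


# Hypothetical rates for illustration only: volume +50% a year, price -25% a year.
cost = relative_storage_cost(years=5, volume_growth=0.5, price_decline=0.25)
rising = cost > 1.0  # True whenever (1 + growth) * (1 - decline) > 1
```

Under these assumptions the total bill still rises each year, because the yearly factor (1 + growth) × (1 − decline) is greater than 1.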
We can’t afford it all
“Keeping 2018’s data in S3 would cost the entire global GDP”

http://blog.dshr.org/2012/05/lets-just-keep-everything-forever-in.html

Selection presumes description

• You can’t value what you don’t know about!
• Researchers can’t afford NOT to spend effort
  on minimal metadata description and
  organisation, because costs of retention will
  be much higher if they don’t
• Description makes data affordable – is citation
  potential a concrete enough reward?


Challenges

• Identify what datasets are created
  and where they are
• Differentiate those of high value
  from those with most uncertainty
  or least reusability
• Be able to justify ‘natural’ wastage
  of low priority data as much as
  deliberate selection of high value
Questions
• What has worked or is working?
• What lessons have you learned, and
  how generalisable are they?
• What challenges remain?
• How may they be approached, and
  what do you intend to do?
• What DCC / MRD activity do you
  think may help make the challenge
  more tractable?
