INTRODUCTION TO DATA CURATION
A presentation to the Illinois Association of
Astronomers and Astrophysicists (IAAA)
during the 2013 Conference in Chicago, IL.

By Katie Schmitt
WHAT IS DATA?
• "A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen" – OAIS reference model, 2012
• Types of Data:
  - Experimental
  - Samples
  - Reports
  - Maps
  - Websites
WHAT IS DATA CURATION?
Figure: DDI Combined Life Cycle Model [2004]

Source: DDI Structural Reform Group. "DDI Version 3.0 Conceptual Model." DDI Alliance. 2004. http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pd
BEST PRACTICES - PROVENANCE
• prov·e·nance [ próvvənənss ]:
  - the place of origin of something
  - the source and ownership history
• Data Provenance
  - Instrument characteristics, calibration data, and method of discovery
  - Processing algorithms
  - Changes in location or instrumentation
  - Changes in ownership of the data
• Where did the data come from and how did it get here?
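As a rough illustration of the provenance items above, the sketch below keeps a dated log of events for one dataset and replays it oldest-first, which is one way to answer "where did the data come from and how did it get here?". The class names, field names, and sample entries are all invented for this example; they are not taken from the presentation or from any provenance standard.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProvenanceEvent:
    """One step in a dataset's history (hypothetical structure)."""
    when: date
    kind: str    # e.g. "acquired", "processed", "moved", "ownership"
    detail: str


@dataclass
class ProvenanceRecord:
    """Instrument details plus a log of what happened to the data."""
    instrument: str
    calibration: str
    events: list = field(default_factory=list)

    def log(self, when, kind, detail):
        self.events.append(ProvenanceEvent(when, kind, detail))

    def history(self):
        """Return events oldest-first, regardless of logging order."""
        return sorted(self.events, key=lambda event: event.when)


# Made-up sample entries, logged out of order on purpose.
record = ProvenanceRecord(
    instrument="example telescope with CCD camera",
    calibration="flat-field frames taken 2013-03-01",
)
record.log(date(2013, 3, 2), "processed", "bias subtraction, script v1.2")
record.log(date(2013, 3, 1), "acquired", "observed at campus observatory")
record.log(date(2013, 6, 15), "ownership", "transferred to department archive")

for event in record.history():
    print(event.when.isoformat(), event.kind, event.detail)
```

Keeping the log append-only and sorting on read means a late-arriving entry (for example, a calibration note added after the fact) still shows up in the right place in the history.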
BEST PRACTICES - METADATA
• Used to enable data discovery
• Metadata standards vary per data repository
• In general, metadata must be:
  - Consistent
  - Written for humans
  - In a digital format
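The three metadata requirements above can be sketched in a few lines: a fixed set of field names (consistent), plain-language values (written for humans), and JSON serialization (a digital format). The required fields and the sample records are assumptions made for illustration only; a real repository would dictate its own schema.

```python
import json

# Hypothetical minimum field set; real repositories define their own.
REQUIRED_FIELDS = ("title", "creator", "date", "description")


def missing_fields(record):
    """Return required fields that are absent or blank in a record."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(record.get(name, "")).strip()
    ]


# Made-up sample records.
complete = {
    "title": "Example light curve",
    "creator": "A. Observer",
    "date": "2013-03-01",
    "description": "Brightness measurements of one star over a single night.",
}
incomplete = {"title": "Untitled run", "creator": "A. Observer", "date": ""}

# Sorted keys keep the serialized layout the same from record to record.
serialized = json.dumps(complete, indent=2, sort_keys=True)
print(serialized)
print(missing_fields(incomplete))
```

Running the same check on every record before deposit is a cheap way to catch blank or forgotten fields early.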
BEST PRACTICES - PRESERVATION
• The best format is:
  - Platform- and vendor-independent
  - Non-proprietary
  - Stable
  - Open
  - Well-supported
  - Unencrypted
  - Uncompressed
  - Self-describing

Source: Week 4 Slides by Ruth Duerr, LIS590DCL
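One possible way to apply the checklist above is to record a yes/no answer per criterion for a candidate format and list whatever is left unmet. The criterion keys below simply restate the slide's bullets; the profile being checked is hypothetical and does not describe any real file format.

```python
# Criterion names restate the slide's checklist as dictionary keys.
CRITERIA = (
    "platform_and_vendor_independent",
    "non_proprietary",
    "stable",
    "open",
    "well_supported",
    "unencrypted",
    "uncompressed",
    "self_describing",
)


def unmet_criteria(profile):
    """List the checklist items a format profile does not satisfy.

    A criterion missing from the profile counts as unmet.
    """
    return [name for name in CRITERIA if not profile.get(name, False)]


# Hypothetical profile: satisfies everything except "uncompressed".
candidate = {name: True for name in CRITERIA}
candidate["uncompressed"] = False

print(unmet_criteria(candidate))
print(len(unmet_criteria({})), "criteria unmet for an empty profile")
```

Treating unknown answers as unmet is a deliberately cautious choice: a format only passes when every item has been checked explicitly.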
BEST PRACTICES - ACCESS
• Constant balance between preservation and access
• Similar to preservation format
• Master vs. access
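Assuming "master vs. access" refers to the common archival practice of keeping an untouched master file and handing out a derived access copy, a minimal sketch might look like the following. The file names and sample rows are invented, and gzip is just one convenient choice of access format.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def sha256(path):
    """Hex digest of a file's bytes, used to detect any change."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def make_access_copy(master, access):
    """Write a gzip-compressed derivative; the master is only read."""
    with open(master, "rb") as src, gzip.open(access, "wb") as dst:
        dst.write(src.read())


# Made-up master file: uncompressed CSV with a header row.
workdir = Path(tempfile.mkdtemp())
master = workdir / "observations.csv"
master.write_text("object,magnitude\nexample-star,9.9\n")
digest_before = sha256(master)

access = workdir / "observations.csv.gz"
make_access_copy(master, access)

# The master must be byte-identical after the access copy is made,
# and the access copy must decompress to the same content.
master_unchanged = sha256(master) == digest_before
with gzip.open(access, "rb") as handle:
    round_trip_ok = handle.read() == master.read_bytes()

print(master_unchanged, round_trip_ok)
```

The master stays in the uncompressed preservation form, while the smaller derivative is what users download; if the access copy is ever lost or damaged it can be regenerated from the master.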
TYPES OF REPOSITORIES
• Domain
  - Established
  - Often connected to a university
  - Usually provide high levels of service
  - Specialized by discipline
• Institutional
  - Excel in basic service
  - New to the data management realm
A FEW RESOURCES…
• Choudhury, G. S., Palmer, C. L., Baker, K. S., & DiLauro, T. (2013, January). Levels of services and curation for high-functioning data. Presented at the International Digital Curation Conference, Amsterdam, Netherlands.
• Miles, S., Deelman, E., Groth, P., Vahi, K., Mehta, G., & Moreau, L. (2007). Connecting Scientific Data to Scientific Experiments with Provenance. e-Science and Grid Computing, IEEE International Conference, 179-186. http://dx.doi.org/10.1109/ESCIENCE.2007.22
• Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of dataset in the scientific and technical literature. Proceedings of the American Society for Information Science and Technology, 47(1), 1–4.
QUESTIONS?

Katie Schmitt
@kmschmitt
kmschmi2@illinois.edu
http://www.katieschmitt.me

Foundations of Data Curation Final Project
