Documentation and Metadata
Sherry Lake
[Figure: Data Life Cycle diagram. Labels shown: Proposal Planning Writing, Project Start Up, Data Collection, Data Analysis, Data Sharing, End of Project, Data Archive, Deposit, Data Discovery, Re-Use, Re-Purpose]
Andrea Denton
We’ll Explore
• Why is documenting your research
important?
• What do you document (files? datasets?
projects?) Hands-on
• What are the common types of
documentation?
• Metadata: What is it? Why is it important?
Hands-on
• Q & A
You’re already documenting your data
• Notebook
– Paper
– Digital
– Lab
• Folders with notes, text files
• Sources, experiments or surveys,
procedures, etc.
Critical roles of data documentation
• Data Use
– To know enough details about how the data
were collected and stored
• Data Discovery
– To be able to identify important data sets
• Data Retrieval
– To know how and where to access data
• Data Archiving
– Data can grow more valuable with time, but only if the
critical information required to retrieve and interpret
the data remains available
Information Entropy
[Figure: graph of Information Content of Data and Metadata (vertical axis) against Time (horizontal axis), starting at the time of data development. Annotations on the graph:]
• Specific details about problems with individual items or specific dates are lost relatively rapidly
• General details about datasets are lost through time
• Accident or technology change may make data unusable
• Retirement or career change makes access to “mental storage” difficult or unlikely
• Loss of investigator leads to loss of remaining information
From Michener et al. 1997
http://dx.doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2
Elements of Documentation
Good data documentation answers these
basic questions:
• Why were the data created?
• What is the data about?
• What is the content of the data? The
structure?
• Who created the data?
• Who maintains it?
Elements of Documentation, continued
• How were the data created?
• How were the data produced/analyzed?
• Where was it collected (geographic
location)?
• When were the data collected? When
were they published?
• How should the data be cited?
Documentation throughout your research
• Variable or Item Level
– Labels, codes, classifications
– Missing values (and how they are represented)
• File or Dataset Level
– Inventory of data files
– Relationship between those files
– Records, cases, etc.
• Project or Study Level
– What the study set out to do; research questions
– How it contributes new knowledge to the field
– Methodologies used, instruments and measures
UK Data Service: http://ukdataservice.ac.uk/media/440277/documentingdata.pdf/
Exercise 1: Exploring Documentation
• Refer to the files on the Data Management
Bootcamp site, either
– http://guides.lib.odu.edu/VADMBC/materials
• In the section Documentation and Metadata
Exercise_1_Data_Documentation Worksheet
– Or, you may have a handout “Exercise 1”
Exercise 1: Exploring Documentation
• For Column 1, take 2-3 minutes and, for each
row, write down what general concept (who,
what, when, where, how, or why, or a combination
of these) that field describes about data, if
applicable.
• Now take 2-3 minutes to complete Column 2.
Considering your research data, what
information would you provide for each field?
• Don’t have research data? Use the file
DailyWeather to fill in Column 2.
Exercise 1 continued
• Take 2 minutes
• There is a blank row under each category for any
information specific to your field, e.g. latitude and
longitude, species, etc.
• Please share an example with the class in the
Google doc “Questions: Ask them here”
Wrapping up: elements of documentation
• We’ve looked at commonly used fields
• What does your discipline say about
what you should document?
• The answers you’ve provided could be
used to create a data dictionary
– we’ll examine next
Types of Documentation
• ReadMe File
• Data Dictionary
• Codebook
ReadMe
• Describes the core documentation about
an investigation and its data files
• Typically a simple text file
• Can describe the individual file(s) and/or
data package as a whole
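A ReadMe of this kind can also be generated from a few fields so that every data package starts from the same layout. The following is a minimal sketch, not a required format: the field names, the example values, and the file names DailyWeather.csv and DataDictionary.csv are assumptions made for illustration.

```python
from datetime import date


def build_readme(title, creator, created, description, files):
    """Return plain-text ReadMe content for a small data package.

    The field layout is an illustrative assumption, not a standard.
    """
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Date created: {created.isoformat()}",
        "",
        "Description:",
        f"  {description}",
        "",
        "Files:",
    ]
    # One line per file: name, then a short statement of what it holds.
    for name, summary in files:
        lines.append(f"  {name} - {summary}")
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
readme_text = build_readme(
    title="Daily weather observations (example)",
    creator="A. Researcher",
    created=date(2015, 1, 15),
    description="Subset of daily weather records used in a workshop exercise.",
    files=[
        ("DailyWeather.csv", "one row per station per day"),
        ("DataDictionary.csv", "definition and units for each column"),
    ],
)
print(readme_text)
```

Writing `readme_text` to a file named README.txt next to the data keeps the description and the data together when the folder is copied or deposited.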
ReadMe Example - File
ReadMe Example - Dataset
Data Dictionary
• Provides definitions of the data fields in a
data file
• Gives more detail on the variables and
observations in a file
Data Dictionary
• Used to understand the data and the
databases that contain it
• Identifies data elements and their
attributes including names, definitions and
units of measure and other information
• Often they are organized as a table
http://www.pnamp.org/sites/default/files/best_practices_for_data_dictionary_definitions_and_usage_version_1.1_2006-11-14.pdf
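A data dictionary table can be started from the data file itself and then completed by hand. The sketch below assumes a small CSV with hypothetical rows; the column names STATION, DATE, PRCP, TMAX and TMIN and the station identifier are illustrative, and the definitions and units are typed in by the researcher because they cannot be derived from the values.

```python
import csv
import io

# Hypothetical sample rows, for illustration only.
SAMPLE = """STATION,DATE,PRCP,TMAX,TMIN
GHCND:EXAMPLE1,20140101,147,94,-93
GHCND:EXAMPLE1,20140102,84,6,-50
"""

# Definitions and units are supplied by a person, not inferred.
DEFINITIONS = {
    "STATION": ("Station identifier", ""),
    "DATE": ("Observation date (YYYYMMDD)", ""),
    "PRCP": ("Precipitation", "tenths of mm"),
    "TMAX": ("Maximum temperature", "tenths of degrees C"),
    "TMIN": ("Minimum temperature", "tenths of degrees C"),
}


def guess_type(values):
    """Crude guess: 'integer' if every non-empty value parses as int."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    try:
        for v in non_empty:
            int(v)
    except ValueError:
        return "text"
    return "integer"


def build_dictionary(csv_text, definitions):
    """Return one dictionary entry (a dict) per column of the CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        definition, units = definitions.get(name, ("", ""))
        entries.append({
            "variable": name,
            "type": guess_type(values),
            "definition": definition,
            "units": units,
            "example": values[0] if values else "",
        })
    return entries


dictionary = build_dictionary(SAMPLE, DEFINITIONS)
for entry in dictionary:
    print(" | ".join(entry[k] for k in
                     ("variable", "type", "definition", "units", "example")))
```

The type guess is deliberately simple: it labels DATE as an integer, which is exactly the kind of entry a person should review and correct before sharing the dictionary.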
Data Dictionary Example: the dataset
http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=HowToSubmit.pdf
Data Dictionary Example: the dictionary
Exercise 2: Data Dictionary
• Refer to the files on the Data Management
Bootcamp site, either
– http://guides.lib.odu.edu/VADMBC/materials
• In the section Documentation and Metadata
Exercise_2_DataDictionaryTemplate
– Or, you may have a handout “Exercise 2”
• Open the file DailyWeather
Weather data source:
http://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND
Exercise 2: Data Dictionary Creation
• Use the Daily Weather dataset
– Two worksheets (tabs)
• Data
• Definitions
• Start by answering the questions
• Fill out a data dictionary for this dataset
Exercise 2 Discussion
What is a Codebook?
• Typical in social sciences research
• Includes elements similar to readme and
dictionary
– Project level information (e.g. survey design
and methodology)
– Response codes for each variable
– Codes used to indicate nonresponse and
missing data
http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
What is a Codebook?
• Codebooks may also contain:
– A copy of the survey questionnaire (if applicable)
– Exact questions and skip patterns used in a
survey
– Frequencies of response
• Quite long!
http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
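The codebook elements listed on these two slides (response codes, missing-data codes, frequencies of response) fit naturally into a small lookup structure. This is a minimal sketch under stated assumptions: the variable name Q1_SATISFIED, its codes, and the responses are invented for illustration and do not come from any real survey.

```python
from collections import Counter

# Hypothetical codebook entry for one survey variable.
CODEBOOK = {
    "Q1_SATISFIED": {
        "label": "Overall satisfaction with the service",
        "values": {1: "Yes", 2: "No"},                     # substantive codes
        "missing": {8: "Don't know", 9: "No response"},    # missing-data codes
    }
}


def frequencies(variable, responses, codebook):
    """Return (code, label, count) tuples, flagging undocumented codes."""
    entry = codebook[variable]
    labels = {**entry["values"], **entry["missing"]}
    counts = Counter(responses)
    return [(code, labels.get(code, "UNDOCUMENTED CODE"), counts[code])
            for code in sorted(counts)]


def valid_n(variable, responses, codebook):
    """Count responses that carry a substantive (non-missing) code."""
    valid_codes = codebook[variable]["values"]
    return sum(1 for r in responses if r in valid_codes)


# Hypothetical responses, for illustration only.
responses = [1, 2, 1, 9, 1, 8, 2, 1]
table = frequencies("Q1_SATISFIED", responses, CODEBOOK)
for code, label, count in table:
    print(code, label, count)
print("valid n:", valid_n("Q1_SATISFIED", responses, CODEBOOK))
```

Keeping missing-data codes separate from substantive codes is the design point: a code such as 9 can then be excluded from analysis instead of being treated as a real answer.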
Codebook Example
http://www.icpsr.umich.edu/icpsrweb/ICPSR/help/cb9721.jsp
Codebook Example
http://dataarchives.ss.ucla.edu/archive%20tutorial/aboutcodebooks.html
Other Examples of Data Documentation
• Lab notebooks
• Software syntax
• Programming code
• Instrument settings and/or calibration
• Provenance of sources of data
• Embedded metadata (e.g. EXIF, FITS)
Metadata
• What is it?
– Information that describes a resource
– NISO: “metadata is structured information that
describes, explains, locates, or otherwise makes it
easier to retrieve, use, or manage an information
resource”
• Why is it important?
– Enables a resource or data to be easily
discovered
– Good metadata will help others understand and
use your data
Metadata in Everyday Life
DataONE Education Module: Metadata. DataONE. Retrieved Nov 12, 2012. From
http://www.dataone.org/sites/all/documents/L07_Metadata.pptx
Author(s): Boullosa, Carmen.
Title(s): They're cows, we're pigs / by Carmen Boullosa
Place: New York : Grove Press, 1997.
Physical Descr: viii, 180 p ; 22 cm.
Subject(s): Pirates Caribbean Area Fiction.
Format: Fiction
Metadata Formats
• Documentation for understanding & re-use
– Readme File
– Data Dictionary
– Codebook
• Structured documentation in XML format for
use in programs (a few examples)
– DDI
– FGDC
– EML
Exercise 3: XML File Creation
• Refer to the files on the Data Management
Bootcamp site, either
– http://guides.lib.odu.edu/VADMBC/materials
• In the section Documentation and Metadata
Exercise_3_Weather-DDI-XML-FillinBlanks
– Or, you may have a handout “Exercise 3”
Exercise 3: XML File Creation
• Take the file Weather-DDI-XML and fill in
the blanks (as best you can) using:
• the file DailyWeather
• and/or Exercise 2 Data Dictionary
Exercise 3 Discussion
Structured XML
A Few Standard Schemes (XML)
– DDI: Data Documentation Initiative
http://www.ddialliance.org/
– FGDC: Geospatial Metadata Standard
http://www.fgdc.gov/metadata/geospatial-metadata-standards
– EML: Ecological Metadata Language
http://knb.ecoinformatics.org/software/eml/
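To show what "structured" means in practice, the sketch below writes a small XML record with Python's standard library and reads it back. The element names are loosely modeled on DDI Codebook, but this is an assumption-laden illustration: it uses no namespace, has not been validated against any schema, and may not match the Weather-DDI-XML exercise file. The title and variable labels are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_metadata_xml(title, variables):
    """Build a small study-plus-variables record as an XML element tree.

    Element names are illustrative and not schema-validated.
    """
    root = ET.Element("codeBook")

    # Study-level description: here only the title.
    study = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(study, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title

    # Variable-level description: one element per variable with a label.
    data = ET.SubElement(root, "dataDscr")
    for name, label in variables:
        var = ET.SubElement(data, "var", attrib={"name": name})
        ET.SubElement(var, "labl").text = label
    return root


# Hypothetical values, for illustration only.
root = build_metadata_xml(
    "Daily weather observations (example)",
    [
        ("PRCP", "Precipitation (tenths of mm)"),
        ("TMAX", "Maximum temperature (tenths of degrees C)"),
    ],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Because the record is structured, a program can read it back reliably.
parsed = ET.fromstring(xml_text)
variable_names = [var.get("name") for var in parsed.iter("var")]
print(variable_names)
```

The read-back step is the reason to prefer a structured format when programs need the metadata: the same fields can be found by name every time, which free-text documentation does not guarantee.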
FGDC Example
Structured Metadata Tools
Tools
– Colectica add-on for Excel (DDI)
– Nesstar (DDI)
– Metavist (FGDC)
– ArcGIS (FGDC) *
– Morpho (EML)
http://data.library.virginia.edu/data-management/plan/metadata/metadata-workshop/
Example 1: Nesstar DDI Tool
Example 2: Metavist FGDC Tool
Metadata Standards
Metadata Concept Map by Amanda Tarbet is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Metadata Wrap-up
How do you choose a metadata standard or
documentation format?
• What does your discipline use?
• Look at what the repository where you will deposit requires
Research Life Cycle
[Figure: Data Life Cycle diagram, with the same labels as the diagram on the opening slide]
QUESTIONS?

Documentation and Metadata - VA DM Bootcamp

Editor's Notes

  • #4: In fact, you probably already have metadata in some form. You just may not recognize it as such. For instance, among your work records, you certainly have notebooks stuffed with color-coded pages or assorted keys to your data stored on your computer. Perhaps the most common form of metadata that you may already have is a file folder filled with notes on your data sources and the procedures that you used to build your data. However, unless you’ve been unusually diligent, your information is probably not organized so that a stranger could stroll into your office at any time, and read and understand it easily.
  • #5: From: EML Best Practices for LTER Sites – Oct. 2004
    – Identification (locate): Minimum content for adequate data set discovery in a general cataloging system or repository: title, creator, contact, publisher, pubDate, keywords, abstract (recommended), dataset/distribution (i.e. URL for general dataset information)
    – Discovery: Level 1 content, plus coverage information to support targeted searches, adding elements: Geographic Coverage, Taxonomic Coverage, Temporal Coverage
    – Evaluation: Level 2 content, plus data set details to enable end-user evaluation of the methodology and data entities, adding elements: Intellectual Rights, project, methods, dataTable/entityGroup, dataTable/attributes
    – Access: Level 3 content, plus data access details to support automated data retrieval, adding elements: access, physical
    – Integration: Level 4 content, plus complete attribute and quality control details to support computer-assisted data integration and re-sampling, adding elements: Attribute List (full descriptions), Constraint, Quality Control
  • #6: This graph illustrates the phenomenon of “information entropy” associated with research. At the time of the research project, a scientist's memory is fresh. Details about the development of the dataset are easily recalled, and it is a good time to document information about the process. Over time, memory of the details begins to fade. A variety of circumstances can intervene, and eventually detailed knowledge about the dataset is lost. Without a metadata record, the data might be unusable. A dataset is not considered complete without a metadata record to accompany it. Michener, W., et al. (1997). Nongeospatial Metadata for the Ecological Sciences. Ecological Applications, 7(1), 330–342.
  • #7: Good metadata answers a wide range of questions, including:
  • #8: Good metadata answers a wide range of questions, including:
  • #9: UK Data Service http://www.data-archive.ac.uk/create-manage/document MANTRA Project level: A complete academic thesis normally contains this information in detail, but a published article may not. If a dataset is shared, a detailed technical report will need to be included for the user to understand how the data were collected and processed. You should also provide a sample bibliographic citation to indicate how you would like secondary users of your data to cite it in any publications, etc.
  • #10: Explain instructions
  • #12: Fix this!!
  • #14: Part Two Three methods or ways to represent or describe your data. Text file. Next slide Hmm, can we really explain what “unstructured” is until we explain what “structured” is? Maybe not focus on that as much until later?
  • #20: A simple data dictionary is an organized collection of data element names and definitions, arranged in a table.
  • #21: Here is a dataset – what does it mean? What does each column represent?
  • #22: This dictionary helps interpret the data (spreadsheet) by providing the link between the variable names and what they represent (in the description). It also tells you about what type of data we should expect, and what the specific values might be.
  • #23: Explain instructions
  • #24: Bottom of worksheet is typical format for a data dictionary. Use the dataset to fill out the dictionary. Some of the answers to the questions may help them with the dictionary or vice versa
  • #25: This is a partial dataset from a more complete Daily Weather database held at the National Climatic Data Center. If you want to know more about this partial dataset, a file with complete documentation about the data collection and all the variables, “DailyWeather_Complete Documentation”, is on the libguide materials page. You need units to understand the data and to compare or use it with other data files. PRCP: tenths of mm (.1 mm), so 147 and 84 are 14.7 mm (.6 in) and 8.4 mm (.33 in). TMAX: tenths of degrees Celsius, so 94 = 9.4C (49F); .6C is 33F. TMIN: tenths of degrees Celsius, so -93 = -9.3C (15F). Without the codes defined, their meaning is impossible to deduce. Column headings should be unique (Measurement Flag & Source Flag). As you look at this dataset, and other sources of data, think about what information you need in order to duplicate or use the data, or to understand the meaning of the data (observations). Keep your answers for this exercise handy, as you will use them for exercise #3.
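The unit arithmetic in this note can be written out as a short sketch. The function names are made up for illustration; the only assumptions are the ones stated in the note (PRCP in tenths of a millimetre, TMAX and TMIN in tenths of a degree Celsius) plus the standard conversions 25.4 mm per inch and F = C × 9/5 + 32.

```python
def from_tenths(raw):
    """Convert a value stored in tenths of a unit to the unit itself."""
    return raw / 10


def mm_to_inches(mm):
    """Convert millimetres to inches (25.4 mm per inch)."""
    return mm / 25.4


def celsius_to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Raw values quoted in the note above.
for raw_prcp in (147, 84):
    mm = from_tenths(raw_prcp)
    print(f"PRCP {raw_prcp} -> {mm} mm -> {mm_to_inches(mm):.2f} in")

for label, raw_temp in (("TMAX", 94), ("TMIN", -93)):
    celsius = from_tenths(raw_temp)
    print(f"{label} {raw_temp} -> {celsius} C -> {celsius_to_fahrenheit(celsius):.0f} F")
```

Without the units recorded somewhere, none of these conversions could be chosen correctly, which is the point of the exercise.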
  • #26: Data dictionary is part of the codebook, along with… Typically Social Science Includes narrative about project level information (might be your readme!) Can be very long
  • #29: Below is a diagram of the details included in most codebooks. This is a simple example of a codebook. In a complex survey, there will be more details about the flow of questions asked and the electronic organization of the data. In addition to question text, the most important items are variable names, values, value labels, and column locations. Try to find them in the example below and familiarize yourself with the layout of a codebook. Here is an explanation of the codebook image above. The circled red numbers relate to each of the subject headings below. 1) Item or Variable Name: This is usually a mnemonic, or nickname, assigned to an individual question. 2) Variable Label: A short summary or description of question content. 3) Card and Column Locations: Indicates the electronic location of numerically coded responses to questions. 4) Question Text: Exact text of a question as delivered to a respondent. 5) Values and Value Labels: Describes the numeric and textual response options or categories to questions. 6) Valid Responses: Indicates the allowed numeric codes to question responses. 7) Branching: Indicates the flow of the questionnaire.
  • #30: The Excel spreadsheet, your datasets might not be their data – e.g. other types that the data dictionary might not easily describe Your code, your SPSS
  • #31: With regard to research: What is metadata? Information about research or a resource can also be in an unstructured format. By a “structured” format, I mean a machine-readable format that search engines and other programs can read and interpret. It enables a resource or data to be easily discovered, and it helps others understand and use your data. It doesn’t necessarily replace the types of documentation we have been talking about; those are more for “humans”, for understandability (and are considered “metadata”, just with lots of words). Structured metadata has a different purpose.
  • #32: Metadata is all around us, from MP3 players, to nutrition labels, to library card catalogues. For example, a card catalogue tells us more than just the title of the book; it also tells the user: Who is the author? Who published the book? What subject area does the book fall in? And finally, where is it located in the library? Another example of metadata that we see in our daily lives is the nutrition and ingredient information on food labels. Nutrition labels answer questions such as: What ingredients were used? Who made the food? How many calories per serving? How many servings in the can? What percentage of daily vitamins are in each serving? And in case you didn’t know, most of our productivity software (Word, PDF files, iPhoto, etc.) creates (and allows you to add) metadata.
  • #33: The same information that goes into the documentation goes into structured metadata. The most widely used format is XML, an HTML-like (ASCII) file. These are 3 different standards, of many (I’ll talk a little bit about the differences in later slides), in 3 different disciplines, each focused on the type of data generated: DDI for Social Science, more geared toward interviews, surveys, etc.; FGDC for GIS (geospatial, map related); and EML, ecological metadata for life sciences. In addition to being used for searching, this structured format allows programs to convert one version of the XML to another; this is called a crosswalk, and it supports interdisciplinary work. In short, structured XML is used for searching and for crosswalks between metadata standards.
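The crosswalk idea can be sketched as a simple field mapping. This is a deliberately simplified illustration, not how any particular conversion tool works: real DDI and FGDC records are nested XML with many more elements, and the flat dictionaries, the three fields chosen and the sample record are assumptions made for the example.

```python
# Simplified crosswalk from DDI-style field names to FGDC-style field names.
# Flat dictionaries stand in for the nested XML of real records; the three
# mapped fields and the sample record are assumptions for illustration.
DDI_TO_FGDC = {
    "titl": "title",        # study title
    "AuthEnty": "origin",   # author / originator
    "abstract": "abstract",
}


def crosswalk(record, mapping):
    """Rename fields using the mapping; report fields with no counterpart."""
    converted, unmapped = {}, []
    for field, value in record.items():
        if field in mapping:
            converted[mapping[field]] = value
        else:
            unmapped.append(field)
    return converted, unmapped


# Hypothetical DDI-style record.
ddi_record = {
    "titl": "Daily Weather (example)",
    "AuthEnty": "Example Author",
    "abstract": "Example abstract text.",
    "dataKind": "survey data",
}
fgdc_record, leftovers = crosswalk(ddi_record, DDI_TO_FGDC)
print(fgdc_record)
print("No FGDC counterpart in this mapping:", leftovers)
```

Fields with no counterpart are reported rather than dropped silently, since a crosswalk between standards from different disciplines rarely covers every element.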
  • #34: We are now going to create (actually fill in the blanks of) an XML file using the DDI standard. Use the file online, or the handout.
  • #35: You will need to use the DailyWeather file and/or the Data Dictionary you created in exercise #2 to fill in the blanks. You have 5 min.
  • #36: This XML file, like all XML files, is a text file. Indentation does not matter; I just indented to show the “levels”. Each metadata standard has fields, some required, some not. How easy was this to “fill in the blanks”? Would you be able to create this from scratch? How would you know what to include (mandatory) for the particular XML schema? I’ll add a file with the “answers” to this section on the libguide later this afternoon.
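Both points in this note, that indentation carries no meaning and that a standard has required fields, can be checked with Python's standard `xml.etree.ElementTree` module. The fragment below uses DDI-style element names but is not a complete or valid DDI record, and the list of "required" paths is an assumption for illustration rather than the actual DDI requirements.

```python
import xml.etree.ElementTree as ET

# The same DDI-style fragment written twice: indented and on a single line.
# The title text is a placeholder, not the workshop answer.
INDENTED = """
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Daily Weather (example title)</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>
"""
FLAT = ("<codeBook><stdyDscr><citation><titlStmt>"
        "<titl>Daily Weather (example title)</titl>"
        "</titlStmt></citation></stdyDscr></codeBook>")


def flatten(elem, path=""):
    """List (path, text) pairs for every element, ignoring surrounding whitespace."""
    here = f"{path}/{elem.tag}"
    rows = [(here, (elem.text or "").strip())]
    for child in elem:
        rows.extend(flatten(child, here))
    return rows


def missing_fields(root, required_paths):
    """Return the required paths (relative to the root) that are absent."""
    return [p for p in required_paths if root.find(p) is None]


# Assumed, illustrative list of required fields.
REQUIRED = ["stdyDscr/citation/titlStmt/titl", "stdyDscr/stdyInfo/abstract"]

indented_root = ET.fromstring(INDENTED.strip())
flat_root = ET.fromstring(FLAT)
print(flatten(indented_root) == flatten(flat_root))
print(missing_fields(indented_root, REQUIRED))
```

In practice, the question of which fields are mandatory is answered by validating against the standard's published schema rather than a hand-written list like `REQUIRED`.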
  • #37: This section has more of the description details, subject (keyword), abstract, time coverage, geographical coverage. Not all fields are used for all data.
  • #38: How easy was this to “fill in the blanks”? Would you be able to create this from scratch? How would you know what to include (mandatory) for the particular XML schema? These last two sections are “file” related: one is about the software that generated the file, and the other is about the particular variables in each file (the information that you would find in a data dictionary). I’ll add a file with the “answers” to this section on the libguide later this afternoon.
  • #39: Web page has examples of these XML files. Uses standards: An established standard provides common terms, definitions and structure that allow for consistent communication. The use of standards also supports search and retrieval in automated systems. A standard provides a structure to describe data with: common terms to allow consistency between records; common definitions for easier interpretation; common language for ease of communication; common structure to quickly locate information. In search and retrieval, standards provide: documentation structure in a reliable and predictable format for computer interpretation; a uniform summary description of the dataset. Many standards collect similar information. Factors to consider: your data type (GIS – raster/vector, ecological); your organization’s policies; available resources (tools).
  • #40: Already have DDI example (full XML file), put a few FGDC snippets here so you can compare the different “tags” or elements, or fields from DDI to FGDC.
  • #41: I asked this question before: Would you be able to create this from scratch? Well, the answer is that maybe you don’t have to. This is a short list of “free” metadata creation tools. Unfortunately, the top three do not work on Macs. ArcCatalog, part of the ArcGIS software, is also a good tool to use. And other software you may already be using (SPSS, ArcGIS, SAS) already has some sort of documentation/metadata capture (export) feature.
  • #42: Screenshot of the tool Nesstar, used to create DDI metadata. It is a fill-in-the-blank form with fields for citation (title, author) and for description (keywords, abstract). This tool, like all the others, has an export function that creates the XML file for you.
  • #43: Here is an example of the tool Metavist for creating FGDC metadata. Here each “section” is a tab across the top. With fill-in the blanks.
  • #44: As I said, each discipline has different and maybe various metadata standards. I would say the sciences have the “most”, but having many standards isn’t good.
  • #45: Knowing the requirements for documentation at the start will enable you to design your data collection materials for easier metadata creation and facilitate your documentation creation. ICPSR: Deposits should include all data and documentation necessary to independently read and interpret the data collection. To deposit, you need, of course, the data file(s), documentation for those files and a study description. Open ICPSR walks you through with fill-in-the-blank fields. Dryad: We strongly encourage submitters to include one or more ReadMe files that provide additional information to help users make sense of the files (e.g., instructions for use with software scripts, variable abbreviations, measurement units, and data codes). View additional guidance on ReadMe files. A ReadMe file is intended to help ensure that your data can be correctly interpreted and reanalyzed by others.
  • #46: It is important to begin to document your data at the very beginning of your research project and continue throughout the project. Doing so will make data documentation easier and reduce the likelihood that you will forget aspects of your data later in the research project. Don’t wait until the end to start to document your research project and its data! In order for the data to be used properly once it has been archived, the data must be documented. Data documentation (otherwise known as metadata) enables you to understand the data in detail, and enables others to find it, use it and properly cite it. It’s all about re-use, for you or someone else: When you provide data to someone else, what types of information would you want to include with the data? When you receive a dataset from an external source, what types of details do you want to know about the data? Reproducibility! (Dryad) Submitters should aim to provide sufficient data and descriptive information such that another researcher would be able to evaluate the findings described in the publication. This will generally include any data that are used in statistical tests, as well as the individual data points behind published figures and tables.