Best Practices
Creating Research Data




                         Sherry Lake
                         July 31, 2012 University of Florida Data Management Workshop
WHY?

Following these Best Practices…
• Will improve the usability of the data by you
  or by others
• Your data will be “computer ready”
• Your data will be ready to share with others
Spreadsheet Examples
Spreadsheet Problems?
Problems

• Dates are not
  stored
  consistently
• Values are labeled inconsistently
• Data coding is inconsistent
• Order of values is different
Problems

• Confusion
  between
  numbers and
  text
• Different types of data are stored in the
  same columns
• The spreadsheet loses interpretability if it
  is sorted
Best Practices Data Organization
• Lines or rows of data should be complete
   – Designed to be machine readable, not human
     readable (sort)
Best Practices Data Organization


• Include a Header Line as the 1st line (or record)
• Label each Column with a short but
  descriptive name
  – Names should be unique
  – Use letters, numbers, or “_” (underscore)
  – Do not include blank spaces or symbols (+ - & ^ *)
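
The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names clean_column_name and clean_header are hypothetical, and replacing disallowed characters with an underscore is only one possible convention.

```python
import re


def clean_column_name(name: str) -> str:
    """Return a header that uses only letters, digits, and underscores."""
    # Replace each run of disallowed characters (spaces, symbols) with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip())
    # Drop underscores left over at either end.
    return cleaned.strip("_")


def clean_header(names):
    """Clean every name and fail loudly if two columns end up with the same name."""
    cleaned = [clean_column_name(n) for n in names]
    duplicates = {n for n in cleaned if cleaned.count(n) > 1}
    if duplicates:
        raise ValueError(f"column names are not unique: {sorted(duplicates)}")
    return cleaned


print(clean_header(["Conductivity Top", "Salinity (bottom)", "Tide-State"]))
# prints ['Conductivity_Top', 'Salinity_bottom', 'Tide_State']
```

Raising an error on duplicate names, rather than silently renaming, keeps the decision about the final column name with the person who knows the data.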
Best Practices Data Organization


• Columns of data should be consistent
  – Use the same naming convention for text data
• Columns should include only a single kind of
  data
  – Text or “string” data
  – Integer numbers
  – Floating point or real numbers
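
One way to find columns that mix kinds of data is to classify every cell and report any column with more than one kind. This is a sketch under assumptions: the helper names kind_of and column_kinds are hypothetical, empty cells are simply skipped, and the two-row sample is made up to mirror the "349 and <30" situation.

```python
import csv
import io


def kind_of(value: str) -> str:
    """Classify one cell as 'integer', 'float', or 'text'."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def column_kinds(csv_text: str) -> dict:
    """Map each column name to the set of kinds found in that column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    kinds = {name: set() for name in header}
    for row in data:
        for name, value in zip(header, row):
            if value != "":  # empty cells are skipped in this sketch
                kinds[name].add(kind_of(value))
    return kinds


# Made-up sample: "<30" is text, so the salinity column mixes two kinds.
sample = "site,salinity\nA,349\nB,<30\n"
for name, kinds in column_kinds(sample).items():
    if len(kinds) > 1:
        print(f"{name}: mixed kinds {sorted(kinds)}")
# prints salinity: mixed kinds ['integer', 'text']
```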
Use Standardized Formats

• ISO 8601 Standard for Date and Time
  – YYYY-MM-DDThh:mm:ss.sTZD
        2009-10-13T09:12:34.9Z
        2009-10-13T09:12:34.9+05:00
• Spatial Coordinates for Latitude/Longitude
  – +/- DD.DDDDD
        -78.476 (longitude)
        +38.029 (latitude)
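
Both standards are easy to produce from a script. The sketch below uses Python's standard datetime module to write timestamps in the ISO 8601 extended format (hyphens in the date, colons in the time), and a hypothetical helper, dms_to_decimal, to convert degrees/minutes/seconds to signed decimal degrees. The DMS readings are illustrative values chosen to match the coordinates on the slide, not measurements from the source.

```python
from datetime import datetime, timedelta, timezone

# 13 October 2009, 09:12:34.9, first in UTC and then in a zone 5 hours ahead of UTC.
utc_time = datetime(2009, 10, 13, 9, 12, 34, 900000, tzinfo=timezone.utc)
local_time = datetime(2009, 10, 13, 9, 12, 34, 900000,
                      tzinfo=timezone(timedelta(hours=5)))

# isoformat() writes the offset as +00:00 rather than Z; both are valid ISO 8601.
print(utc_time.isoformat())
print(local_time.isoformat())


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and west are negative, matching the +/- DD.DDDDD convention.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Illustrative readings: 78°28'33.6" W and 38°01'44.4" N.
print(round(dms_to_decimal(78, 28, 33.6, "W"), 5))
print(round(dms_to_decimal(38, 1, 44.4, "N"), 5))
```

Storing every coordinate in one unit (decimal degrees here) avoids a column that holds two different notations.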
File Names
File Names
• Use descriptive names
• Not too long
• Don’t use spaces
• Try to include time,
  place & theme
• May use “-” or “_”
File Names

• String words together with
  Caps (VegBiodiv_2007)
• Think about using version
  numbers
• Don’t change default
  extensions (txt, jpg, csv,…)
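
The file-name guidelines can be applied by a small helper that assembles a name from theme, place, time, and version. This is only a sketch: build_file_name, the 64-character limit, the "v01" version style, and the place name "Shenandoah" are all assumptions made for illustration.

```python
import re

MAX_LENGTH = 64  # hypothetical limit; pick one that suits your software


def build_file_name(theme, place, year, version, extension):
    """Join theme, place, year, and version with underscores and validate the result."""
    stem = "_".join([theme, place, str(year), f"v{version:02d}"])
    # Allow only letters, digits, "-" and "_": no spaces or other symbols.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", stem):
        raise ValueError(f"file name contains spaces or symbols: {stem!r}")
    name = f"{stem}.{extension}"
    if len(name) > MAX_LENGTH:
        raise ValueError(f"file name is longer than {MAX_LENGTH} characters")
    return name


print(build_file_name("VegBiodiv", "Shenandoah", 2007, 1, "csv"))
# prints VegBiodiv_Shenandoah_2007_v01.csv
```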
Quality Assurance/Control
Dataset Creation & Integrity Errors
   • Use a data entry program
      – Program to catch typing errors
      – Program pull-down menu options
   • Perform double entry of the data
   • Manually check 5 – 10% of data records
   • Check for out-of-range values (plotting)
   • Check for missing or impossible values
   • Perform statistical summaries (random samples)
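
Several of these checks can be scripted. The sketch below flags missing, non-numeric, and out-of-range values, summarizes the valid ones, and draws a random subset of rows for manual checking. The name check_values, the missing-value code "-9999", the 0 to 45 range, and the salinity readings are all made up for illustration.

```python
import random
import statistics

MISSING = "-9999"  # hypothetical missing-value code; use the one in your documentation


def check_values(values, low, high):
    """Sort raw cell strings into missing, non-numeric, out-of-range, and valid."""
    report = {"missing": [], "not_numeric": [], "out_of_range": [], "valid": []}
    for row_number, raw in enumerate(values, start=1):
        if raw == "" or raw == MISSING:
            report["missing"].append(row_number)
            continue
        try:
            number = float(raw)
        except ValueError:
            report["not_numeric"].append(row_number)
            continue
        if low <= number <= high:
            report["valid"].append(number)
        else:
            report["out_of_range"].append(row_number)
    return report


# Made-up salinity readings with typical entry problems.
salinity = ["31.2", "30.8", "", "<30", "310", "29.9", "-9999"]
report = check_values(salinity, low=0, high=45)
print("missing rows:", report["missing"])
print("non-numeric rows:", report["not_numeric"])
print("out-of-range rows:", report["out_of_range"])
print("mean of valid values:", statistics.mean(report["valid"]))

# Pick about 10% of the rows (at least one) to compare by hand with the field sheets.
sample_size = max(1, round(0.10 * len(salinity)))
rows_to_check = sorted(random.sample(range(1, len(salinity) + 1), sample_size))
print("rows to check manually:", rows_to_check)
```

The report keeps row numbers rather than values for the problem cases, so each flagged record can be traced back to the original entry.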
Analyzing Data - Notes
• Keep Original File
  – Uncorrected copy
  – Make “read-only”
• Make notes on transformations
• Any changes, save as a new file
• Use scripted code to transform and correct
  data
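
As a sketch of this workflow, the script below marks the original file read-only and writes every correction to a new file, with each transformation noted in a comment. The function names, the file names, the tide_state column, and the two records are hypothetical.

```python
import csv
import os
import stat
import tempfile


def make_read_only(path):
    """Remove write permission so the uncorrected original is not edited by accident."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def write_corrected_copy(original_path, corrected_path):
    """Apply the documented corrections and save the result as a NEW file."""
    if os.path.abspath(original_path) == os.path.abspath(corrected_path):
        raise ValueError("refusing to overwrite the original file")
    with open(original_path, newline="") as src, \
            open(corrected_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Correction 1: one spelling for tide state ("Slack High" -> "slack-high").
            row["tide_state"] = row["tide_state"].strip().lower().replace(" ", "-")
            writer.writerow(row)


# Demonstration in a temporary folder with made-up records.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "WellSalinity_2005_v01.csv")
    corrected = os.path.join(folder, "WellSalinity_2005_v02.csv")
    with open(original, "w", newline="") as f:
        f.write("date,tide_state\n2005-05-23,Slack High\n2005-10-02,slack-high\n")
    make_read_only(original)
    write_corrected_copy(original, corrected)
    with open(corrected, newline="") as f:
        print(f.read())
```

Because the corrections live in the script, re-running it on the untouched original reproduces the corrected file exactly.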
Analyzing Data
• Use a scripted program (R, SAS, SPSS, Matlab)
  – Steps are recorded in textual format
  – Can be easily revised and re-executed
  – Helps sharing and repetition
  – Easy to document
• GUI-based analysis may be easier, but is harder
  to reproduce
Document EVERYTHING!

• Create a Project Document File
  – More than a Lab Notebook
  – Data Management Plan
• Start at the beginning of the project and
  continue throughout data collection & analysis
  – Why you are collecting data
  – Exact details of methods of collecting & analyzing
Document EVERYTHING!
• Details such as:
  – Names of data & analysis files associated with
    study
  – Definitions for data and codes (include missing
    value codes, names)
  – Units of measure (accuracy and precision)
  – Standards or instrument calibrations
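
One lightweight way to keep these details next to the data is a data dictionary that a script can check against the file header. The sketch below is illustrative only: the column names, definitions, units, and missing-value codes are made up, and undocumented_columns is a hypothetical helper.

```python
# Hypothetical data dictionary: one entry per column in the data file.
DATA_DICTIONARY = {
    "sample_date": {
        "definition": "Date the well was sampled",
        "units": "ISO 8601 date",
        "missing_code": "",
    },
    "salinity_top": {
        "definition": "Salinity measured at the top of the well",
        "units": "ppt",
        "missing_code": "-9999",
    },
    "tide_state": {
        "definition": "Tide at sampling time, from a fixed vocabulary",
        "units": "code: slack-high, slack-low, ebb, flood",
        "missing_code": "NA",
    },
}


def undocumented_columns(header, dictionary):
    """Return the column names that have no entry in the data dictionary."""
    return [name for name in header if name not in dictionary]


header = ["sample_date", "salinity_top", "conductivity_top", "tide_state"]
print(undocumented_columns(header, DATA_DICTIONARY))
# prints ['conductivity_top']
```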
Choosing File Formats

• Accessible Data (in the future)
  – Non-proprietary (software formats)
  – Open, documented standard
  – Common, used by the research community
  – Standard representation (ASCII, Unicode)
  – Unencrypted & Uncompressed
  – Media formats (hardware formats)
Preferred Format Choices
•   PDF, not Word
•   ASCII, not Excel
•   MPEG-4, not Quicktime
•   TIFF or JPEG2000, not GIF or JPG
•   XML or RDF, not RDBMS

Good if not software specific
Best Practices

1. Use Consistent Data Organization
2. Use Standardized Formats
3. Assign Descriptive File Names
4. Perform Basic Quality Assurance/ Quality Control
5. Use Scripted Program for Analysis and Keep Notes
6. Document EVERYTHING! (Define Contents of Data
   Files )
7. Use Consistent, Stable and Open File Formats
Best Practices Bibliography
Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some
   simple guidelines for effective data management. Bulletin of the Ecological
   Society of America, 90(2), 205-214.
Hook, L. A., Santhana Vannan, S.K., Beaty, T. W., Cook, R. B. and Wilson, B.E.
  (2010). Best Practices for Preparing Environmental Data Sets to Share and
  Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf)
  from Oak Ridge National Laboratory Distributed Active Archive Center, Oak
  Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010.
Inter-university Consortium for Political and Social Research (ICPSR). (2012).
    Guide to social science data preparation and archiving: Best practices
    throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved
    05/31/2012, from
    http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf.
Data Observation Network for Earth (DataONE). (2012). DataONE Best
   Practices database. Retrieved 07/21/12, from
   http://www.dataone.org/best-practices.
Questions? Discussion?

• Sherry Lake
  Senior Scientific Data Consultant, UVA Library
• shlake@virginia.edu
• Twitter: shlakeuva
• Slideshare: http://www.slideshare.net/shlake
• Web: http://www.lib.virginia.edu/brown/data





Editor's Notes

  • #2: Have you ever collected data and had trouble remembering what you did at the start? Tried to share your data with someone and they (or you) couldn’t understand it? Using “Best Practices” when you collect and record your data will improve future usability and may save time. Following these best practices (guidelines) will improve the usability of the data by you or by others, including when it is used with other data.
  • #4: Spreadsheets are widely used for simple analyses. They are easy to use, BUT they allow (encourage) users to structure data in ways that are hard to use with other software. You can use them like Word, with columns. Spreadsheets in this format are good for “human” interpretation, not for computers, and since you will probably need to either write a program or use a software package, the “human” format is not best. These formats are good for presenting your findings, such as in publications, but they will be harder to use with other software later on (if you need to do any analysis). It is better to store the data in ways that can be used in automated ways, with minimal human intervention.
  • #5: These are some well data measurements, where a salinity meter was used to measure the salinity (top and bottom) and the conductivity (top and bottom). Take a look at this spreadsheet… What’s wrong with it? Could this be easily automated? Sorted?
  • #6: Dates are not stored consistently: sometimes the date is stored with a label (e.g., “Date:5/23/2005”), sometimes in its own cell (10/2/2005). Values are labeled inconsistently: sometimes “Conductivity Top”, other times “conductivity_top”; for salinity, sometimes two cells are used for top and bottom, in others they are combined in one cell. Data coding is inconsistent: sometimes YSI_Model_30, sometimes “YSI Model 30”, so you can’t quite tell whether it is a “label” or a data value; tide state is sometimes a text description, sometimes a number. The order of values in the “mini-table” for a given sampling date differs: “Meter Type” comes first in the 5/23 table and second in the 10/2 table.
  • #7: Confusion between numbers and text: for most software, 39% or <30 are considered TEXT, not numbers (what is the average of 349 and <30?). Different types of data are stored in the same columns: many software products require that a single column contain either TEXT or NUMBERS (but not both!). The spreadsheet loses interpretability if it is sorted: dates are related to a set of attributes only by their position in the file, and once sorted that relationship is lost. Not sure why you would sort this.
  • #8: The spreadsheet loses interpretability if it is sorted. Dates are related to a set of attributes only by their position in the file. Once sorted, that relationship is lost. Look what happens when we sort this… Look at the difference in this one… sort it.
    https://docs.google.com/spreadsheet/ccc?key=0Att-cHR6O7gCdEZ2NzRhUWFLYy1nM2FMcDhaNGRVeWc
    https://docs.google.com/spreadsheet/ccc?key=0Att-cHR6O7gCdHpTMC1kdWREbTNlanBwM3J5WVE3ZFE
  • #9: The standard convention for many software programs (usually a yes/no “check” option) is for the 1st line (record) to be a header line that lists the names of the variables in the file. The rest of the records (lines) are data. Keep names from getting too long: some software programs may not work with long variable names.
  • #10: We’ve seen that a spreadsheet or word processor can create datasets that can only be interpreted with human intervention. The “ugly spreadsheet” example would be hard to analyze even in a spreadsheet, except with lots of case-by-case human decisions. But what are some principles that characterize good archival data? Keep in mind that good formats for storing and sharing data may not be the ones you prefer for viewing or analysis! Same naming convention for text data: use a vocabulary and keep it the same, e.g. “slack-high”, not “slack high”.
  • #11: There are already standards for certain types of data (like date/time, spatial coordinates). Use them; don’t invent your own. Can you think of others?
    Date and time: am/pm is NOT allowed. T appears literally in the string. The minimum for a date is YYYY.
    – YYYY = four-digit year
    – MM = two-digit month (01 = January, etc.)
    – DD = two-digit day of month (01 through 31)
    – hh = two digits of hour (00 through 23)
    – mm = two digits of minute (00 through 59)
    – ss = two digits of second (00 through 59)
    – s = one or more digits representing a decimal fraction of a second
    – TZD = time zone designator (Z or +hh:mm or -hh:mm)
    Coordinates: decimal degrees vs. DMS (degrees, minutes, seconds). This is important when a data field could have more than one type of unit.
  • #12: Guidelines for file names will only help you with your own files/research. Once files are “archived” they will get new names that fit with the archive’s systems, usually a permanent name based on how the computer “locates” the file. Look at the file names… Context.txt, DataFile1.txt, DataFile2.txt, word6doc.zip. Long ones… Safari, Ray… good date, place. Note “_” and “-”. Think about how the name will look in a directory with lots of other files; you want to be able to “pick it out”.
  • #13: File names are the easiest way to indicate the contents of a file; keep them terse but indicative of their content. You want to uniquely identify the data file. Don’t make them too long: some scripting programs have a file name length limit for file importing (reading). Don’t use blanks: some software may not be able to read file names with blanks. Think about how the name will look in a directory with lots of other files; you want to be able to “pick it out”.
  • #14: Maybe use version numbers… Don’t forget the extension (3 characters) used to tell the file type.
  • #15: Data quality control takes place at various stages: during data collection, data entry, and data checking. The quality of the data collection methods used has a significant bearing on data quality. Quality includes equipment calibration (use instrument calibration to check precision), which allows other researchers to look at your data and compare it to theirs. You need to validate transcription. Train coders (different people may be doing this) and create a handbook. You can create (program) data entry interfaces that verify data entry and offer lists to choose from. Verification: out-of-range values, random samples, double-checking entries. Minimize manual entry. Visual Basic can create forms for Excel; Access has form creation. Check a random sample of the data. Run consistency checks. Double entry: each record is keyed in and then re-keyed against the original. Several standard packages offer this feature. In the re-entry process, the program catches discrepancies immediately. Start before data collection: define standards and document them in the handbook.
  • #16: You don’t want to change (or delete) something that could be important later. If you use a scripted language, you can re-run analyses.
  • #17: Analysis “scripted” software: R, SAS, SPSS, Matlab. Analysis scripts are written records of the various steps involved in processing and analyzing data (a sort of “analytical metadata”). They are easily revised and re-executed at any time if you need to modify the analysis. A GUI is easier, but does not leave a clear accounting of exactly what you have done. Document scripted code with comments on why data is being changed.
  • #18: Important to repeat! More documentation: documentation can also be called metadata. Describe the data file names (especially if using acronyms and abbreviations). Record why you are collecting data, details of methods of analysis, names of all data and analysis files, definitions for data (include coding keys), missing value codes, and units of measure. Use structured metadata (XML) format standards for your discipline (e.g. Ecological Metadata Language, EML).
  • #19: Documentation can also be called metadata. Describe the data file names (especially if using acronyms and abbreviations). Record why you are collecting data, details of methods of analysis, names of all data and analysis files, definitions for data (include coding keys), missing value codes, and units of measure. Record calibrations so others can compare their results with yours. Use structured metadata (XML) format standards for your discipline (e.g. Ecological Metadata Language, EML).
  • #20: Spreadsheets are widely used for simple analyses, but they have poor archival qualities: different versions over time are not compatible, and formulas are hard to capture or display. Plan what type of data you will be collecting. You want to choose a file format that can be read well into the future and is independent of software changes. These are formats more likely to be accessible in the future. The alternative is replacing old media and maintaining devices that can still read the proprietary formats or media types. The format of the file is a major factor in the ability to use the data in the future. As technology changes, plan for software and hardware obsolescence. System files (SAS, SPSS) are compact and efficient, but not very portable. Use software to “export” data to a portable (or transport) file. Convert proprietary formats to non-proprietary ones. Check for data errors in conversion.
  • #21: Examples of preferred format choices: formats for long-term digital preservation (open). Don’t expect that you (you won’t have time) or the archive will be able to convert older formats to new ones.
  • #22: Summary of the seven practices:
    1. Remember to create the spreadsheet so it can be automated.
    2. Date/time standards, geospatial coordinates, species, other standards from your discipline.
    3. Descriptive file names: file names can help identify what’s inside.
    4. Quality assurance: when planning data entry you can “program” data checks in forms (Access and Excel), create pick lists (codes), and define missing data values.
    5. Scripted analysis makes it easier to replicate data transformations, and can be documented.
    6. Document EVERYTHING: dataset details, database details, collection notes and conditions. You will not remember everything 20 years from now! Record what someone would need to know about your data to use it.
    7. Stable file formats: it is easier if all files are in the same format; also know which formats are better in the long term.