Practical Data Management
ACRL DCIG Webinar
30 April 2014
Kristin Briney, PhD
andrius.v, https://www.flickr.com/photos/banditaz/6823875954 (CC BY-NC-SA)
Mr.TinDC, https://www.flickr.com/photos/mr_t_in_dc/5940438148 (CC BY-ND)
International Institute of Tropical Agriculture, https://www.flickr.com/photos/iita-media-library/8160877379 (CC BY-NC)
Musgo Dumio_Momio, https://www.flickr.com/photos/30976576@N07/2903662286 (CC BY-NC-SA)
Jen Doty and Rob O'Reilly, “Learning to Curate @ Emory”. RDAP 2014
Data Management Basics
• Introduction to a few topics in data
management
– File organization and naming
– Documentation
– Storage and backups
– Future file usability
→ Teach & Use
For each minute of planning at
beginning of a project, you will save
10 minutes of headache later
FILE ORGANIZATION & NAMING
Dan Zen, http://www.flickr.com/photos/danzen/5551831155/ (CC BY)
File Organization
• What?
– Keeping your files in order
File Organization
• Why?
– Easier to find and use data
– Tell, at a glance, what is done and what you have
yet to do
– Can still find and use files in the future
File Organization
• When?
– Always!
– Get in the habit of putting files in the right place
File Organization
• How?
– Any system is better than none
– Make your system logical for your data
• 80/20 Rule
– Possibilities
• By project
• By analysis type
• By date
• …
Example
• Thesis
– By chapter
• By file type (draft, figure, table, etc.)
• Data
– By researcher
• By analysis type
– By date
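A folder system like the one in this example can be set up once and reused for each new project. The sketch below is one minimal way to do that in Python. The folder names (Chapter1, ResearcherA, UVVis) and the function name create_layout are hypothetical placeholders, not part of the original talk.

```python
from pathlib import Path

# Hypothetical layout mirroring the example: thesis by chapter then file type,
# data by researcher, then analysis type, then date.
LAYOUT = {
    "Thesis": ["Chapter1/drafts", "Chapter1/figures", "Chapter1/tables"],
    "Data": ["ResearcherA/UVVis/2014-04-22"],
}


def create_layout(root, layout=LAYOUT):
    """Create every folder in the layout under root and return the paths."""
    created = []
    for top, subfolders in layout.items():
        for sub in subfolders:
            path = Path(root) / top / sub
            # exist_ok=True makes the call safe to repeat on an existing project.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling create_layout("MyProject") builds the skeleton; any consistent layout works, so adjust the dictionary to whatever is logical for your data.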
File Naming Conventions
• What?
– Consistent naming for files
http://retractionwatch.com/2014/01/07/doing-the-right-thing-authors-retract-brain-paper-with-systematic-human-error-in-coding/
File Naming Conventions
• Why?
– Make it easier to find files
– Avoid duplicates
– Make it easier to wrap up a project because you
know which files belong to it
File Naming Conventions
• When?
– For a group of related files (3 to 1000+)
– May need different conventions for different
groups
File Naming Conventions
• How?
– Pick what is most important for your name
• Date
• Site
• Analysis
• Sample
• Short description
File Naming Conventions
• How?
– Files should be named consistently
– File names should be descriptive but short (<25
characters)
– Use underscores instead of spaces
– Avoid these characters: “ /  : * ? ‘ < > [ ] & $
– Use the dating convention YYYY-MM-DD (or YYYYMMDD)
Example
• YYYYMMDD_site_sampleNum
– 20140422_PikeLake_03
– 20140424_EastLake_12
• Analysis-sample-concentration
– UVVis-stilbene-10mM
– IR-benzene-pure
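The naming rules above can be enforced with a small helper so that every file in a group follows the same pattern. The following is a sketch only, built around the YYYYMMDD_site_sampleNum example. The function name make_name, the two-digit sample number, and the inclusion of the backslash in the forbidden set are assumptions made for illustration.

```python
from datetime import date

# Characters the slide says to avoid, plus the space (use underscores instead).
# The backslash is added here as an assumption; it is not in the slide's list.
FORBIDDEN = set('"/\\:*?\'<>[]&$ ')


def make_name(day, site, sample_num, max_len=25):
    """Build a YYYYMMDD_site_sampleNum name and check it against the rules."""
    name = f"{day:%Y%m%d}_{site}_{sample_num:02d}"
    bad = FORBIDDEN.intersection(name)
    if bad:
        raise ValueError(f"forbidden characters in name: {sorted(bad)}")
    if len(name) >= max_len:
        raise ValueError(f"name has {len(name)} characters; keep it under {max_len}")
    return name
```

For instance, make_name(date(2014, 4, 22), "PikeLake", 3) gives 20140422_PikeLake_03, matching the first example above. Putting the date first in this form also means an alphabetical sort is a chronological sort.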
DOCUMENTATION
Brady, https://www.flickr.com/photos/freddyfromutah/4424199420 (CC BY)
What would someone unfamiliar
with your data need in order to find,
evaluate, understand, and reuse
them?
Documentation
• Why?
– Data without notes are unusable
– Because you won’t remember everything
– For others who may need to use your files
Documentation
• When?
– Always
– Documentation needs will vary between files
Documentation
• How?
– Take good notes
– Metadata schemas
• http://www.dcc.ac.uk/resources/metadata-standards
Documentation
• How?
– Methods
• Protocols
• Code
• Survey
• Codebook
• Data dictionary
• Anything that lets someone reproduce your results
Documentation
• How?
– Templates
• Like structured metadata but easier
• Decide on a list of information before you collect data
– Make sure you record all necessary details
– Takes a few minutes upfront, easy to use later
• Print and post in prominent place or use as worksheet
Example
• I need to collect:
– Date
– Experiment
– Scan number
– Powers
– Wavelengths
– Concentration (or sample weight)
– Calibration factors, like timing and beam size
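A template like this one can also be kept in digital form and used to check each record before moving on. The sketch below assumes records are stored as Python dictionaries. The snake_case field names and the function name missing_fields are hypothetical; only the list of fields comes from the example above.

```python
# Fields from the example list; the snake_case spelling is an assumption.
REQUIRED_FIELDS = [
    "date",
    "experiment",
    "scan_number",
    "powers",
    "wavelengths",
    "concentration",
    "calibration_factors",
]


def missing_fields(record):
    """Return the template fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat None and empty or whitespace-only strings as not recorded.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

Running missing_fields on each record right after a scan shows which details still need to be written down while they are fresh.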
Documentation
• How?
– README.txt
• For digital information, address the questions
– “What the heck am I looking at?”
– “Where do I find X?”
• Use for project description in main folder
• Use to document conventions
• Use wherever you need extra clarity
Example
• Project-wide README.txt
– Basic project information
• Title
• Contributors
• Grant info
• etc.
– Contact information for at least one person
– All locations where data live, including backups
Example
“Talk_v1: rough outline of talk
Talk_v2: draft of talk
Talk_v3: updated 2014-01-15 after feedback”
“ ‘Data’ folder contains all raw data files by date
‘Analysis’ has analyzed data and plots
‘Paper’ has drafts of article on this work”
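A project-wide README.txt with the items listed above can be generated from a few values, so that every project folder starts with one. This is a sketch under assumptions: the function name write_readme, its parameters, and the exact text layout are illustrative choices, not a standard.

```python
from pathlib import Path


def write_readme(folder, title, contributors, contact, data_locations, grant=""):
    """Write a basic project README.txt into folder and return its path."""
    lines = [
        f"Title: {title}",
        f"Contributors: {', '.join(contributors)}",
        f"Grant info: {grant or 'n/a'}",
        f"Contact: {contact}",
        "Data locations (including backups):",
    ]
    lines += [f"  - {location}" for location in data_locations]
    path = Path(folder) / "README.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

The generated file is plain text on purpose: it can be opened anywhere and edited by hand to add conventions or folder descriptions like the ones quoted above.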
grover_net, http://www.flickr.com/photos/9246159@N06/599820538/ (CC BY-ND)
STORAGE AND BACKUPS
Storage
• Why?
– Need good storage practices to prevent loss
– Keep data secure
Storage
• How?
– Library motto: Lots of Copies Keeps Stuff Safe!
– Rule of 3: 2 onsite, 1 offsite
Storage
• How?
– Computer
– External hard drive
– Shared drives/servers
– Tape backup
– Cloud storage*
– CDs/DVDs
– USB flash drive
Erica Wheelan, https://www.flickr.com/photos/reinventedwheel/5985479866 (CC BY)
*Cloud Storage
• Read the Terms of Service!
• E.g., Google Drive
– “When you upload or otherwise submit content to our Services,
you give Google (and those we work with) a worldwide license
to use, host, store, reproduce, modify, create derivative works
(such as those resulting from translations, adaptations or other
changes we make so that your content works better with our
Services), communicate, publish, publicly perform, publicly
display and distribute such content. The rights you grant in this
license are for the limited purpose of operating, promoting, and
improving our Services, and to develop new ones”
Backups
http://toystory.disney.com/
Backups
• How?
– Any backup is better than none
– Automatic backup is better than manual
– Your work is only as safe as your backup plan
Backups
• How?
– Check your backups
• Backups are only as good as your ability to recover data
• Test your backups periodically
– Preferably a fixed schedule
– 1 or 2 times a year may be enough
– Bigger/more complex backups should be checked more often
• Test your backup whenever you change things
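One way to test a backup is to compare each original file against its copy using checksums. The sketch below assumes the backup mirrors the original folder structure and is reachable as an ordinary folder. The function names sha256_of and check_backup are hypothetical. It confirms that copies exist and match, which is only part of a full recovery test; opening a few restored files is still worthwhile.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(original_dir, backup_dir):
    """Return relative paths that are missing from, or differ in, the backup."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(str(relative))
    return problems
```

An empty list from check_backup means every original file has an identical copy. Running it on a fixed schedule, and again after any change to the backup setup, follows the advice above.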
Example
• I keep my data
– On my computer
– Backed up manually on shared drive
• I set a weekly reminder to do this
– Backed up automatically via SpiderOak cloud
storage
FUTURE FILE USABILITY
Ian, http://www.flickr.com/photos/ian-s/2152798588/ (CC BY-NC-ND)
Future File Usability
• What?
– Can you read your files from 10 years ago?
– Data needs to be
• Accessible
• Interpretable
• Readable
lukasbenc, https://www.flickr.com/photos/lukasbenc/3493808772 (CC BY-NC-SA)
Future File Usability
• Why?
– You may want to use the data in 5 years
– PI sometimes keeps data and notes
– Prep for data sharing
– Per OMB Circular A-110, must retain data at least
3 years post-project
• Better to retain for >6 years
Future File Usability
• When?
– When you wrap up a project
– (As you work on a project)
Future File Usability
• How?
– Back up written notes
• People always forget this one
• Difficult to interpret data without notes
• Options
– Digitally scan (recommended with digital data)
– Photocopies
Future File Usability
• How?
– Convert file formats
• Can you open digital files from 10 years ago?
• Use open, non-proprietary formats that are in wide use
– .docx → .txt
– .xlsx → .csv
– .jpg → .tif
• Save a copy in the old format, just in case
• Preserve software if no open file format
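Before converting anything, it helps to know which files still lack a copy in an open format. The sketch below scans a folder and lists them. The format pairs come from the slide above; the function name conversion_candidates and the rule that an open copy shares the same file name with a different extension are assumptions. It only reports candidates; the conversion itself depends on the software that produced each file.

```python
from pathlib import Path

# Format pairs taken from the slide; extend the mapping for your own formats.
OPEN_ALTERNATIVES = {".docx": ".txt", ".xlsx": ".csv", ".jpg": ".tif"}


def conversion_candidates(folder):
    """List (file name, suggested format) pairs for files with no open copy."""
    candidates = []
    for path in sorted(Path(folder).rglob("*")):
        target = OPEN_ALTERNATIVES.get(path.suffix.lower())
        # Skip files that already have a sibling copy in the suggested format.
        if path.is_file() and target and not path.with_suffix(target).exists():
            candidates.append((path.name, target))
    return candidates
```

Running this when wrapping up a project gives a short to-do list of files to convert while the original software is still available.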
Future File Usability
• How?
– Move to new media
• Hardware dies and becomes obsolete
– Floppy disks!
• Expect average lifetime to be 3-5 years
• Keep up with technology
WHERE TO GO FROM HERE
Center for Teaching Vanderbilt University, https://www.flickr.com/photos/vandycft/8244800868 (CC BY-NC)
easylocum, https://www.flickr.com/photos/easylocum/2921542814 (CC BY)
Chris Hoving, https://www.flickr.com/photos/pcrucifer/2433274595 (CC BY-ND)
Resources
• Data Ab Initio blog
– http://dataabinitio.com/
• eScience Portal
– http://esciencelibrary.umassmed.edu/
• DataONE Best Practices
– http://www.dataone.org/best-practices
Steal My Slides
• Slides + recording available
– http://connect.ala.org/node/220603
• Slides available
– http://www.slideshare.net/kbriney
Thank You!
• This presentation available under a Creative
Commons Attribution (CC-BY) license
• Some content courtesy of Dorothea Salo
– http://www.graduateschool.uwm.edu/research/researcher-central/proposal-development/data-plan/boot-camp/ (CC BY)

Editor's Notes

  • #3: I’m excited to be speaking today about practical data management because it is a topic near and dear to me. 5 years ago I worked in a place like this, when I was a chemistry researcher doing laser spectroscopy. My favorite part was working with my data, but it was also one of the more frustrating aspects of being a researcher. I had no training in data management, so I made things up (not always successfully). I also spent a year reproducing another person’s results and nothing shows just how inadequate most data practices are quite like working with someone else’s data.
  • #4: Now, I focus on helping researchers with their data management at my current place of work, the University of Wisconsin-Milwaukee. This webinar, in fact, is based on the workshop I teach to my users.
  • #5: But data management is not just for researchers. Librarians need these skills too, particularly those who want to curate research datasets. I’m really glad to be doing an ACRL DCIG (Digital Curation Interest Group) webinar because I think there is a strong connection between data management and data curation.
  • #6: The connection between data management and data curation was apparent at the recent Research Data Access and Preservation conference during the panel on “learning to curate”. This slide from the Emory group sums up the issue nicely in that the major challenges with curating research datasets are not preservation issues but rather data management issues. So if we want to easily curate research datasets, we need to work with researchers on data management so that data comes to us in a form that can be easily curated. Plus, data management is a skill that most researchers need, allowing us to provide a direct benefit to researchers while furthering our curation goals.
  • #17: Consistent and correct naming schemes are important, as evidenced by this recent retraction for “error in coding”. Mislabelling meant that the analysis was done on the wrong samples, affecting the results of the paper. So naming is very important.
  • #39: So many [lack of] backup horror stories. Toy Story 2 has one of the best ones. See video: https://www.youtube.com/watch?v=EL_g0tyaIeE&feature=player_detailpage
  • #45: This one has affected me personally because I no longer have access to my PhD data, even though it is less than 5 years old. The reason is that my files are locked up in a proprietary format and I lost access to the necessary software when I left the lab. If I had done a little work ahead of time, I wouldn’t be in this position.
  • #52: I encourage you to teach and share these data management strategies with your users. My slides are available under a CC-BY license, so feel free to modify and reuse.
  • #53: Also, dive into these practices for yourself. They will help you manage your own data.
  • #54: Remember that good data management is the accumulation of many small practices. The best way to improve your practices is to make one small change at a time. Any small improvement makes it easier to work with your data. I challenge you to take one of the practices outlined in this talk and adopt it to improve your digital file practices.