Notes for Inroads into Data

Lecture

Slide 1 Good afternoon. My name is Margaret Henderson and I was pleased to be
asked to speak with you today about getting involved in data at your institution. If you
want to learn more about data, be sure to review the webinars and other learning
opportunities available through the Southeastern Atlantic region and the other
National Network of Libraries of Medicine regions, and the Medical Library Association.
In fact, the resources slide at the end of this deck has a link to a resource list created by
Abigail Goben and Rebecca Raszewski for a data webinar that took place on Monday in
the Greater Midwest Region.

Slide 2
I’d like to take this quote as my starting point, since I agree with what the great medical
librarian Lucretia McClure said:

“I believe that knowledge rather than the format or container should drive our work.”
Lucretia McClure, 1997

Today I hope to show you that you are probably more ready to work with data than you
think. If we apply our skills to working with data as knowledge, we can easily add data
services to our menu of library services.

Case 1
As I show you the case studies on the next few slides, I’ll give you a few seconds to write
down what you think is a problem, sometimes there will be more than one, and what
could be done to avoid or fix the problem. Later, after I go over the various skills
needed, we’ll revisit these cases and talk about how to help.

In this first case, a researcher can’t share their data because they don’t remember
which files contain the data.

Case 2
A research group has had 2 papers retracted because there were statistical issues and
the original data can’t be found to back up the papers. The first paper shown here is
from 2013, so not that long ago.

Case 3
This could also have been a problem with poor writing, but either way, the data is
inaccessible.

Case 4
This poor student had to settle for a Master’s degree because he lost almost 4 years of
work when his computer was stolen.

Datathesaurus
I’ve found that many researchers don’t realize that what they collect is data. I’m sure
there are many healthcare professionals who think the same way. So it is a good idea to
figure out, What is Data? before you start talking with them.

What is Data?
There are lots of things that can be considered data. I’m sure you can think of many
more than this list. The Merriam-Webster dictionary defines data as “factual information (as
measurements or statistics) used as a basis for reasoning, discussion, or calculation.”
There are arguments about whether data is singular or plural, but we’ll leave those
discussions for another day.
Of course we can’t talk about data without mentioning big data. Big data is what
Amazon and Google use to figure out our preferences based on past sales or searches.
And of course, Electronic Health Records in the Enterprise Data Warehouses hospitals
are setting up fall into this category as well. The usual definitions of big data say that it
must have the 5 Vs: Variety, Volume, Veracity, Velocity, and Value.

Big Data
But I prefer this definition. I was at the talk where Donald Brown said he used this definition
because he wanted all faculty to feel that they were stakeholders in the new institute
that was being created at UVa. I should also point out that Nate Silver, the statistician
who had so much success predicting the outcome of the 2008 Presidential election,
said in a talk last month that big data is starting to peak, and pointed out that “Just
collecting more data can get you more ways to fool yourself.”

You are Here
First, it is a good idea to figure out where your organization stands with regards to data.
I recommend doing an environmental scan.

Reference
But before that, I want to make a plug for my favourite librarian skill, the reference
interview. I attended the 2013 Association of Southeastern Research Libraries
Summertime Summit on open access and data management. Sayeed Choudhury from
Johns Hopkins, where they have all the Sloan Sky Survey data, was the opening keynote
speaker. He talked about setting up the research data management program at the
Sheridan Libraries. My favourite part of his talk was when he mentioned that the
“reference interview” was still needed with research data. He said that there were times
when a researcher started out thinking one thing about their data and ended up in a
totally different place once they had an interview with him. Since I think the skill of
interviewing is one of the great superpowers of librarians, I couldn’t agree more. My
reference professor in library school, Catherine Ross, was excellent; I learned all about
open, closed, and neutral questioning, active listening, and sensemaking research. I
have talked with some librarians who did not have the advantage of such a wonderful
professor, so I recommend the book Conducting the Reference Interview: A
How-To-Do-It Manual for Librarians, which was co-written by Catherine Ross, Kirsti
Nilsen, and Marie Radford.

Environmental Scan
When you read about environmental scanning, it is usually done by a business or
organization to assess the outside world. In this case, you want to assess the
environment outside your library. Any sort of assessment structure is fine if you want to
get formal, but reviewing the departments and services of your organization that have
anything to do with data is a start. I did a SWOT analysis as part of the Duraspace
eScience Institute when I started the research data management services here at VCU,
and it was helpful to be forced to think about weaknesses, opportunities, and threats
when assessing the data landscape here.

Potential Departments
Armed with good reference interview skills, you should be able to interview the people
connected with the various resources as you do your environmental scan so you can
learn about what they do, how they use data, what services they can provide to others,
anything that will help you get a handle on what they are doing.
You’ll probably find there are departments that deal with administrative and financial
data, and they might even have analysts who help them understand and use that data.
Since this is a National Network of Libraries of Medicine regional webinar, I’m assuming
you have patient data of all sorts, health and administrative, scattered around in multiple
departments of the hospital. If you are at an educational institute, there is student data,
tracking courses and marks as well as financial records. In all cases, human resources
will have personnel data that includes qualifications and licenses to practice. Find out
where and what data exists, and look for the departments and people who are already
caring for and using that data.

Stakeholders
Most of the departments and people you learn about will be stakeholders you need to
consider when thinking about starting data services. They may affect, be affected by, or
perceive themselves to be affected by some aspect of data at your institution. Some of
them will be potential collaborators so reassure them that you are just exploring and you
want to fill in the gaps, if there are any, and act as a referral service. Other stakeholders
could be groups that don’t have anything to do with data but can help you, such as
administration, or work with you, maybe other librarians at your library. Make note of all
these different groups and keep them in the loop as you start to think about providing
data services.

DCC Data Life Cycle
Usually, when people start to talk about data services they pull out a data life cycle
model of some sort. It does make it easier when you can visualize things. There are
many data life cycles out there, but this one is very good. The Digital Curation Center
supports data in higher education in the UK and it is an amazing resource. But when you
are just starting out, this model is intimidating, and it takes time and many people to
create a service that can take care of all aspects of this life cycle. Rather than starting
with this model, I suggest thinking of a simplified model.

Simplified Data Lifecycle
Here is a simplified data life cycle model, with some skills that are reasonable first steps
to getting involved in data. I am most familiar with research data, so my examples
will come from there, but I have had some experience with administrative data, and I think
many of the same steps apply, although with different motivation and regulations in
most cases.

Plan
So the first thing that must be done is a little planning. Planning needs to be done first but
requires understanding of all the steps in the lifecycle so it is a good idea to learn a little
about each step, which I hope you’ll do today.

DMPs
Like any project or endeavor, it is helpful to have a plan before starting. A data
management plan makes clear how things will be done, where they will go, what will be
done, and who is in charge. In some cases a data management plan must be submitted
with a grant. Different funders have different templates for creating DMPs. A few tools
have been developed, in the US and in other countries, that use the agency templates
to walk people through the planning process, with boxes to fill in and hints along the
way.

DMPTool
At VCU the Libraries have set up DMPTool. DMPTool was developed back in 2011 as
a collaborative effort by 8 institutions as a National Science Foundation funded project.
Today there are close to 200 partners using and contributing to DMPTool. Partners help
to create new templates and update templates, and they can customize templates for
their users. But even if you aren’t a partner, you can still set up a DMPTool account and
create DMPs from any publicly available template. And the Public DMPs link in the top
bar links to a list of examples of data management plans that have been shared
publicly. They are not necessarily perfect, but they might give some indication of what
needs to be included in data management plans.

Diversion
I’m afraid I have to take a diversion here to explain why specifically formatted data
management plans are important. Hint: the Office of Science and Technology Policy,
aka OSTP, is responsible.

NIH Policies
Right now, NIH grants are working under these policies. You should be aware of the
NIH Public Access Policy that mandates deposit of NIH funded articles into PubMed
Central. But the NIH also has data sharing policies for larger grants and now for genetic
data, and when a researcher writes a grant, they need to address these data sharing
requirements.

NSF Policies
NSF has policies for data sharing and data management plans. Both the NIH and the
NSF require sharing of more than data; things like samples, animals, or software are
included in their sharing policies.

OSTP Mordor
And now we have the OSTP memo that rules over all federal agencies. Well, at least the
ones with over $100 million in annual research and development expenditures, which is
20 of them.

OSTP memo
This memo came out in 2013 in an effort to increase access to publicly funded research
in hopes that the articles and data being made available through public access policies
will maximize the impact and accountability of the Federal research investment. The
government hopes that the policies each agency puts in place will accelerate scientific
breakthroughs and innovation, promote entrepreneurship, and enhance economic
growth and job creation.

DMPs
Each of the agencies has a plan that requires a DMP. Some have already started
requiring a DMP on some grants; most will start in 2016. The third principle of the Dept
of Energy noted here is mentioned in some of the other plans. As long as the DMP
explains why data is not being saved or shared, it isn’t necessary to save and share
everything. So far, only a couple of agencies have mentioned that they will actually
assess plans. Informally, those who work with grants have found that NSF does send
back grants with no DMP and a grant with a good DMP will get funded over a grant with
a poor one, all the rest of the grant being equal. So we expect the other agencies will be
similar.

Data Sharing
The data sharing provision in the OSTP memo specifically refers to digital data. Some
grants will be specific about where data should be stored, but so far, there is no easy
answer to this part of the memo. At the bottom is an answer from a NASA FAQ about
the OSTP policy. The whole thing is a rather entertaining read, but the OSTP memo
acknowledged that there would be some data that could not be shared, in the case of
this comment, the International Traffic in Arms Regulations or ITAR restricts exportation
of defense related technology. But the grant writer needs to explain why in the data
management plan. We’ll discuss more about sharing later, but I wanted to include the
sharing portion of the OSTP memo in this diversion.

Now Back to Our Regularly Scheduled Program
So, back to where we were discussing planning.

Ownership
While working on the DMP, it is a good time to learn about ownership of the data. At
most places with federal grants, the institution owns the data thanks to the Bayh-Dole
Act of 1980. But there may be policies or contracts signed that change this, so
investigate this at your place of work. Data cannot be copyrighted but tables and figures
that contain that data can be. Data can be licensed. You can use Creative Commons
licenses, or others that are specialized for data. The DCC guide shows the many ways
this can be done. Patient Records are another area where ownership is an issue. It
varies by state, and the link here has a map and listing of the laws in various states. In
Virginia, where I live and work, there are two laws that state health records are the
property of healthcare providers or their employers.

Collect
Next is the beginning of the project, the actual collection of data.

Organize fortune PAUSE
Organizing and naming things well as they are collected is very important during the
collection phase.

Organize
A researcher will work best if they can organize their data and other resources in a way
that makes sense to them. Maybe they want all their tiff files together, so they organize
by file type. Maybe the date makes sense if they are doing a longitudinal study.
Whatever makes sense for the person and the work. A principal investigator needs to
teach this organization to students and others in the lab. The PIs are the ones
responsible for the data in the end, so they need to feel confident that the data is all
there. A working group may want to work together to come up with a structure that
works for the group. The main thing is to write out the structure and make sure
everyone uses it. (Hopefully this part reminds you a bit of Medical Subject Headings.)

Naming
Naming conventions, like organizing, need to be written out and understood by
everyone. You can see in the example that a clear name and organizing structure will
make it easier to find data later.

Possible elements for file names
There are many potential identifiers that could be used in a file name, depending on the
work being done. For example, a researcher with a large group of students and
assistants may want to include initials in file names.
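
To make the idea concrete, here is a minimal Python sketch of a written-down naming convention. The order of the elements, the underscore separator, and the sample project and initials are all assumptions made up for illustration, not a standard; a group would substitute whatever elements fit its own work.

```python
import re
from datetime import date

def build_file_name(project, description, collected_on, initials, version, extension):
    """Assemble a file name from agreed elements, always in the same order."""
    def clean(text):
        # Keep only letters and digits so the name is safe on any operating system.
        return re.sub(r"[^A-Za-z0-9]+", "", text)

    parts = [
        clean(project),
        clean(description),
        collected_on.strftime("%Y%m%d"),  # year, month, day so names sort by date
        clean(initials).upper(),          # who collected or created the file
        f"v{version:02d}",                # zero-padded version number
    ]
    return "_".join(parts) + "." + extension.lstrip(".").lower()

# Hypothetical example values, not taken from a real project.
example_name = build_file_name("SleepStudy", "survey raw", date(2015, 3, 9), "mh", 2, ".CSV")
print(example_name)  # SleepStudy_surveyraw_20150309_MH_v02.csv
```

Writing the date as year, month, day means files from the same project sort chronologically, and a fixed element order lets anyone in the group read a name without asking what each piece means.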

Describe
Once data is being collected, information about the data needs to be documented.

Metadata is a love note to the future.
This quote by Jason Scott is perfect. One of the questions that works well with
researchers is to ask them if they will be able to figure out what their data means in five
years. When they think about it, they realize that documentation is important.

Metadata
Metadata is structured information about an item. Something like a MARC record with
lots of labelled fields. Metadata is used to describe the object, give administrative
information such as ownership and rights, and describe the structure of an item, which
can be very important when there are multiple files. There are specialty metadata
systems for different types of data and different subjects. The Metadata Standards
Directory can help you find the standards you need. Usually metadata of some sort is
required when uploading data to a repository, and in many cases, it will be a form that
just needs filling in so no programming skills are required, but the necessary information
needs to be collected.
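
As a rough illustration of structured information with labelled fields, the Python sketch below holds one record with descriptive, administrative, and structural sections and checks that nothing required has been left blank. The field names, the list of required fields, and the sample values are hypothetical; they are not taken from any particular metadata standard or repository form.

```python
import json

# Hypothetical required fields, grouped by the three jobs metadata does.
REQUIRED_FIELDS = {
    "descriptive": ["title", "creator", "description"],
    "administrative": ["owner", "rights"],
    "structural": ["files"],
}

def missing_fields(record):
    """Return the 'section.field' names that are absent or empty in a record."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        for field in fields:
            if not record.get(section, {}).get(field):
                missing.append(f"{section}.{field}")
    return missing

# Made-up example record; every value is a placeholder.
example_record = {
    "descriptive": {
        "title": "Example survey responses",
        "creator": "Example Lab",
        "description": "Responses to a ten question survey.",
    },
    "administrative": {"owner": "Example University", "rights": ""},
    "structural": {"files": ["responses.csv", "README.txt"]},
}

gaps = missing_fields(example_record)
print(json.dumps(example_record, indent=2))
print("Still to fill in:", gaps)
```

A check like this is one way to collect the necessary information before sitting down at a repository's upload form.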

Readme
A readme file in each folder is a slightly easier way than metadata to document your
work. Rather than lots of specific fields in XML or some other specialized format, you
just put basic information into a text file. It is good to include basics about who, what,
where, when, why, and how, but the main thing is to make sure others can tell what is in
the file or folder. This is really the minimum necessary documentation for most data.
You may have seen readme files connected to software you are installing, serving the
same purpose of giving you information to help with the use of the connected files.
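
For anyone who wants a starting template, the Python sketch below writes a plain-text readme built around the who, what, where, when, why, and how questions mentioned above. The file name README.txt, the placeholder wording, and the sample answers are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

README_QUESTIONS = ["Who", "What", "Where", "When", "Why", "How"]

def make_readme_text(answers):
    """Build plain readme text, flagging any question left unanswered."""
    lines = []
    for question in README_QUESTIONS:
        answer = answers.get(question, "").strip() or "NOT YET DOCUMENTED"
        lines.append(f"{question}: {answer}")
    return "\n".join(lines) + "\n"

def write_readme(folder, answers):
    """Save the readme as README.txt inside the given folder and return its path."""
    readme_path = Path(folder) / "README.txt"
    readme_path.write_text(make_readme_text(answers), encoding="utf-8")
    return readme_path

# Made-up answers for a demonstration folder.
sample_answers = {
    "Who": "Example Lab, contact: lab manager",
    "What": "Weekly observation counts, one row per visit",
    "When": "Collected January to June",
    "How": "Entered by hand from paper field sheets",
}

with tempfile.TemporaryDirectory() as demo_folder:
    saved_path = write_readme(demo_folder, sample_answers)
    readme_text = saved_path.read_text(encoding="utf-8")

print(readme_text)
```

Unanswered questions show up as NOT YET DOCUMENTED, which makes the gaps obvious to whoever opens the folder next.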

Example Chimp Dryad Record
So here is an example of something near and dear to my heart. Since I was a child, I
wanted to grow up and be a zoologist working with Jane Goodall and the chimpanzees
in Gombe, and I can now look at some of the data collected there and available in
Dryad. This data package has the data used for a publication, which is listed in the
record. Note the CC0 and Open Data badges associated with this file on female
chimpanzee competition. As well as the Excel spreadsheet, there is a Readme doc and
the Details file leads to the metadata.

example metadata
This is the metadata record for the data package. You can see some of the labeled
fields in the record. There are actually more but this is what fits on the screen and you
can see from this how much it looks like a cataloging record.

example readme
This is a readme document from the same dataset. There aren’t so many set fields
here, and there is narrative to help explain some of the methods.

Example, actual spreadsheets
And you can see from these spreadsheets that having all that extra information is
necessary in order to use the files.

Data Dictionary
There is another type of descriptive tool that can be used with data, especially with
surveys and spreadsheets. A data dictionary describes each variable and how it is
measured or collected or obtained. This is especially helpful if members of a group don’t
regularly collect data, but it can also be helpful if a researcher needs to use data
collected in the past and doesn’t quite remember what things mean.
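
A data dictionary can be as simple as one entry per column. The Python sketch below, using invented column names and units, keeps the dictionary next to the data and lists any spreadsheet column that nobody has explained yet. It is an illustration under those assumptions, not a prescribed format.

```python
import csv
import io

# Hypothetical data dictionary: one entry per spreadsheet column.
data_dictionary = {
    "participant_id": {"meaning": "Study ID assigned at enrollment", "type": "text", "units": ""},
    "visit_date": {"meaning": "Date of clinic visit", "type": "date (YYYY-MM-DD)", "units": ""},
    "weight": {"meaning": "Body weight measured on the clinic scale", "type": "number", "units": "kg"},
}

def undocumented_columns(csv_text, dictionary):
    """Return spreadsheet column headings that have no data dictionary entry."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [column for column in header if column not in dictionary]

# Made-up spreadsheet contents with one column missing from the dictionary.
sample_csv = "participant_id,visit_date,weight,bp_sys\nP01,2015-03-09,71.5,118\n"
unexplained = undocumented_columns(sample_csv, data_dictionary)
print("Columns needing a dictionary entry:", unexplained)
```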

Data dictionary example
In this example of a data dictionary, the column headings for a spreadsheet on OSTP
policies are explained. This was especially important for this project because there were
many people from around the country working on the spreadsheet, so we had to make
sure we all understood what the terms meant as we read policies.

Data replication cartoon
I think this scenario speaks for itself. (take a sip of water, breathe)

Process and Analyze
Cleaning up data, which usually involves some sort of programming skills, and
analyzing data, which includes things like statistics and visualizing data, are skills that
require lots of training and practice in order to help others. There are librarians who
have done statistics and analysis in previous jobs or for library assessment or have had
education in the area. But there are many librarians who don’t have these skills,
so this may be a harder service to offer. Some faculty think statistics and
assessment should be taught by the departments where these skills are needed. But a
researcher who is comfortable doing stats for their work may not be comfortable
teaching the skill to students. The fact remains that, just like information literacy skills,
these data skills are needed but not always taught in regular courses, so the library has
a chance to step in and teach.

Unicorn
I’m going to do a quick diversion here, since we have already covered half the steps in
the data life cycle. We need to beware of unicorn job ads. Job ads that list too many
skills and duties are not doing anyone a favor. I point this out here because while some
librarians have analysis skills, there are many who don’t, and hiring a person to be a
data librarian and expecting them to do everything related to data is unrealistic,
especially if, as in many job ads, they are also expected to have liaison or teaching
duties. In the context of discussing what you can do at your place of work, it means that
you don’t need to do it all. Find out who can help with all these different aspects of the
data life cycle, and refer people to the specialists. If you have these skills, but not the
software at your place of work, or if you want to try learning these skills, I have a few
suggestions on the next slides.

Tools for Data Cleaning
There are some tools that are free that you can suggest to people, or use if you know
how, to help with these areas. For processing data, which means cleaning it up to
remove duplicates, standardize null cells, etc., OpenRefine has been around a while (it
used to be Google Refine). Trifacta Wrangler used to be a university based tool, but
it now has a more robust paid version of the program if the free version isn’t enough.
Coursera has some courses on data science, and the Johns Hopkins series has a unit on
data cleaning. For sensitive data, there are various anonymization programs, but just
last night I saw there is a new one, so it isn’t on this slide, but will be on the SlideShare
version. NLM-Scrubber is a clinical text de-identification tool designed and developed at
the National Library of Medicine. It is still in beta but looks useful.
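
For those curious what the cleaning steps named above look like in practice, here is a minimal Python sketch that standardizes null cells and removes duplicate rows using only the standard library. It is not how OpenRefine or any other tool works internally, and the list of markers treated as null is an assumption that would need to be agreed on for each dataset.

```python
# Assumed spellings that all mean "no value recorded" in this made-up dataset.
NULL_MARKERS = {"", "na", "n/a", "null", "none", "-"}

def clean_rows(rows):
    """Standardize null cells to None, trim spaces, and drop repeated rows."""
    cleaned = []
    seen = set()
    for row in rows:
        normalized = tuple(
            None if cell is None or cell.strip().lower() in NULL_MARKERS else cell.strip()
            for cell in row
        )
        if normalized in seen:
            continue  # an identical row was already kept
        seen.add(normalized)
        cleaned.append(list(normalized))
    return cleaned

# Invented rows: the second is a duplicate of the first once it is tidied up.
raw_rows = [
    ["P01", "72", "N/A"],
    ["P01", " 72", "na"],
    ["P02", "", "68"],
]
tidy_rows = clean_rows(raw_rows)
print(tidy_rows)  # [['P01', '72', None], ['P02', None, '68']]
```

Note that the duplicate only becomes visible after the null markers and stray spaces are standardized, which is why the two steps are done together.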

Analysis and Visualization
There are also free analysis and visualization tools online. R and Tableau are quite
popular right now. R is a programming language and software environment, good for
statistics and text mining, which is publicly licensed. There are many user created
packages that increase the features of the program. Tableau Public doesn’t require
using a programming language, but unless you want to pay for the desktop version, the
visualizations you create will be publicly available. Flowing Data is fun to follow for the
interesting visualizations and the subjects covered, plus the helpful tutorials.

Tableau example
This Tableau Public created map is the Global Hunger Index for 2014 from the
International Food Policy Research Institute. It is interactive so you can actually change
the map by selecting different variables. For some data, it helps to have the ability to
adjust variables, so Tableau can be a very effective tool.

Publish and Share
Now that the data is collected and analyzed, researchers need to show their research
and data to get credit for their work and share their ideas.

Sharing Data
There are many reasons to share data. From the funder point of view, sharing data
helps spur on further research and shows them their money has gone towards
something useful. The researcher gets credit and hopefully more citations to their paper
and data, and by seeing the results of others, they might not have to duplicate research.
There are many groups pushing for more transparency in research and finding out how
much is actually reproducible using the methods in the original paper.

Ways to Share Data
Depositing in an open repository is a good way to make data open and publicly
available. There are quite a few free repositories and many places have institutional
repositories that take data. They generally have data size limits, but they usually have
paid subscriptions as well. Open Science Framework also has project space for open
science, collecting data and collaborating in real time, not just the final, cleaned
datasets. The Search Registry of Research Data Repositories is the best place to
search for subject repositories. If a specialized repository is available, it is the most
sensible place to put data because it is more likely to be found by the people who would
like to reuse it, which will increase citation counts to the data or the articles that derive
from that data.

Ways 2
Sharing by adding data to a journal article is another way to go. After all, the OSTP
memo requires public access to digital data that is “necessary to validate research
findings including datasets used to support scholarly publication”, so not all raw data
needs to be shared. But, as noted here, be sure to have the author read the contracts.
Not only do researchers need to be sure their articles are public access if they are
federally funded, they need to make sure the data is not going to be controlled by some
other entity. And researchers don’t own their data, so they must be careful what they
sign. If you help with, or know people who are involved with, institutional data policy,
now is the time to be sure the policy allows for the public access required by OSTP
policies.
Researchers need to be aware that even if they haven’t published in a journal that
requires linked data, some journal policies require sharing of data for published articles
when asked. A recent blog post by Deevy Bishop, a professor of developmental
neuropsychology at Oxford, tells the story of a young scientist who was contacted by PLOS
after she declined to share her data and the requester complained to the journal. So once
again, read the contract.

And of course, there is still the age-old option of sending data when asked or, more
recently, putting it on a personal web site. Personal web sites can be hard to find and
URLs can change, so they aren’t the best.

Sharing Sensitive Data
So now we come to sensitive data. Much of the data in healthcare is sensitive, usually
covered by the Health Insurance Portability and Accountability Act (HIPAA), but in order to
learn from clinical trials and studies of electronic health records this data needs to be
made available in some way. This graphic is based on the recommendations in an IOM
report released earlier this year, Sharing Clinical Trial Data: Maximizing Benefits,
Minimizing Risk. The IOM feels that sharing data is in the public interest, but a
multi-stakeholder effort is needed to develop a culture, infrastructure, and policies that
will foster responsible sharing. Keeping up with reports like this is important if you want
to be able to help with sensitive data. It is also helpful to find out who deals with computer
security at your organization. Usually they are involved in protecting sensitive data and
can help you learn about the different options at your place of work. There are probably
encrypted servers dedicated to sensitive data, and encrypted email options. Encrypted
jump drives and laptops are usually available for those who need to transport patient
data. So ask around.

Controlled Access
Providing access to qualified researchers allows data to be scrutinized and supports the
efforts of the NIH and other organizations to make research transparent and
reproducible. Controlled access databases, such as registries, make sensitive data
available. If you really want to help with patient data, you will need to seek out those
already working with it and learn. There is room for a knowledgeable person who can
formulate a PICO question to make searching an Enterprise Data Warehouse easier,
but be prepared to get CITI training for human subjects research work.

Clinical Trials example
For example, the European Federation of Pharmaceutical Industries and Associations
(EFPIA) actually has a Clinical Trial Data Portal Gateway that links to portals at various
drug companies where researchers can apply to get anonymised patient level data
and/or supporting documents from clinical studies to conduct further research.

Preserve
The final part of the data lifecycle is preserving the data once the project is finished.

Storage vs Backup
First some clarification of terms. This is always important because most researchers just
think they are always storing their data, but data management plans need to mention
backups and long term preservation to be complete. So, storage is the working files
used every day. Backup is regular copying of all the data to make sure it is there in case
of loss. If the data is sensitive, the locations for storage and backup need to be secure
enough for sensitive data.

Rule of 3
Kristin Briney at the University of Wisconsin-Milwaukee has written an excellent blog
post explaining the rule of three for backing up data; the link is on this slide. Her other
blog posts and her book are worthwhile reading as well. Even the best of us forget to
back up our data or our documents, but the more effort that goes into data or writing, the
more you have to lose if you don’t back up. And don’t forget, back in October there was a
day when Google Drive was unavailable because of issues with Google Cloud services,
so a third backup is important.
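
The Python sketch below illustrates one reading of the rule of three, assuming it means the working copy plus two backups kept in different places. It copies a file to each backup folder and compares checksums to confirm the copies match. The folder names are placeholders created in a temporary directory; a real second backup would live on a different device or service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(original, backup_folders):
    """Copy a file into each backup folder and report whether each copy matches."""
    original = Path(original)
    expected = sha256_of(original)
    results = {}
    for folder in backup_folders:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy_path = folder / original.name
        shutil.copy2(original, copy_path)  # copies contents and timestamps
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results

# Demonstration with made-up folders inside a temporary workspace.
with tempfile.TemporaryDirectory() as workspace:
    workspace = Path(workspace)
    working_file = workspace / "working" / "results.csv"
    working_file.parent.mkdir()
    working_file.write_text("id,value\n1,42\n", encoding="utf-8")
    report = back_up(working_file, [workspace / "local_backup", workspace / "offsite_backup"])
    copies = 1 + len(report)
    all_verified = all(report.values())

print(copies, "copies; all backups verified:", all_verified)
```

Comparing checksums matters because a backup that was copied badly is no better than no backup at all.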

Preservation
Data in paper notebooks is still usable because people can still read what was written.
In fact just last week I helped transcribe a page of a manuscript that is held by the
Folger Shakespeare Library. So, assuming the notebook isn’t lost and it has been kept
in conditions that don’t destroy the paper or vellum, the information is still there. But,
with digital data, this isn’t always the case. Digital records need to be read by a program
and if the program has changed or is no longer available, the data could be lost. So
upgrading file formats and other digital preservation tasks must be carried out by
repository staff regularly.

Considerations
When thinking about preserving digital data there are two things you need to take into
account:
How long must the data be kept? This will differ depending on the grant requirements
but there might also be organizational or other policies that dictate the length of time.
My university is a state institution, and state records management policy dictates we
keep data and other records associated with a project for 5 years after the end of the
project or grant.
The other consideration is the potential for reuse and value of the data. Just like
archivists must use varying criteria to decide what materials to buy, keep, collect, and
store in the archives (which always has limited space), data repositories can’t keep
everything and must decide which datasets should be preserved.

Appraisal
Just like there should be a collections policy to help with making decisions about what to
purchase for the library, there are various appraisal guidelines for deciding what data
should be kept. This list, from guidelines developed at the Digital Curation Center, gives
a good overview of how to evaluate data. Being interesting or fun is not enough.
Datasets need to have value in some way to the group doing the preservation as
well as to future research. Notice point 7, Full Documentation. Without that, the data is
useless.

Where to Preserve Data
Some researchers may want to set up their own archive and reformat their data into
open, standardized formats that can be opened by most programs in the future, but
most will not. Many of the same repositories that help with sharing will also be good
places for long term preservation. There will likely be some cost for this, which can be
included in a grant. Look for date limits in the contracts. If there is no guarantee that the
data will be preserved for as long as is required, other options will need to be
considered. And as with so many other things, check the grant that supports the
research in case the data must be loaded into a specific repository, for example the
National Oceanic and Atmospheric Administration requires climate and ocean data from
research to be put in their repositories.

Don’t Forget Print
While the OSTP public access policies refer to digital data, other sharing policies and
data management plans recognize print data as well. Print research data should also be
backed up regularly; scanning is easy and convenient in most places, and has the
added benefit of allowing students to send weekly scans rather than find a time to show
a lab book to their PI. Print data should also be stored with the same level of security
as digital data. For instance, a locked room or cabinet for sensitive data.

Reuse
There is one other step in the life cycle: reuse and repurposing, the whole reason for the
public data requirement of the OSTP. Most of the repositories that allow data deposit
also allow data to be downloaded and reused, and there is lots of government data
available. I’ve taken RML classes on finding public health data and finding data through
the CDC. If dealing with research data seems intimidating or isn’t an option where you
are, learning how to find data to support the work of your organization is another option.
There are also many databases that can be purchased to provide access to data that is
proprietary, or to government data that has been cleaned up and combined with other
useful data. These are excellent resources as well.

Data Information Literacy
Another area for potential service is data information literacy. Students, faculty, staff,
researchers, healthcare providers, and administrators all need to understand data, and
data information literacy education fits well with the information literacy instruction many
librarians are already doing. The Data Management Literacy project, which created the
stages of a literacy program shown here, has some teaching materials and articles on
developing curricula. DataONE and the New England Collaborative Data Management
Curriculum (NECDMC) both have modules you can download and use for teaching.

Dog
I’m sure at this point you are exhausted just listening to all this, but we’re not quite done.
I hope you haven’t forgotten the cases we started with.

Case 1
Since this person doesn’t know what the Excel spreadsheet contains, a data dictionary
seems in order, but I imagine a Readme file would be helpful too. Depending on the
grant or where this person’s article was published, they should have realized they
needed to be prepared to share.
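For anyone who wants to see what starting a data dictionary might look like in practice, here is a minimal sketch in Python. It assumes the spreadsheet has been exported to CSV; the function name `draft_data_dictionary` and the sample columns are made up for illustration and do not come from the case itself. It lists each column with one example value and leaves the description blank for the researcher to fill in.

```python
import csv
import io

def draft_data_dictionary(csv_text):
    """Return one entry per column: its name, an example value, and a blank description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        # Use the first non-empty value as an example to jog the researcher's memory.
        example = next((row[column] for row in rows if row[column]), "")
        entries.append({"column": column, "example": example, "description": ""})
    return entries

# Hypothetical sample data, standing in for an exported spreadsheet.
sample = "subject_id,visit_date,sbp\n001,2015-03-02,118\n002,2015-03-04,\n"
for entry in draft_data_dictionary(sample):
    print(entry)
```

The descriptions still have to be written by a person who knows the data; the script only makes sure no column is forgotten.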

Case 2
The lost data should be in other locations, so the Rule of 3 applies here. I imagine some
statistics instruction would be helpful too. And, as I mentioned, this paper is from 2013,
so it has probably been less than 5 years since this project was finished. Legally, they
should still have had their data saved.

Case 3
Somebody needs to be in charge. In this case, the Principal Investigator or lead
researcher should have been checking the notebook and requiring the person to
conform to group standards, which the PI should have explained at the very
beginning.

Case 4
This is a definite Rule of 3 case. A laptop should not be the only place your data is
stored.
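As a rough illustration of how someone could check that their copies really exist and still match, here is a small Python sketch. It assumes the Rule of 3 is read as keeping at least three copies of a file; the function names and the `required` parameter are hypothetical and chosen only for this example. It compares SHA-256 checksums, so a copy that has been changed or corrupted is caught.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_copies(paths, required=3):
    """Return True if at least `required` copies exist and all are identical."""
    existing = [Path(p) for p in paths if Path(p).is_file()]
    checksums = {sha256_of(p) for p in existing}
    return len(existing) >= required and len(checksums) == 1
```

Calling `check_copies` with the paths of the laptop copy, an external drive copy, and a network or cloud copy returns False if any one of them is missing or differs from the others.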

Librarians and Data
So I hope you can see how all the same services and tasks that librarians have been
providing their users with for books, articles, audiovisual materials, etc., are applicable
to data. Data and data sets need to be collected, acquired, described, and organized.
Users need to find data pertinent to their questions, gain access, and have a way to cite
that data. Researchers need to organize and find their own data by learning the same
skills librarians already know. Students need to learn how to find and organize data in a
similar way to current information literacy instruction that teaches finding and organizing
resources for essays, research papers, lab reports, and dissertations. Libraries don’t
collect and organize traditional materials in a vacuum. Librarians work with
stakeholders to make sure collections and instruction serve the needs of their target
population and support the goals of their organization, so working with other groups to
collect, organize, sort, and preserve data is just an extension of current practice.

I hope you are inspired by at least one of the different areas I have talked about today.
But just as I warned against job descriptions that were asking for unicorns, you
shouldn’t try to do it all either, especially if you are working alone. An Association of
European Research Libraries working group report that included ten recommendations
to get started with RDM concluded: "There is no need for research libraries to start with
all recommendations or to try to deliver a full spectrum of data services at once. Small
steps will do."

I attended the Midwest Data Librarians Symposium back in October and it was filled
with wonderful ideas for data services. During the excellent wrap-up session, Jamene
Brooks-Kieffer suggested a few things, and I’d like to share a couple of her slides and
ideas here.

Gardens
The important thing to remember is that every library is different and the needs of our
users are different.

Elephants
Not every idea will work in every place. There are different growing conditions.

A garden is
So, like a gardener, we need to think about our local conditions, we need to cultivate the
people and resources that are available, and we need to do this intentionally with the goal
of creating a lovely garden.

What is your local
So as you think about what your small steps will be, think about your local conditions.

And now I’m happy to answer any questions you might have.
