LIBBIE STEPHENSON, DATA ARCHIVIST (RETIRED)
UCLA SOCIAL SCIENCE DATA ARCHIVE
LIBBIE@G.UCLA.EDU
HTTPS://DATAVERSE.HARVARD.EDU/DATAVERSE/SSDA_UCLA
Data Curation for Quantitative
Social Science Research:
A Case Study
NISO Virtual Conference: Data
Curation – Cultivating Past Research
Data for Future Consumption
August 31, 2016
DISCLAIMER
- I am retired from UCLA, so my comments reflect my own experience and expertise. They do not necessarily reflect the ideas, opinions, or practices of anyone at UCLA.
- These materials are free for you to use, but please cite them accordingly.
NISO - AUGUST 31, 2016
OVERVIEW
- About the archive
- About the data we manage
- What we are trying to do
- What we actually do
- Some illustrations
ABOUT THE ARCHIVE
- Operating since 1964, before email, PCs, the Internet, laptops, and smartphones; manages survey/quantitative data stored on media ranging from punch cards to the cloud
- Staff have library science degrees, statistical and technical expertise, and backgrounds in quantitative social science
- Serves all UCLA quantitative researchers: provides reference, cataloging/metadata, and long-term archiving, plus support for data rescue, management, and security
https://dataverse.harvard.edu/dataverse/ssda_ucla
SURVEY/QUANTITATIVE
RESEARCH
- Carried out in the U.S. since the 1940s, after World War II
- 1960s-70s: ICPSR and academic data archives
- 1970s: growth of data-oriented professional associations (IASSIST, APDU, IFDO, CESSDA)
- Focused on society and social norms
- Used to predict outcomes, test assumptions, study change over time, and run experiments
Note: In any discipline, we also need to understand the workflow of the research and the way individuals approach their work.
CURATION GOALS
- A researcher-driven philosophy of open access, data sharing, and reuse
- Collaborative: multi-unit or multi-institutional
- Ensure data conservation and long-term usability, as well as discovery and access
- Processes and workflows that support disaster planning
- Use of best-practice, trusted digital repository policies, models, practices, and workflows
- Reflect values of accountability and integrity
POLICIES SUPPORT PRACTICE
- Foundational and essential to a strong data curation infrastructure
- Policy defines what is acquired/collected and the level and scope of curation; it ensures long-term usability and drives processes and workflows
- Social Science Data Archive policy
- TOOL: Policy-making for Research Data in Repositories by Ann Green, Stuart Macdonald, and Robin Rice
OUR STEPS IN CURATION
- Initial contact
- Data quality review and appraisal
- Ingest
  - Verification
  - Metadata
  - Physical storage
- Access
- Preservation
INITIAL CONTACT
- Data Curation Profile
- Data Management Plan
- Guide to Social Science Data Preparation and Archiving
APPRAISAL
- Archival collection policy
- Also depends on:
  - Resources to process the data
  - Long-term resources
  - Fitness and usefulness
  - Data Deposit Form: signatures and completeness; commitment to share data; privacy and confidentiality
DATA QUALITY REVIEW
Tools: statistical packages, emulator, Adobe Pro, Excel, Colectica, text editor
- Verify the deposit package, run checksums and frequencies, and compare the data to the documentation
- Check the completeness of the codebook, question text, sampling, weighting, recodes, and methods
- Disclosure analysis: check for personal identifiers and assess the privacy/confidentiality of respondents
- Convert documentation to PDF/A
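Part of the disclosure analysis can be scripted as a first pass before human review. The following is a minimal Python sketch, assuming the data have been read into a list of dicts (one per case); the regular expressions are hypothetical examples for a few U.S.-style direct identifiers and would not catch indirect identifiers, names in free text, or small cells.

```python
import re

# Hypothetical first-pass patterns for a few direct identifiers (U.S. formats).
# A real disclosure review needs a broader, locally approved list plus human review.
IDENTIFIER_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def flag_identifiers(rows):
    """Return (case index, column, pattern name) for every cell that matches
    one of the identifier patterns. `rows` is a list of dicts, one per case."""
    hits = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            text = "" if value is None else str(value)
            for name, pattern in IDENTIFIER_PATTERNS.items():
                if pattern.search(text):
                    hits.append((i, column, name))
    return hits


cases = [
    {"caseid": 1, "comment": "call me at 310-555-0100"},
    {"caseid": 2, "comment": "no further comment"},
]
print(flag_identifiers(cases))  # [(0, 'comment', 'phone')]
```

Every hit still needs a curator's judgment; the script only narrows where to look.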
EXAMPLE: WHAT KIND OF DATA?
CODEBOOK DOCUMENTS THE
COLUMNS
5002 01 01 302000 001 101 10004B121068965
Each item is called a variable. We refer to the numeric content of each item as a value.
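Because each variable occupies fixed columns in a raw record, checking data against the codebook starts with slicing records by column. A minimal Python sketch; the variable names and column positions below are hypothetical and are not read from the codebook on the slide.

```python
# Hypothetical codebook entries: (variable name, start column, end column),
# using 1-based, inclusive column positions as printed codebooks usually do.
CODEBOOK = [
    ("studyno", 1, 4),
    ("deck", 5, 6),
    ("sex", 7, 7),
]


def parse_record(line, codebook=CODEBOOK):
    """Split one fixed-width record into {variable: value} using codebook columns.
    Raises ValueError if the record is shorter than the codebook expects."""
    last_column = max(end for _, _, end in codebook)
    if len(line) < last_column:
        raise ValueError(f"record has {len(line)} columns, codebook needs {last_column}")
    return {name: line[start - 1:end] for name, start, end in codebook}


print(parse_record("5002011"))  # {'studyno': '5002', 'deck': '01', 'sex': '1'}
```

A record that is too short, or a value that falls outside the documented codes, is the kind of mismatch the frequency comparison then surfaces.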
COMPARE FREQS TO CODEBOOK
[Figure: codebook excerpt with callouts marking the variable, its values, and its value labels]
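Comparing frequencies with the codebook is largely a set comparison: every code in the data should have a value label, and a labeled code that never occurs may point to a documentation or processing error. A minimal Python sketch, assuming codes and labels have already been extracted; the example codes are hypothetical.

```python
from collections import Counter


def compare_to_codebook(values, value_labels):
    """Compare observed codes with the codebook's value labels.

    Returns (undocumented, unused): codes found in the data with no label,
    with their counts, and labeled codes that never occur in the data."""
    counts = Counter(values)
    undocumented = {code: n for code, n in sorted(counts.items()) if code not in value_labels}
    unused = sorted(code for code in value_labels if code not in counts)
    return undocumented, unused


# Hypothetical codes for a sex variable; 3 has no label in the codebook.
labels = {1: "MALE", 2: "FEMALE", 9: "REFUSED"}
print(compare_to_codebook([1, 2, 2, 1, 3], labels))  # ({3: 1}, [9])
```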
RUN MARGINALS/FREQUENCIES
Sex of Respondent
               Frequency  Percent  Valid Percent  Cumulative Percent
Valid  MALE          856     45.1           45.1                45.1
       FEMALE       1041     54.9           54.9               100.0
       Total        1897    100.0          100.0

What is your race - ethnicity
                                                     Frequency  Percent  Valid Percent  Cumulative Percent
Valid  White                                               618     32.6           32.6                32.6
       Hispanic                                            475     25.0           25.0                57.6
       Black                                               474     25.0           25.0                82.6
       Asian or Pacific Islander                           282     14.9           14.9                97.5
       Native American or Alaskan native                    17       .9             .9                98.4
       Identifies more than one of the above groups         20      1.1            1.1                99.4
       DON'T KNOW                                            2       .1             .1                99.5
       REFUSED                                               9       .5             .5               100.0
       Total                                              1897    100.0          100.0
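Marginals like these can be recomputed outside the statistical package as a cross-check. A minimal Python sketch; the function name and its `missing` parameter are illustrative assumptions. In the race/ethnicity table above, DON'T KNOW and REFUSED are listed under Valid, which is why Valid Percent equals Percent; declaring them as missing codes would change the valid and cumulative percentages.

```python
from collections import Counter


def frequency_table(values, order, missing=()):
    """Return SPSS-style rows (label, frequency, percent, valid percent,
    cumulative percent), rounded to one decimal place.

    Codes in `missing` count toward Percent but are excluded from Valid
    Percent and Cumulative Percent, which are then None for those rows."""
    counts = Counter(values)
    total = sum(counts.values())
    valid_total = sum(n for code, n in counts.items() if code not in missing)
    rows, cumulative = [], 0
    for code in order:
        n = counts.get(code, 0)
        percent = round(100 * n / total, 1)
        if code in missing:
            rows.append((code, n, percent, None, None))
        else:
            cumulative += n
            rows.append((code, n, percent,
                         round(100 * n / valid_total, 1),
                         round(100 * cumulative / valid_total, 1)))
    return rows


sex = ["MALE"] * 856 + ["FEMALE"] * 1041
for row in frequency_table(sex, ["MALE", "FEMALE"]):
    print(row)
# ('MALE', 856, 45.1, 45.1, 45.1)
# ('FEMALE', 1041, 54.9, 54.9, 100.0)
```

Feeding in the counts from the Sex of Respondent table reproduces the 45.1 and 54.9 percent shown there; a mismatch against a deposited table would be a reason to query the depositor.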
INGEST – PHYSICAL FORMATS
- Virus check; run checksums; address versioning, fixity, and file naming conventions
- Convert files to archival formats if required
- Back up copies to external media
- Copy datasets to Dataverse; Safe Archive tool
- Use a secure file transfer client
- SQL/PHP scripts for the local holdings file
- Compression software (7-Zip)
Address disaster plan and file access (public and local); security requirements; LOCKSS
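Checksums recorded at ingest are what later fixity checks compare against. A minimal Python sketch of writing a checksum manifest for a deposit folder; SHA-256, the manifest file name, and the `<checksum>  <relative path>` layout (the format GNU `sha256sum` prints) are assumptions for illustration, not the archive's documented procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name="manifest-sha256.txt"):
    """Write '<sha256>  <relative path>' for every file under `folder`
    (skipping the manifest itself) and return the lines written."""
    folder = Path(folder)
    lines = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        if path.name == manifest_name:
            continue
        lines.append(f"{sha256_of(path)}  {path.relative_to(folder).as_posix()}")
    (folder / manifest_name).write_text("\n".join(lines) + "\n")
    return lines
```

Running it again on a copy made to external media or Dataverse, and comparing the two manifests, is one way to confirm that a transfer preserved every file.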
INGEST – BIBLIOGRAPHIC METADATA
Bibliographic metadata enables search and discovery:
- Establish bibliographic-level identity for unique items
- Create a bibliographic record in WorldCat/Voyager
- Add the record to the holdings database (SQL)
- Create the Dataverse record; assign a persistent identifier
Produce and review with investigator
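The holdings step can be pictured as one row per study that ties together the local study number, the catalog record, and the persistent identifier. A minimal sketch using Python's built-in `sqlite3` as a stand-in for the archive's SQL/PHP holdings database; the table and column names, and the example identifier, are hypothetical.

```python
import sqlite3

# Hypothetical holdings schema; the archive's real SQL schema is not shown in the slides.
SCHEMA = """
CREATE TABLE IF NOT EXISTS holdings (
    study_id   TEXT PRIMARY KEY,  -- local study number
    title      TEXT NOT NULL,
    catalog_id TEXT,              -- bibliographic record number in the catalog
    pid        TEXT UNIQUE        -- persistent identifier assigned via Dataverse
)
"""


def add_holding(conn, study_id, title, catalog_id=None, pid=None):
    """Insert one study; sqlite3.IntegrityError is raised for a duplicate
    study_id or persistent identifier, and the insert is rolled back."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO holdings (study_id, title, catalog_id, pid) VALUES (?, ?, ?, ?)",
            (study_id, title, catalog_id, pid),
        )


conn = sqlite3.connect(":memory:")
add_holding(conn, "S0001", "Example survey (hypothetical)", pid="doi:10.0000/EXAMPLE")
print(conn.execute("SELECT study_id, pid FROM holdings").fetchall())
# [('S0001', 'doi:10.0000/EXAMPLE')]
```

The uniqueness constraints are the point of the design: a study cannot be registered twice, and one persistent identifier cannot be attached to two studies.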
WHAT ELSE DO WE NEED TO
KNOW ABOUT THE DATA?
- Description of the study
- Citation
- Funding source
- Methodology
- Sampling
- Publications
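Before the study-level record is reviewed with the investigator, a quick completeness check against this list can flag gaps. A minimal Python sketch; the field names are hypothetical stand-ins for whatever elements the archive's metadata schema actually uses.

```python
# Study-level elements from the list above; the key names are hypothetical.
REQUIRED_STUDY_FIELDS = (
    "description", "citation", "funding_source",
    "methodology", "sampling", "publications",
)


def missing_study_metadata(record):
    """Return the required fields that are absent, empty, or blank, in list order."""
    missing = []
    for field_name in REQUIRED_STUDY_FIELDS:
        value = record.get(field_name)
        if not value or not str(value).strip():
            missing.append(field_name)
    return missing


# Hypothetical draft record with gaps.
draft = {
    "description": "Telephone survey of county residents",
    "citation": "Principal investigator, study title, year",
    "funding_source": "  ",
    "methodology": "Telephone interviews",
    "sampling": None,
}
print(missing_study_metadata(draft))  # ['funding_source', 'sampling', 'publications']
```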
EXAMPLE - DATAVERSE
[Screenshot callouts: links to tools to manage collections; navigate to and search for studies; studies can be downloaded or analyzed online]
VARIABLE LEVEL SEARCH
CAPABILITIES
- Enables searching across many studies at once
- Enables searching the shared catalogs of multiple archives
- TOOLS: Colectica Repository and NESSTAR
- Requires local or remote hosting of the software
- Metadata files can be shared for repurposing
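Conceptually, variable-level search is a query over variable metadata pooled from many studies. Colectica Repository and NESSTAR do this over their own indexed DDI; the minimal Python sketch below only illustrates the idea with an in-memory list, and its record structure is hypothetical, not either product's API.

```python
from dataclasses import dataclass


@dataclass
class VariableRecord:
    study: str  # study identifier
    name: str   # variable name
    label: str  # variable label or question text


def search_variables(catalog, term):
    """Case-insensitive substring search of variable names and labels across studies."""
    term = term.lower()
    return [v for v in catalog if term in v.name.lower() or term in v.label.lower()]


# Hypothetical catalog entries from two studies.
catalog = [
    VariableRecord("Study A", "sex", "Sex of Respondent"),
    VariableRecord("Study B", "race", "What is your race - ethnicity"),
    VariableRecord("Study B", "q12", "Age of respondent"),
]
for hit in search_variables(catalog, "respondent"):
    print(hit.study, hit.name)
# Study A sex
# Study B q12
```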
DATA DOCUMENTATION
INITIATIVE
Document, Discover, and Interoperate
- “International standard for describing data that result from observational methods in the social, behavioral, economic, and health sciences”
- “Facilitates interpretation and understanding -- both by humans and computers”
http://www.ddialliance.org/
INGEST – VARIABLE-LEVEL METADATA
Descriptive metadata containing detailed information about the data enables understanding and reuse:
- Create variable-level metadata, using Colectica or NESSTAR to produce standardized metadata records
- Create a DDI record and a full DDI codebook
- Migrate the DDI to Colectica Repository
Produce and review with investigator
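To make the variable-level record concrete, the sketch below writes a variable's name, label, question text, and value labels into DDI-Codebook elements (`var`, `labl`, `qstn`/`qstnLit`, `catgry`/`catValu`). It is a minimal Python sketch assuming the DDI-Codebook 2.5 namespace; it produces only a `dataDscr` fragment, not a complete or schema-validated DDI record, and the example question wording is hypothetical. In practice tools such as Colectica and NESSTAR generate the full record.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

DDI_NS = "ddi:codebook:2_5"  # assumed DDI-Codebook 2.5 namespace


@dataclass
class Variable:
    name: str
    label: str
    question: str = ""
    categories: dict = field(default_factory=dict)  # code -> value label


def _q(tag):
    """Qualify a tag name with the DDI namespace."""
    return f"{{{DDI_NS}}}{tag}"


def ddi_data_description(variables):
    """Build a <codeBook><dataDscr> fragment with one <var> per variable.
    A complete record would also need <docDscr>, <stdyDscr>, and so on."""
    ET.register_namespace("", DDI_NS)
    root = ET.Element(_q("codeBook"))
    data_dscr = ET.SubElement(root, _q("dataDscr"))
    for v in variables:
        var = ET.SubElement(data_dscr, _q("var"), name=v.name)
        ET.SubElement(var, _q("labl")).text = v.label
        if v.question:
            qstn = ET.SubElement(var, _q("qstn"))
            ET.SubElement(qstn, _q("qstnLit")).text = v.question
        for code, value_label in v.categories.items():
            catgry = ET.SubElement(var, _q("catgry"))
            ET.SubElement(catgry, _q("catValu")).text = str(code)
            ET.SubElement(catgry, _q("labl")).text = value_label
    return ET.tostring(root, encoding="unicode")


sex = Variable("sex", "Sex of Respondent", "Respondent's sex (hypothetical wording)",
               {1: "MALE", 2: "FEMALE"})
print(ddi_data_description([sex]))
```

Because the same elements drive both the printed codebook and variable-level search, capturing them once in a standard structure is what makes the metadata reusable.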
NESSTAR
EXAMPLE - IMPORTING DATA
- Use the Data tab to import files in SPSS or Stata format.
[Screenshot callouts: label; question text; numeric values]
Variable details include the variable name, label, description or question text, and type of coding.
EXAMPLE DDI FROM COLECTICA
DDI fields are shown in red; they are used to create documentation and can be repurposed.
PRESERVATION AND CURATION
- Continuous monitoring of file formats; migrate to new formats when there is:
  - a new operating system or a new version of statistical software
  - a new mode of file transfer or a code change
- Monitoring of database function; software updates or redesigns
- Monitoring of server and external media health; replace as needed
- Data forensics; checksums; validation; authentication; version control; format migration; media refresh; recording preservation metadata (DDI)
- Review the disaster plan and collection policy at regular intervals
- Review new or revised regulations on intellectual property and security, and the requirements of data producers/distributors and funding agencies
- Review with the original depositor their data management plans and any changes in access or user permissions
Focus is on functional-level preservation and long-term usability through the use of DDI and continuous review.
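A recurring fixity check compares current checksums with those recorded at ingest. A minimal Python sketch, assuming a manifest with one `<sha256>  <relative path>` line per file (the layout GNU `sha256sum` prints); the function names and report categories are illustrative, not part of any tool named in these slides.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_fixity(folder, manifest_lines):
    """Sort manifest entries into files that are intact, changed, or missing."""
    folder = Path(folder)
    report = {"ok": [], "changed": [], "missing": []}
    for line in manifest_lines:
        if not line.strip():
            continue  # ignore blank lines
        expected, rel_path = line.rstrip("\n").split("  ", 1)
        path = folder / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(path) != expected:
            report["changed"].append(rel_path)
        else:
            report["ok"].append(rel_path)
    return report
```

A non-empty `changed` or `missing` list would then prompt restoring the file from a backup copy and recording the event in the preservation metadata.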
UNCOMFORTABLE TRUTHS
- Data management in institutions requires high-level administrative participation, new and sustained funding, and differently trained staff
- Data management planning is not a one-time event but a continuous process that keeps research independently understandable and available for informed reuse over the long term
- There is an urgent need for standards, tools, and best-practice models for many different file formats and disciplines
NEXT STEPS FOR PRACTITIONERS
“Crucial metadata about data are not always
being captured or created and linked to data in
repositories. Storage and persistence of data
submissions isn't enough. We need data
archivists and librarians to commit to partnering
with researchers to curate data -- to review
incoming data for usability, confidentiality, and
completeness of descriptive information.”
Ann Green (2016) Email communication
Used with permission
ANY QUESTIONS?
THANK YOU!
Social Science Data Archive, UCLA
Box 951484
Los Angeles, CA 90095-1484
310-825-0716
LINKS
Social Science Data Archive: dataverse.harvard.edu/dataverse/ssda_ucla
Data Seal of Approval: www.datasealofapproval.org/en/
National Digital Stewardship Alliance: ndsa.org/activities/levels-of-digital-preservation/
Open Archival Information System: www.oclc.org/research/publications/library/2000/lavoie-oais.html
Social Science Data Archive Policy: data-archive.library.ucla.edu/SSDA_collectionAndArchivingPolicy.pdf?_ga=1.3255478.786669706.1378228281
Data Curation Profile: datacurationprofiles.org/
Data Management Planning at ICPSR: www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/index.html
ICPSR Guide to Data Preparation: www.icpsr.umich.edu/icpsrweb/content/deposit/guide/
Colectica: www.colectica.com/
NESSTAR: www.nesstar.com/index.html
DDI: www.ddialliance.org/
Dataverse: dataverse.org/