IUPUI University Library Center for Digital Scholarship
Data Management Lab: Spring 2014
Data Entry Best Practices
Data Entry
1. Dataset creation and integrity
a. Separate the coding and data entry tasks as much as possible
b. Perform coding in a setting where distractions are minimized
c. Arrange for particularly complex tasks to be carried out by people specially trained for the task
d. Use a data-entry program that is designed to catch typing errors (e.g., one that is preprogrammed to detect out-of-range values)
e. Perform double entry of data
f. Carefully check the first 5-10 percent of the data records created, then select random records for quality-control checks throughout the process
g. Let the computer do complex coding and recoding, if possible
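Items 1e and 1f can be partly automated. The sketch below assumes that each of the two independent entry passes has been loaded as a dictionary mapping a record ID to its keyed field values; the function names `compare_double_entry` and `qc_sample`, the 5 percent default, and the seed are illustrative assumptions, not part of the cited guidelines. The code only flags disagreements; each one still has to be resolved against the original paper or source record.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: one entry pass maps each record ID to its field values (as keyed text).
Entry = Dict[str, Dict[str, str]]


def compare_double_entry(first: Entry, second: Entry) -> List[Tuple[str, str, str, str]]:
    """List (record_id, field, first_value, second_value) for every disagreement."""
    discrepancies = []
    for record_id in sorted(set(first) | set(second)):
        rec_a, rec_b = first.get(record_id), second.get(record_id)
        if rec_a is None or rec_b is None:
            # A record keyed in only one pass is itself a discrepancy.
            discrepancies.append((record_id, "<record>",
                                  "missing" if rec_a is None else "present",
                                  "missing" if rec_b is None else "present"))
            continue
        for field in sorted(set(rec_a) | set(rec_b)):
            value_a, value_b = rec_a.get(field, ""), rec_b.get(field, "")
            if value_a != value_b:
                discrepancies.append((record_id, field, value_a, value_b))
    return discrepancies


def qc_sample(record_ids: Iterable[str], fraction: float = 0.05, seed: int = 2014) -> List[str]:
    """Draw a reproducible random sample of record IDs for manual spot checks."""
    ids = sorted(record_ids)  # sort first so the same seed always gives the same sample
    if not ids:
        return []
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return random.Random(seed).sample(ids, k)


if __name__ == "__main__":
    pass_1 = {"001": {"age": "34", "sex": "2"}, "002": {"age": "51", "sex": "1"}}
    pass_2 = {"001": {"age": "43", "sex": "2"}, "002": {"age": "51", "sex": "1"}}
    print(compare_double_entry(pass_1, pass_2))  # [('001', 'age', '34', '43')]
    print(qc_sample([f"{i:03d}" for i in range(1, 201)]))  # 10 IDs chosen at random
```

Fixing the seed makes the spot-check sample reproducible, so a second reviewer can pull exactly the same records.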
2. Things to check
a. Wild codes (codes not defined for an item) and out-of-range values
b. Consistency checks - comparisons across variables
c. Record matches and counts - relevant in longitudinal studies, where subjects may have more than one record and varying numbers of records
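The three checks in item 2 can be sketched as small functions run over the entered data. Everything specific here is a hypothetical example: the codebook (`sex` coded 1, 2, or 9; `age` 18-99; `years_smoked` 0-99), the consistency rule that years smoked cannot exceed age, and the `subject_id`/`wave` fields used for record matching. A real project would take these rules from its own codebook and study design.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical codebook: permitted codes for categorical items and
# (minimum, maximum) bounds for numeric items.
ALLOWED_CODES: Dict[str, Set[int]] = {"sex": {1, 2, 9}}  # 9 = "not answered" (assumed)
VALID_RANGES: Dict[str, Tuple[int, int]] = {"age": (18, 99), "years_smoked": (0, 99)}


def check_record(rec: Dict[str, Optional[int]]) -> List[str]:
    """Return wild-code, out-of-range, and cross-variable problems for one record."""
    problems = []
    for var, allowed in ALLOWED_CODES.items():
        if rec.get(var) not in allowed:
            problems.append(f"wild code: {var}={rec.get(var)}")
    for var, (low, high) in VALID_RANGES.items():
        value = rec.get(var)
        if value is None or not low <= value <= high:  # a missing value is flagged too
            problems.append(f"out of range: {var}={value}")
    # Consistency check across variables: years smoked cannot exceed age.
    age, smoked = rec.get("age"), rec.get("years_smoked")
    if age is not None and smoked is not None and smoked > age:
        problems.append("inconsistent: years_smoked > age")
    return problems


def check_record_counts(records: Iterable[Dict], roster: Set[str], max_waves: int) -> List[str]:
    """Flag unknown subjects, subjects with too many records, and duplicate waves."""
    records = list(records)
    issues = []
    counts = Counter(r["subject_id"] for r in records)
    pairs = Counter((r["subject_id"], r["wave"]) for r in records)
    for sid in sorted(counts):
        if sid not in roster:
            issues.append(f"unknown subject: {sid}")
        if counts[sid] > max_waves:
            issues.append(f"too many records: {sid} has {counts[sid]}")
    for (sid, wave), n in sorted(pairs.items()):
        if n > 1:
            issues.append(f"duplicate record: {sid} wave {wave}")
    return issues
```

Subjects with fewer records than the maximum are not flagged, since varying numbers of records per subject can be legitimate in longitudinal data.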
3. Variable names
a. A prefix-root-suffix system is a more systematic approach than one-up numbers, question numbers, or mnemonic names
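One way to make a prefix-root-suffix scheme enforceable is to write it down as a pattern and check every name against it. The convention below is a hypothetical example: the prefix is a wave (`w1`, `w2`, ...), the root names the concept measured, and an optional suffix such as `r` marks a recoded version, giving names like `w2_income_r`.

```python
import re
from typing import NamedTuple

# Hypothetical convention: <prefix>_<root>[_<suffix>], e.g. "w2_income_r"
#   prefix: data-collection wave (w1, w2, ...)
#   root:   concept measured (lowercase letters only)
#   suffix: optional marker, e.g. "r" for a recoded version
NAME_PATTERN = re.compile(r"^(w\d+)_([a-z]+)(?:_([a-z]+))?$")


class VarName(NamedTuple):
    prefix: str
    root: str
    suffix: str  # empty string when no suffix is used


def make_name(prefix: str, root: str, suffix: str = "") -> str:
    """Assemble a variable name and reject anything that breaks the convention."""
    name = f"{prefix}_{root}" + (f"_{suffix}" if suffix else "")
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name!r}")
    return name


def parse_name(name: str) -> VarName:
    """Split a conforming name back into its prefix, root, and suffix."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    prefix, root, suffix = match.groups()
    return VarName(prefix, root, suffix or "")


if __name__ == "__main__":
    names = ["w1_income", "w2_age", "w2_income_r", "w1_age"]
    # Sorting by root lines up the same concept across waves.
    print(sorted(names, key=lambda n: (parse_name(n).root, parse_name(n).prefix)))
```

Because each part of the name carries meaning, names can be parsed, sorted, and grouped by program, which one-up numbers or ad hoc mnemonics do not allow.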
4. Variable labels
a. Should provide three pieces of information
i. The item or question number in the original data collection instrument
ii. A clear indication of the variable's content
iii. An indication of whether the variable is constructed from other items
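The three pieces of information in item 4 can be captured in a small record and rendered into one label string. The `"Q5: ..."` layout, the wording `(constructed from ...)`, and the identifier `DER1` for a derived variable are assumptions for illustration; a project would set its own label format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VariableLabel:
    item: str                          # item/question number in the original instrument
    content: str                       # clear statement of what the variable holds
    derived_from: Tuple[str, ...] = ()  # source items if the variable is constructed

    def render(self) -> str:
        """Combine item number, content, and construction status into one label."""
        label = f"{self.item}: {self.content}"
        if self.derived_from:
            label += f" (constructed from {', '.join(self.derived_from)})"
        return label


if __name__ == "__main__":
    print(VariableLabel("Q5", "Respondent age in years").render())
    # Hypothetical derived variable built from two instrument items.
    print(VariableLabel("DER1", "Body mass index", ("Q7", "Q8")).render())
```

Listing the source items, rather than only a yes/no flag, lets a secondary analyst trace a constructed variable back to the instrument.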
5. Variable groups
a. Groups are recommended if a dataset contains a large number of variables
b. Groups can organize a dataset effectively and enable secondary analysts to get an overview of it quickly
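Variable groups can be kept as a simple mapping from group name to member variables, which supports both the quick overview mentioned in item 5b and a check that no variable was left out. The group names and variables below are hypothetical.

```python
from typing import Dict, List

# Hypothetical grouping of a dataset's variables into topical groups.
GROUPS: Dict[str, List[str]] = {
    "Demographics": ["w1_age", "w1_sex"],
    "Income": ["w1_income", "w2_income", "w2_income_r"],
}


def overview(groups: Dict[str, List[str]]) -> str:
    """One line per group: name, number of variables, and the members."""
    return "\n".join(
        f"{name} ({len(members)}): {', '.join(members)}" for name, members in groups.items()
    )


def ungrouped(all_vars: List[str], groups: Dict[str, List[str]]) -> List[str]:
    """Return variables not assigned to any group, in dataset order."""
    assigned = {v for members in groups.values() for v in members}
    return [v for v in all_vars if v not in assigned]


if __name__ == "__main__":
    print(overview(GROUPS))
    print(ungrouped(["w1_age", "w1_sex", "w1_income", "w1_notes"], GROUPS))  # ['w1_notes']
```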
6. Over the long term, store data in a consistent format
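What counts as a "consistent format" depends on the project. The sketch below assumes one possible house format: UTF-8 CSV with a fixed column order, ISO 8601 dates, and an empty cell for missing values. The column names are hypothetical, and the writer refuses records with unexpected fields so that the layout cannot drift silently.

```python
import csv
import datetime as dt
import io
from typing import Dict, Iterable, List, TextIO

# Hypothetical fixed column order for every export of this dataset.
COLUMNS: List[str] = ["subject_id", "visit_date", "w1_age"]


def format_value(value) -> str:
    """Render missing values as empty cells and dates in ISO 8601 (YYYY-MM-DD)."""
    if value is None:
        return ""
    if isinstance(value, dt.date):  # also covers datetime, a subclass of date
        return value.isoformat()
    return str(value)


def write_consistent_csv(records: Iterable[Dict], out: TextIO) -> None:
    """Write records with the same columns, order, and value formats every time."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        unexpected = set(rec) - set(COLUMNS)
        if unexpected:
            raise ValueError(f"unexpected fields: {sorted(unexpected)}")
        writer.writerow({col: format_value(rec.get(col)) for col in COLUMNS})


if __name__ == "__main__":
    buf = io.StringIO()
    write_consistent_csv(
        [{"subject_id": "A", "visit_date": dt.date(2014, 3, 5), "w1_age": 40},
         {"subject_id": "B", "visit_date": None, "w1_age": None}],
        buf,
    )
    print(buf.getvalue())
```

When writing to a file rather than a buffer, open it with `open(path, "w", newline="", encoding="utf-8")`, as the Python `csv` documentation recommends.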
References
1. ICPSR. (2012). Guide to Social Science Data Preparation and Archiving. University of Michigan, Ann Arbor, MI. From http://www.icpsr.umich.edu/files/deposit/dataprep.pdf
2. Scott, T. (2012). Guidelines for data collection and entry. From http://www.mc.vanderbilt.edu/gcrc/workshop_files/2012-09-07.pdf
3. DataONE. DataONE Education Module: Data Entry and Manipulation. From http://www.dataone.org/sites/all/documents/L04_DataEntryManipulation.pptx
Heather Coates, 2013

Data Management Lab: Session 3 Data Entry Best Practices
