Part A. EpiData Entry
Exercise 1 A data documentation sheet for a simple questionnaire
Exercise 2 The QES-REC-CHK triplet
Exercise 3 Derived fields and Check file commands unrelated to a specific field
Exercise 4 Data entry and validation
Exercise 5 Using an external file for Labelblocks
Exercise 6 Dealing with incomplete dates
Exercise 7 Keeping track of data entry time
Exercise 8 Safely backing up and encrypting your data
Acknowledgments:
We thank Ajay M V Kumar who has made valuable suggestions to improve the structure and
flow of argumentation of Part A.
Exercise 1: A data documentation sheet for a simple questionnaire
At the end of this exercise you should be able to:
a. Define the different types of fields/variables (text, numeric, date) and know
when to use them.
b. Create a data documentation sheet from a simple questionnaire.
Before we proceed to the details, let me provide an overview of EpiData Entry.
The first step in the process is to prepare a plan for data entry. This plan is called the data
documentation sheet. It should not be confused with the data collection form, which is the
proforma used for collecting the data from study participants or for extracting them from
programme records. The data documentation sheet is a codebook containing the details of all
the variables to be entered (such as names, labels, type, length, possible values and value
labels) and the check rules to be applied during data entry. Like Epi Info 6, EpiData Entry
uses what we call the QES-REC-CHK (pronounced “Ques-Rec-Check”) files principle. First we
create a QES file. This file defines the structure of the database and the layout for data entry,
including field names, field labels, field type, and field length. From the QES file we then
create a REC file (the data entry file, which will contain all the data), and finally we create
a so-called CHK file (which contains and applies the rules of data entry), linked to the data
entry file to control data entry. These are referred to as the EpiData triplet files and are
identified by their file extensions (.qes, .rec and .chk).
Double data entry and validation is considered a benchmark in assuring data quality, and we
fully subscribe to this idea. Once the QES-REC-CHK files are created, the data are entered
twice, independently by two persons, and the two files are compared with each other to
identify discrepancies. These are then corrected by referring to the original data source, and
the result is saved as a final file which is used for analysis. The rationale of double data
entry is that the probability of two independent data entry operators committing the same
error in the same field is small, and hence few data entry errors will remain after validation.
[Flowchart: steps of the EpiData Entry workflow, in order]
• Original Data Source (Register, Treatment card, Questionnaire, Data collection Proforma)
• Prepare a Data Documentation Sheet (Codebook): the roadmap for developing the EpiData
Triplet files
• Make a QES File (QuEStionnaire file, which defines the structure of the database and the
layout of data entry)
• Make a REC File (RECord file, into which the data are entered and which holds the entered
data)
• Make a CHK File (CHecK file, which holds the data checking rules applied during data entry)
• Double Data Entry and Validation with appropriate documentation
• Analyze data in EpiData or Export data to other formats
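The comparison step of double data entry can be illustrated with a minimal Python sketch. It is not how EpiData itself performs validation; the function name `compare_entries`, the use of `serno` as the matching key, the field name `age` and the two small example data sets are all assumptions made for illustration.

```python
def compare_entries(first, second, key="serno"):
    """Compare two independently entered data sets record by record.

    Returns a list of (key value, field, value in first, value in second).
    """
    second_by_key = {rec[key]: rec for rec in second}
    first_keys = {rec[key] for rec in first}
    discrepancies = []
    for rec in first:
        other = second_by_key.get(rec[key])
        if other is None:
            # Record entered by the first operator only.
            discrepancies.append((rec[key], "<record>", "present", "missing"))
            continue
        for field, value in rec.items():
            if other.get(field) != value:
                discrepancies.append((rec[key], field, value, other.get(field)))
    for rec in second:
        if rec[key] not in first_keys:
            # Record entered by the second operator only.
            discrepancies.append((rec[key], "<record>", "missing", "present"))
    return discrepancies


# Invented example data: the second operator typed 43 instead of 34.
entry_a = [{"serno": 1, "sex": 1, "age": 34}, {"serno": 2, "sex": 2, "age": 51}]
entry_b = [{"serno": 1, "sex": 1, "age": 43}, {"serno": 2, "sex": 2, "age": 51}]

for item in compare_entries(entry_a, entry_b):
    print(item)
```

Each reported discrepancy would then be resolved by going back to the original data source, as described above.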
Note: Please do not worry if you are not able to understand all the terms at this point in time.
Be assured that you will appreciate these as you go along.
But let us proceed step by step and say that we have the following questionnaire:
Laboratory serial number: ___
Date specimen received (dd/mm/yyyy): ___/___/_____
Sex: ___
Age in years: ____
Reason for examination: ___
Result of specimen 1: ___
Result of specimen 2: ___
Result of specimen 3: ___
This might represent a typical simple questionnaire as used by an interviewer. Often such
questionnaires are first completed on paper. This one is actually an excerpt from the
Tuberculosis Laboratory Register proposed by The Union.
We will use this register as the basis for this course. For the time being, you plan to write a
short and concise electronic data capture form, retaining only variables that are easy to
capture and are likely to be useful for the analysis. Please note this as a first principle in
being efficient: capture only those variables which you will use for analysis!
Each of the questions can be conceived of as a variable and the answer to the question as the
value that the variable takes for a particular individual. Variables are also referred to as
‘Fields’ in EpiData; both terms mean the same and will be used interchangeably. We will give
each variable a unique name. A completely entered data form for one study subject is called a
‘Record’. A set of such records is called a ‘File’ (REC file). The REC file thus contains
several records, and each record contains information about one individual with respect to
several variables. We are going to describe each variable with respect to several attributes in
the data documentation sheet. Let us now understand some terminology we are going to use.
• Field name: This is the name of the variable and in EpiData, there are certain rules to
be followed in arriving at this name. We will come to these rules in a short while.
• Field Label: This is the descriptive name for the variable and contains a more detailed
description than the variable name can convey.
• Field Type: This describes the type of the variable – text, numeric or date being the
major types.
• Field length: This describes the number of characters that a value can take.
• Field values: This describes the possible values that a variable can take.
• Value labels: These are descriptive names for the values. For categorical variables
which are numerically coded, it is always useful to label them so that it is easier to
read and understand what each of the codes means.
“Labels” are also called “metadata” or “data about data”. They play a key role in data files.
We may have entered a value “9” for a given field, but this number remains meaningless to
everyone unless it is clearly specified what this value stands for. It is important to get
acquainted with these terms and to understand them clearly since we will be using them
frequently. We will be using several examples later in this chapter to clarify these terms.
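To make the terminology concrete, one row of a data documentation sheet can be pictured as a small data structure. The following Python sketch is only an illustration: the class name `FieldSpec` and its attribute names are assumptions and are not part of EpiData. The example values are those of the `sex` field used later in this exercise.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """One row of a data documentation sheet (hypothetical structure)."""
    name: str                      # field name, e.g. "sex"
    label: str                     # field label (descriptive name)
    ftype: str                     # field type: "S", "I", "F" or "D"
    length: int                    # field length in characters
    values: str                    # possible values, as written in the sheet
    value_labels: dict = field(default_factory=dict)  # code -> value label
    comment: str = ""


sex = FieldSpec(
    name="sex",
    label="Examinee's sex",
    ftype="I",
    length=1,
    values="1, 2, 9",
    value_labels={1: "Female sex", 2: "Male sex", 9: "Sex not recorded"},
)

# Without the value label, the code 9 would be meaningless.
print(sex.value_labels[9])
```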
Field name: Now, let us understand some rules for naming a variable.
First, it has to be a single word of not more than ten characters. This means that you
cannot use a space in the name, as a space makes it more than one word. Also, you cannot use
any special characters like comma, semicolon, full stop or underscore.
Second, use a name whose meaning is intuitive instead of generic field names like v1, v2 and
so on.
Third, it is good practice to keep the field names in lower case. This can be forced by
setting an option. We have already done so and you may verify the set-up in “File” |
“Options” | “Create Data File”. Though EpiData is not case-sensitive, some other software is.
Such software treats the field names ‘sex’ (lower case), ‘SEX’ (upper case) and ‘Sex’
(sentence case) as three different names. If you later plan to export your EpiData files for
analysis to such a software package (two examples are Stata and R, both of which are
case-sensitive), this may be a problem. Hence the recommendation to use lower case uniformly.
Do not start the variable name with a number. It cannot be ‘1v’, but it can be ‘v1’.
The words ‘date’, ‘month’, and ‘year’ are functions in EpiData and are
reserved names. Hence they cannot be used as variable names.
Note that some other analysis software may accept only field names of up to eight
characters. If you later plan to export your EpiData files for analysis to such a
software package and you have used the full field name length of ten, then your field
names will get truncated.
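The naming rules above can be collected into a small checking function. This Python sketch is an illustration only and is not an EpiData feature: the function name `check_field_name` is hypothetical, the reserved-name list contains only the three words mentioned above, and treating "no special characters" as "ASCII letters and digits only" is an assumption.

```python
import re

# Only the three reserved words named in the text; EpiData may reserve more.
RESERVED = {"date", "month", "year"}


def check_field_name(name):
    """Return a list of problems with a proposed field name (empty if none)."""
    problems = []
    if len(name) > 10:
        problems.append("longer than ten characters")
    if not re.fullmatch(r"[A-Za-z0-9]+", name):
        problems.append("contains a space or special character")
    if name[:1].isdigit():
        problems.append("starts with a number")
    if name.lower() in RESERVED:
        problems.append("reserved name")
    if name != name.lower():
        problems.append("not lower case (recommendation only)")
    if 8 < len(name) <= 10:
        problems.append("longer than eight characters; may be truncated on export")
    return problems


for candidate in ["regdate", "1v", "date", "reg_date", "Sex"]:
    print(candidate, check_field_name(candidate))
```

A name such as `regdate` passes all checks, while `1v`, `date`, `reg_date` and `Sex` are each reported with the rule they break.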
Field label: This is the full description of the variable and can be more than a word. Anything
that is written between the Field name and the field definition in the QES file is considered as
field label.
Field Type: There are different types of entry fields for the variables (we will follow the
EpiData Entry notation and call them “Fields”):
• Text fields: These fields take letters or numbers or a combination of these as possible
values, like PETER, KOCH1882, giraffe, 45677 etc. You can type anything on
the keyboard into this field. If you enter a number into such a field, it is accepted but
you will not be able to make any calculation with it. These fields are also sometimes
designated as character or alphanumeric fields, or most simply “string” (denoted by
S) fields as they take any string of characters.
• Numeric fields: These are numbers. The numbers might be integers (denoted by I)
like 885, 33, 1235 or real numbers like 3.4, 6.88, and 66.5 (also called floats
and denoted by ‘F’). You can make calculations with such fields.
• Date fields: (denoted by ‘D’) In different countries, different ways of writing dates are
used and this can be confusing for people from another culture. Some write 5 March
2005, others March 5 2005, and again others 2005 March 5. EpiData Entry lets you
choose the type of date you wish to use. In this course we will use European dates,
i.e. dates of the format 5 March 2005, symbolized as DD/MM/YYYY.
• One other type of variable is the “logic” or “Boolean” variable. This is
sometimes used in food-borne outbreak investigations. There, answers to questions on
food items eaten are limited to “yes”, “no” and “missing”. In EpiData Entry, this
type of field accepts only the values Y, N, 0, 1, and space. There is no need for using
this field type, and we actually discourage its use as it might pose problems in analysis.
The alternative is a numeric field with a label block.
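The practical difference between the field types can be seen in a short Python sketch. It only mirrors the ideas above in another language and says nothing about how EpiData stores values internally; the helper name `parse_european_date` is an assumption.

```python
from datetime import datetime


def parse_european_date(text):
    """Parse a DD/MM/YYYY string into a date object."""
    return datetime.strptime(text, "%d/%m/%Y").date()


# A number typed into a text field is just a string of characters ...
text_value = "45677"
# ... and has to be converted to a numeric type before any calculation.
int_value = int(text_value)
print(int_value + 1)  # 45678

# 5 March 2005 written as a European date.
entered = parse_european_date("05/03/2005")
print(entered.day, entered.month, entered.year)  # 5 3 2005
```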
While you are asked to limit the length of the field name, you have much more flexibility with
the length of the value a field can take (up to a field length of 80). We will nevertheless
use it as efficiently as possible, that is, we will limit the value length to the minimum needed.
Data Documentation Sheet
It is good practice to write what we call a data documentation sheet before you make your
actual EpiData Entry QES file. As mentioned earlier, EpiData refers to this as Codebook.
In the past, fields like SEX were commonly made text fields with the values “F” or “M” denoting
females and males. This is efficient as it uses only a length of 1. Things would get less
efficient if we had to code treatment outcome with the possible values “cured”, “completed”,
“died”, “failed”, “lost from follow-up”, “transferred out”, and “outcome not recorded”.
Note: If the Field label begins with a word that is identical to the Field name, you
will note later in EpiData Analysis, that this word will be truncated from the Field
label. For instance, if your Field name was SEX, and you used ‘SEX OF
EXAMINEE’ as your Field label, this would be truncated to ‘OF EXAMINEE’.
While this can be fixed easily in EpiData Analysis, it is preferable to prevent it by
choosing an alternative Field label during questionnaire design.
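A simple way to catch this during questionnaire design is to compare the first word of each Field label with the Field name. The sketch below is a hypothetical Python helper, not an EpiData function, and it assumes that the comparison should ignore upper and lower case.

```python
def label_repeats_name(name, label):
    """Return True if the first word of the label equals the field name.

    Assumption: the comparison is case-insensitive.
    """
    words = label.split()
    return bool(words) and words[0].lower() == name.lower()


print(label_repeats_name("SEX", "SEX OF EXAMINEE"))   # True: choose another label
print(label_repeats_name("sex", "Examinee's sex"))    # False: label is fine
```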
Numeric coding is much simpler as there are up to ten possible values with a field length of
just 1:
1 Cured
2 Treatment completed
3 Died of any cause
4 Failed bacteriologically
5 Lost from follow-up
6 Transferred out
9 Outcome not recorded
You will also realize later in the analysis that numeric coding makes it very convenient to
apply selection criteria when you want to select a subset of the data and analyze only that
subset. Of course, a prerequisite is that the link between the numeric value and the text label
is unambiguously clear. As noted earlier, labels are metadata, or “data about data”, and their
role is of enormous importance. We are going to make full use of numeric coding of field
values, with explicit text as value labels.
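The following Python sketch illustrates why numeric codes with explicit value labels are convenient for selecting a subset. The codes and labels are those of the treatment outcome list above; the field name `outcome`, the four example records and the choice of codes 1 and 2 as the subset are invented for illustration.

```python
# Value labels: the link between each numeric code and its meaning.
OUTCOME_LABELS = {
    1: "Cured",
    2: "Treatment completed",
    3: "Died of any cause",
    4: "Failed bacteriologically",
    5: "Lost from follow-up",
    6: "Transferred out",
    9: "Outcome not recorded",
}

# Invented example records.
records = [
    {"serno": 1, "outcome": 1},
    {"serno": 2, "outcome": 3},
    {"serno": 3, "outcome": 2},
    {"serno": 4, "outcome": 9},
]

# Selecting a subset is a simple test on the numeric code.
subset = [rec for rec in records if rec["outcome"] in (1, 2)]

for rec in subset:
    print(rec["serno"], OUTCOME_LABELS[rec["outcome"]])
```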
Let us now go through a few examples of the variables (from the tuberculosis laboratory
register example) and describe the various attributes of the variables in the data
documentation sheet. As you go through, you will note that preparing a data
documentation sheet requires thought and knowledge of the study data.
This is how we would write such a data documentation sheet:
| Field name | Field label | Field type | Field length | Field values | Value labels | Comment |
|---|---|---|---|---|---|---|
| serno | Laboratory serial number * | I | 4 | 1-9000; 9001, 9002, … | Serial number starting with 1 each year | Enter 9001, 9002, … if serial number is not unique or missing, and write a data entry note (use F5 to open a note file) |
| regdate | Registration date | D | 10 | 01/01/2000-31/12/2005; 01/01/1800 | Range of legal registration dates | Enter 01/01/1800 if no date recorded |
| sex | Examinee’s sex | I | 1 | 1; 2; 9 | 1 = Female sex; 2 = Male sex; 9 = Sex not recorded | |
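The legal values written in the sheet can be read as rules that every entered record must satisfy. In EpiData these rules belong in the CHK file, which is covered in the next exercise; the Python sketch below only illustrates the logic. The function name `check_record` is hypothetical, and allowing `serno` up to 9999 is an assumption derived from the field length of 4.

```python
from datetime import date, datetime

MISSING_DATE = date(1800, 1, 1)  # documented code for "no date recorded"


def check_record(rec):
    """Check one record against the three documented fields."""
    problems = []
    # serno: 1-9000, plus 9001, 9002, ... for non-unique or missing numbers.
    if not 1 <= rec["serno"] <= 9999:
        problems.append("serno out of range")
    # regdate: legal range or the documented missing-date value.
    regdate = datetime.strptime(rec["regdate"], "%d/%m/%Y").date()
    in_range = date(2000, 1, 1) <= regdate <= date(2005, 12, 31)
    if regdate != MISSING_DATE and not in_range:
        problems.append("regdate outside legal range")
    # sex: only the coded values are allowed.
    if rec["sex"] not in (1, 2, 9):
        problems.append("sex must be 1, 2 or 9")
    return problems


print(check_record({"serno": 12, "regdate": "15/06/2003", "sex": 2}))
print(check_record({"serno": 0, "regdate": "01/01/2006", "sex": 3}))
```

The first record passes all checks; the second breaks all three rules.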
Task:
o Complete the data documentation sheet for all fields in the questionnaire. Note that you should always define a value if no answer was
provided to a question.
o Think of the most efficient ways to code the reason for examination and the results of microscopic examination.
* Note: Commonly, it will be preferable to make the identifier a text field. If it is a number, as is the case here with the laboratory
serial number, precautions must be taken to distinguish e.g. “0001” from “1”. This requires that the numeric value is entered into one
field and that another field, the actual identifier field, is calculated automatically, adding leading zeros where appropriate.
This will be done in a later exercise.
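The idea of deriving a text identifier with leading zeros from an entered number can be sketched in Python as follows. This is not the EpiData Check file solution used in the later exercise; the function name `make_identifier` and the default width of 4 (taken from the field length of `serno`) are assumptions.

```python
def make_identifier(serno, width=4):
    """Turn an entered serial number into a fixed-width text identifier."""
    return str(serno).zfill(width)


print(make_identifier(1))     # 0001
print(make_identifier(885))   # 0885
print(make_identifier(9001))  # 9001
```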
course_a_ex01_task.doc
