1. Designing and Conducting Health
Systems Research Projects
Part II: Data Analysis and Report
Writing
Module 22
DESCRIPTION OF VARIABLES
2. OBJECTIVES
At the end of this session you should be able to:
Describe data in terms of frequency distributions,
percentages, and proportions.
Use figures to present data.
Explain the difference between mean, median and
mode.
Calculate the frequencies, percentages, proportions,
ratios, rates, means, medians, and modes for the major
variables in your study that require such calculations.
Identify other independent variables (in addition to the
ones identified during the first workshops), if any, that
are necessary in the analysis of your data.
3. Sequence of Presentation
I. Introduction
II. Frequency distributions
III. Percentages, proportions, ratios, and rates
IV. Figures
V. Measures of central tendency
4. INTRODUCTION
You selected the variables for your study during
proposal development
Dependent variables - these define your problem
Independent variables - these were assumed to be contributory
factors to your problem
The purpose of data analysis is to identify
whether these assumptions were correct or not.
The ultimate purpose of analysis is to answer
the research questions outlined in the
objectives with your data.
5. INTRODUCTION
Before we look at how variables may be
affecting one another, we need to
summarise the information obtained on
each variable in simple tabular form or
in a figure.
6. Types of Data
Numerical data (Quantitative),
Categorical data (Qualitative).
In analysing your data, it is important first of
all to determine the type of data that we are
dealing with.
This is crucial because the type of data
determines the type of statistical techniques
that should be used to test whether the
results of the study are significant
8. NOMINAL DATA
In NOMINAL DATA, the variables are divided
into a number of named categories.
These categories, however, cannot be
ordered one above another (as they are not
greater or lesser than each other)
9. Examples of Nominal Categorical
Data
Sex
Male
Female
Marital Status
single,
married,
widowed,
separated/divorced
10. Ordinal Data
In ORDINAL DATA, the variables are also
divided into a number of categories, but these
can be ordered one above another, from
lowest to highest or vice versa
11. Examples of Ordinal Categorical Data
Level of knowledge
Good,
average,
Poor
Opinion on a statement
Fully agree,
agree,
doubt,
disagree,
totally disagree
12. Numerical data
NUMERICAL DATA are obtained from
variables that are expressed in numbers
There are two types of numerical data;
Discrete
Continuous
13. Discrete Data
DISCRETE DATA are a distinct series of
numbers
Examples of Discrete Data
Number of motor vehicle accidents
Number of clinic visits
Number of pregnancies per woman
14. CONTINUOUS DATA
CONTINUOUS DATA come from variables that
can be measured with greater precision,
depending on the accuracy of the measuring
instrument. Each value can increase or
decrease without limit.
Examples of Continuous Variables
Height
Temperature
Age
15. Presentation of Numerical Data
Numerical data can be presented as:
Frequency distributions
Percentages, proportions, ratios and rates
Figures
Measures of central tendency
16. Frequency distributions
A FREQUENCY DISTRIBUTION is a
description of data presented in tabular form
It gives the frequency with which a particular
value appears in the data.
Frequency Distribution can be made for;
Categorical Data
Nominal
Ordinal
Numerical Data
17. Frequency Distribution for categorical
data
Define the variable
Determine the categories
A frequency distribution is calculated by
simply totalling the number of responses in
each category
We usually express frequency distributions in
percentages
18. Preferred method of Contraception Among
Teenagers in Ngara District 2002
Method Number Percentage
Abstinence 5 4.5%
Condom 20 18.2%
Injectables 25 22.7%
Pills 60 54.5%
Total 110 100%
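The tallying step can be sketched in Python. This is only an illustration: it assumes the raw answers are held as a list with one entry per respondent, the list `responses` is rebuilt here from the counts in the table, and the function name `frequency_distribution` is hypothetical.

```python
from collections import Counter


def frequency_distribution(responses):
    """Return {category: (count, percentage)} for a list of categorical answers."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {
        category: (count, 100 * count / total)
        for category, count in counts.items()
    }


# Illustrative raw data rebuilt from the counts in the table above.
responses = (
    ["Abstinence"] * 5 + ["Condom"] * 20 + ["Injectables"] * 25 + ["Pills"] * 60
)

table = frequency_distribution(responses)
for method, (count, percent) in table.items():
    print(f"{method:<12} {count:>4} {percent:>6.1f}%")
print(f"{'Total':<12} {len(responses):>4}")
```

Because each percentage is rounded separately, the rounded figures may not add up to exactly 100.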
19. Frequency Distribution for Numerical Data
Procedures for making frequency distributions
of numerical data are very similar to those for
categorical data,
Except that now the data have to be grouped
in categories
The steps involved in making a frequency
distribution are as follows:
Select groups for grouping the data.
Count the number of measurements in each
group.
Add up and check the results.
20. Rules for Grouping Numerical Data
The groups must not overlap, otherwise it is unclear
which group a measurement belongs to.
There must be continuity from one group to the next,
which means that there must be no gaps. Otherwise
some measurements may not fit in a group.
The groups must range from the lowest measurement to
the highest measurement so that all of the
measurements have a group to which they can be
assigned.
The groups should normally be of an equal width, so that
the counts in different groups can easily be compared.
21. Rules for Grouping Numerical Data
When you start summarising data it is better to
make too many groups than too few. This is
because during data analysis you can combine
groups to form new categories without having to
go through all your data again
As a general rule choose round numbers for the
lower values of the group limits
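The grouping rules can be sketched in Python. This is a minimal example under stated assumptions: the measurements are whole numbers, the groups are half-open intervals of equal width starting at a round number, and the function name `group_counts` and the sample data are hypothetical.

```python
def group_counts(values, width, start=0):
    """Count values in equal-width, non-overlapping groups.

    Groups are [start, start + width), [start + width, start + 2 * width), ...
    so every value from `start` upward falls in exactly one group.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if min(values) < start:
        raise ValueError("start must not exceed the lowest measurement")
    last_index = int((max(values) - start) // width)
    counts = [0] * (last_index + 1)
    for value in values:
        counts[int((value - start) // width)] += 1
    # Labels suit whole-number data, e.g. "0 - 19" for a width of 20.
    labels = [
        f"{start + i * width} - {start + (i + 1) * width - 1}"
        for i in range(len(counts))
    ]
    return list(zip(labels, counts))


# Hypothetical numbers of patients seen at nine clinics.
patients = [3, 18, 20, 25, 41, 59, 60, 77, 101]
groups = group_counts(patients, width=20)
for label, count in groups:
    print(label, count)

# Add up and check: the counts must equal the number of measurements.
assert sum(count for _, count in groups) == len(patients)
```

An empty group (here 80 - 99) is kept in the result so that there is no gap between groups.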
22. 1. PERCENTAGES
A PERCENTAGE is the number of units in the
sample with a certain characteristic, divided by
the total number of units in the sample and
multiplied by 100
Percentages may also be called RELATIVE
FREQUENCIES.
Percentages standardise the data, which
means that they make it easier to compare
them with similar data obtained in another
sample of different size or origin.
23. 2. PROPORTIONS
Sometimes relative frequencies are
expressed in proportions instead of
percentages.
A PROPORTION is a numerical expression
that compares one part of the study units to
the whole; A proportion can be expressed as
a FRACTION or in DECIMALS
Note that when a proportion expressed in
decimals is multiplied by 100, the value
obtained is a percentage
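The link between fractions, decimals and percentages can be illustrated with a short Python sketch. The sample figures (30 out of 120 respondents) are hypothetical, as is the function name `proportion`.

```python
from fractions import Fraction


def proportion(part, whole):
    """Proportion of the whole that has the characteristic, as an exact fraction."""
    return Fraction(part, whole)


# Hypothetical sample: 30 of 120 respondents have the characteristic.
p = proportion(30, 120)
print(p)               # the proportion as a fraction
print(float(p))        # the same proportion in decimals
print(float(p) * 100)  # multiplied by 100, a percentage
```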
24. 3. RATIOS
A RATIO is a numerical expression which
indicates the relationship in quantity, amount
or size between two or more parts
In a sample where there are 22 males and 23
females, the ratio of males to females is 22:23,
or roughly 1:1
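A ratio can be reduced to lowest terms by dividing both parts by their greatest common divisor. The sketch below assumes whole-number counts; the function name `simplify_ratio` is hypothetical.

```python
from math import gcd


def simplify_ratio(a, b):
    """Reduce the ratio a:b to lowest terms."""
    divisor = gcd(a, b)
    return a // divisor, b // divisor


print(simplify_ratio(22, 23))  # no common factor, so the ratio stays 22:23
print(simplify_ratio(20, 30))  # common factor 10, so the ratio becomes 2:3
```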
25. 4. RATES
A RATE is the quantity, amount or degree of
a disease or event measured over a specified
period of time
Commonly used rates in the health sector
are:
Birth Rate
Death Rate
Infant Mortality Rate (IMR)
Incidence Rate
Prevalence Rate
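As one illustration, the infant mortality rate is conventionally expressed as the number of deaths under one year of age per 1,000 live births in the same period. The Python sketch below uses hypothetical district figures, and the function name `rate_per` is an assumption made for this example.

```python
def rate_per(events, population, multiplier=1000):
    """Events per `multiplier` units of the reference population in a period."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * multiplier


# Hypothetical district figures for one year.
infant_deaths = 45
live_births = 1500
imr = rate_per(infant_deaths, live_births)
print(f"IMR: {imr:.1f} per 1,000 live births")
```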
26. 5. FIGURES
The most frequently used figures for presenting
data include:
1. Bar charts
2. Pie charts
These two are for Categorical data
3. Histograms
4. Line graphs
5. Scatter diagrams
6. Maps
These four are for Numerical data
27. 1. Bar Chart
Health personnel from 148 different rural health
institutions were asked the following question:
How often have you run out of drugs for the
treatment of malaria in the past two years? This
was a closed question with the following possible
answers: never, 1 to 2 times (rarely), 3 to 5 times
(occasionally), more than 5 times (frequently).
The responses in each category were
totalled to give the following frequency distribution:
29. Relative frequency of shortage of anti-malaria
drugs in rural health institutions (n=148)
30. 2. Pie Chart
A pie chart can be used for the same set of
data, providing the reader with a quick
overview of the data presented in a different
form. A pie chart illustrates the relative
frequency of a number of items. All the
segments of the pie chart should add up to
100%.
31. Relative frequency of shortage of anti-malaria
drugs in rural health institutions (n=148)
32. 3. Histograms
Numerical data are often presented in
histograms, which are very similar to the bar
charts which are used for categorical data.
An important difference however is that in a
histogram the ‘bars’ are connected (as long
as there is no gap between the data),
whereas in a bar chart the bars are not
connected, as the different categories are
distinct entities.
33. Distribution of clinics according to number of
patients treated for malaria in one month.
Number of patients   Number of clinics   Relative frequency
0 - 19               25                  31%
20 - 39              3                   4%
40 - 59              5                   6%
60 - 79              11                  14%
80 - 99              19                  24%
100 - 119            10                  12%
120 - 139            4                   5%
140 - 159            3                   4%
Total                80                  100%
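The relative frequencies in this table can be recomputed from the counts. The Python sketch below takes the class limits and clinic counts from the table; the variable names are chosen for this example only.

```python
# Class limits and clinic counts taken from the table above.
clinics = {
    "0 - 19": 25,
    "20 - 39": 3,
    "40 - 59": 5,
    "60 - 79": 11,
    "80 - 99": 19,
    "100 - 119": 10,
    "120 - 139": 4,
    "140 - 159": 3,
}

total = sum(clinics.values())
# Relative frequency = count in the group / total count * 100.
relative = {group: 100 * n / total for group, n in clinics.items()}

for group, n in clinics.items():
    print(f"{group:<10} {n:>3} {relative[group]:>6.1f}%")
print(f"{'Total':<10} {total:>3}")
```

The table reports these values rounded to whole percentages.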
34. Percentage of clinics treating different numbers
of malaria patients in one month (n=80).
35. 4. Line graphs
A line graph is particularly useful for
numerical data if you wish to show a trend
over time.
36. Daily and weekly summaries of malaria cases in health
centres in District X.
Day 1 9 cases
Day 2 12
Day 3 11
Day 4 13
Day 5 14
Day 6 13
Day 7 16
Week 1 88 cases
Day 8 16 cases
Day 9 16
Day 10 18
Day 11 19
Day 12 16
Day 13 21
Day 14 25
Week 2 131 cases
Day 15 28
Day 16 28
Day 17 28
Day 18 32
Day 19 21
Day 20 19
Week 3 168 cases
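The weekly summaries are simply sums of the daily counts. A short Python check for the first two weeks, using the daily figures listed above (the function name `weekly_totals` is hypothetical):

```python
# Daily case counts for days 1 to 14, as listed above.
daily_cases = [9, 12, 11, 13, 14, 13, 16, 16, 16, 18, 19, 16, 21, 25]


def weekly_totals(cases, days_per_week=7):
    """Sum consecutive blocks of `days_per_week` daily counts."""
    return [
        sum(cases[i:i + days_per_week])
        for i in range(0, len(cases), days_per_week)
    ]


print(weekly_totals(daily_cases))  # [88, 131]
```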
37. Daily number of malaria patients at
the health centres in District X.
38. 5. Scatter diagrams
Scatter diagrams are useful for showing
information on two variables which are
possibly related
An example is the relationship between child
weight and annual family income
40. 6. MAPS
In addition to the figures, the use of maps may be
considered to present information.
For instance, the area where a study was carried out can
be shown in a map.
If the study explored the epidemiology of cholera, a map
could be produced showing the geographical distribution of
cholera cases, together with the distribution of protected
water sources
If the study related to vaccination coverage, a map could be
developed to indicate the clinic sites and the vaccination
coverage among under-fives in each village.
41. 7. MEASURES OF CENTRAL
TENDENCY
If one wants to further summarise a set of
observations, it is often helpful to use a
measure which can be expressed in a single
number.
First of all, one would like to have a measure
for the centre of the distribution.
The three measures used for this purpose are
the MEAN,
the MEDIAN and
the MODE.
42. The Mean
The MEAN (or arithmetic mean) is also
known as the AVERAGE.
It is calculated by totalling the results of all
the observations and dividing by the total
number of observations.
Note that the mean can only be calculated for
numerical data.
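The calculation can be written out in a few lines of Python. The observations below are hypothetical, and the function name `mean` is chosen for this example.

```python
def mean(observations):
    """Total of all observations divided by the number of observations."""
    if not observations:
        raise ValueError("at least one observation is required")
    return sum(observations) / len(observations)


# Hypothetical numbers of patients seen in five clinics.
values = [12, 15, 11, 18, 14]
print(mean(values))  # 70 / 5 = 14.0
```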
43. The Median
The MEDIAN is the value that divides a
distribution into two equal halves.
The median is useful when some
measurements are much bigger or much
smaller than the rest.
The mean of such data will be biased toward
these extreme values.
Thus the mean is not a good measure of the
centre of the distribution in this case
44. The Median. cont
The median is not influenced by extreme values.
The median value, also called the central or
halfway value, is obtained in the following way:
List the observations in order of magnitude (from
the lowest to the highest value or vice versa).
Count the number of observations (n).
The median value is the value belonging to
observation number (n + 1) / 2 if n is odd, or the
average of the two middle values if n is even.
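The procedure above can be sketched in Python as follows. The weights used in the example are hypothetical, and the function name `median` is chosen for this illustration.

```python
def median(observations):
    """Middle value of the ordered observations."""
    ordered = sorted(observations)  # list in order of magnitude
    n = len(ordered)                # count the observations
    if n == 0:
        raise ValueError("at least one observation is required")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: observation number (n + 1) / 2, counting from 1.
        return ordered[middle]
    # Even n: average of the two middle observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical weights in kg.
print(median([47, 41, 44, 43, 42]))  # 43
print(median([47, 41, 44, 43]))      # 43.5
```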
45. The Mode
The MODE is the most frequently occurring
value in a set of observations.
The mode is not very useful for numerical
data that are continuous. It is most useful for
numerical data that have been grouped.
The mode can also be used for categorical
data, whether they are nominal or ordinal.
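A minimal Python sketch for finding the mode, which works for categorical as well as numerical data. It returns every value that shares the highest count, since a data set can have more than one mode. The data and the function name `modes` are hypothetical.

```python
from collections import Counter


def modes(observations):
    """Return all values that occur most frequently."""
    counts = Counter(observations)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Hypothetical answers on level of knowledge (ordinal data).
knowledge = ["good", "average", "poor", "average", "good", "average"]
print(modes(knowledge))        # ['average']
print(modes([2, 3, 3, 4, 4]))  # [3, 4]
```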
46. SUMMARY
In summary, the mean, the median and the
mode are all measures of central tendency.
The mean is most widely used as it contains
more information because the value of each
observation is taken into account in its
calculation.
However, the mean is strongly affected by
values far from the centre of the distribution,
while the median and the mode are not.
The calculation of the mean forms the
beginning of more complex statistical
procedures to describe and analyse data.
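The effect of an extreme value on the three measures can be seen with Python's standard `statistics` module. The lengths of stay below are hypothetical and were chosen so that one value lies far from the rest.

```python
import statistics

# Hypothetical lengths of hospital stay in days; the last value is extreme.
stays = [2, 3, 3, 4, 5, 3, 60]
without_extreme = stays[:-1]

for label, data in (("with extreme value", stays),
                    ("without extreme value", without_extreme)):
    print(label)
    print("  mean  ", round(statistics.mean(data), 1))
    print("  median", statistics.median(data))
    print("  mode  ", statistics.mode(data))
```

In this example the mean moves from about 3.3 to about 11.4 days when the extreme value is included, while the median and the mode both stay at 3.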
47. Figure 22.6 shows a distribution curve in which the mean,
the median and the mode have different values.