Module 22_ Decscription of variables.ppt

Designing and Conducting Health
Systems Research Projects
Part II: Data Analysis and Report
Writing
Module 22
DESCRIPTION OF VARIABLES

OBJECTIVES
At the end of this session you should be able to:
 Describe data in terms of frequency distributions,
percentages, and proportions.
 Use figures to present data.
 Explain the difference between mean, median and
mode.
 Calculate the frequencies, percentages, proportions,
ratios, rates, means, medians, and modes for the major
variables in your study that require such calculations.
 Identify other independent variables (in addition to the
ones identified during the first workshops), if any, that
are necessary in the analysis of your data.

Sequence of Presentation
I. Introduction
II. Frequency distributions
III. Percentages, proportions, ratios, and rates
IV. Figures
V. Measures of central tendency

INTRODUCTION
 You selected the variables for your study during
proposal development
 Dependent Variables - define your problem
 Independent variables - these were contributory
factors to your problem
 The purpose of data analysis is to identify
whether these assumptions were correct or not.
 The ultimate purpose of analysis is to answer
the research questions outlined in the
objectives with your data.

INTRODUCTION
Before we look at how variables may be
affecting one another, we need to
summarise the information obtained on
each variable in simple tabular form or
in a figure.

Types of Data
 Numerical data (Quantitative),
 Categorical data (Qualitative).
 In analysing your data, it is important first of
all to determine the type of data that we are
dealing with.
 This is crucial because the type of data
determines the type of statistical techniques
that should be used to test whether the
results of the study are significant

Categorical data
 There are two types of categorical data
 Nominal
 Ordinal

NOMINAL DATA
 In NOMINAL DATA, the variables are divided
into a number of named categories.
 These categories, however, cannot be
ordered one above another (as they are not
greater or lesser than each other)

Examples of Nominal Categorical
Data
 Sex
 Male
 Female
 Marital Status
 single,
 married,
 widowed,
 separated/divorced

Ordinal Data
 In ORDINAL DATA, the variables are also
divided into a number of categories, but these
can be ordered one above another, from
lowest to highest or vice versa

Examples of Ordinal Categorical Data
 Level of knowledge
 Good,
 average,
 Poor
 Opinion on a statement
 Fully agree,
 agree,
 doubt,
 disagree,
 totally disagree

Numerical data
 NUMERICAL DATA are obtained from
variables that are expressed in numbers
 There are two types of numerical data:
 Discrete
 Continuous

Discrete Data
 DISCRETE DATA are a distinct series of
numbers
 Examples of Discrete Data
 Number of motor vehicle accidents
 Number of clinic visits
 Number of pregnancies per woman

CONTINUOUS DATA
 CONTINUOUS DATA come from variables that
can be measured with greater precision,
depending on the accuracy of the measuring
instrument, and each value can increase or
decrease by arbitrarily small amounts
 Examples of Continuous Variables
 Height
 Temperature
 Age

Presentation of Numerical Data
Numerical data can be presented as:
 Frequency distributions
 Percentages, proportions, ratios and rates
 Figures
 Measures of central tendency

Frequency distributions
 A FREQUENCY DISTRIBUTION is a
description of data presented in tabular form
 It gives the frequency with which a particular
value appears in the data.
 Frequency distributions can be made for:
 Categorical Data
 Nominal
 Ordinal
 Numerical Data

Frequency Distribution for categorical
data
 Define the variable
 Determine the categories
 A frequency distribution is calculated by
simply totalling the number of responses in
each category
 We usually express frequency distributions in
percentages

Preferred method of Contraception Among
Teenagers in Ngara District 2002
Method Number Percentage
Abstinence 5 4.5%
Condom 20 18.2%
Injectables 25 22.7%
Pills 60 54.6%
Total 110 100%
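
The totalling step can be sketched in Python. This is only an illustration: the raw responses are rebuilt from the counts in the table above, and the helper name frequency_distribution is an assumption, not part of the module.

```python
from collections import Counter

# Raw responses rebuilt from the counts in the table above (one entry per teenager).
responses = (
    ["Abstinence"] * 5 + ["Condom"] * 20 + ["Injectables"] * 25 + ["Pills"] * 60
)

def frequency_distribution(values):
    """Return {category: (count, percentage)} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

table = frequency_distribution(responses)
for method, (n, pct) in table.items():
    print(f"{method:<12} {n:>4} {pct:6.1f}%")
print(f"{'Total':<12} {len(responses):>4}")
```

Percentages that are rounded one by one need not add up to exactly 100: here 60 out of 110 rounds to 54.5%, so the rounded column sums to 99.9%.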

Frequency Distribution for Numerical Data
 Procedures for making frequency distributions
of numerical data are very similar to those for
categorical data,
 Except that now the data have to be grouped
in categories
 The steps involved in making a frequency
distribution are as follows:
 Select groups for grouping the data.
 Count the number of measurements in each
group.
 Add up and check the results.

Rules for Grouping Numerical Data
 The groups must not overlap, otherwise there is
confusion about which group a measurement
belongs to.
 There must be continuity from one group to the next,
which means that there must be no gaps. Otherwise
some measurements may not fit in a group.
 The groups must range from the lowest measurement to
the highest measurement so that all of the
measurements have a group to which they can be
assigned.
 The groups should normally be of an equal width, so that
the counts in different groups can easily be compared.

Rules for Grouping Numerical Data
 When you start summarising data it is better to
make too many groups than too few. This is
because during data analysis you can combine
groups to form new categories without having to
go through all your data again
 As a general rule choose round numbers for the
lower values of the group limits
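
The grouping rules can be sketched in Python as follows. The sketch assumes whole-number measurements; the function name group_counts and the patient counts are hypothetical and do not come from the study.

```python
def group_counts(values, width, start=0):
    """Count values in consecutive, non-overlapping groups of equal width.

    Groups are start..start+width-1, start+width..start+2*width-1, and so on,
    running far enough to include the highest measurement.
    """
    if min(values) < start:
        raise ValueError("start must not exceed the lowest measurement")
    n_groups = (max(values) - start) // width + 1
    counts = [0] * n_groups
    for v in values:
        counts[(v - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width - 1, c)
            for i, c in enumerate(counts)]

# Hypothetical monthly patient counts for ten clinics (not from the study).
patients = [3, 17, 22, 45, 58, 61, 64, 99, 101, 140]
groups = group_counts(patients, width=20)
for low, high, count in groups:
    print(f"{low:>3} - {high:<3} {count}")

# "Add up and check the results": every measurement landed in exactly one group.
print(sum(count for _, _, count in groups) == len(patients))
```

Integer division places each value in exactly one group, so there are no overlaps and no gaps, and a round starting value with a fixed width gives groups of equal width.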

1. PERCENTAGES
 A PERCENTAGE is the number of units in the
sample with a certain characteristic, divided by
the total number of units in the sample and
multiplied by 100
 Percentages may also be called RELATIVE
FREQUENCIES.
 Percentages standardise the data, which
means that they make it easier to compare
them with similar data obtained in another
sample of different size or origin.

2. PROPORTIONS
 Sometimes relative frequencies are
expressed in proportions instead of
percentages.
 A PROPORTION is a numerical expression
that compares one part of the study units to
the whole; A proportion can be expressed as
a FRACTION or in DECIMALS
 Note that when a proportion expressed in
decimals is multiplied by 100, the value
obtained is a percentage
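
A short Python sketch of the link between proportions and percentages, using the 20 teenagers out of 110 who preferred condoms in the contraception table. The function names are illustrative assumptions.

```python
from fractions import Fraction

def proportion(part, whole):
    """Proportion of the whole that has the characteristic, as an exact fraction."""
    return Fraction(part, whole)

def percentage(part, whole):
    """Percentage: the proportion in decimals, multiplied by 100."""
    return 100 * part / whole

p = proportion(20, 110)
print(p)                               # proportion as a fraction: 2/11
print(round(float(p), 3))              # proportion in decimals: 0.182
print(round(percentage(20, 110), 1))   # percentage: 18.2
```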

3. RATIOS
 A RATIO is a numerical expression which
indicates the relationship in quantity, amount
or size between two or more parts
 In a sample where there are 22 males and 23
females, the ratio of males to females is 22:23
(roughly 1:1); it cannot be reduced further
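
A ratio can be reduced only when both parts share a common factor. A minimal Python sketch, in which the helper name simplify_ratio is an illustration and the 20:30 sample is hypothetical:

```python
from math import gcd

def simplify_ratio(a, b):
    """Reduce the ratio a:b to lowest terms by dividing both parts by their GCD."""
    d = gcd(a, b)
    return a // d, b // d

print(simplify_ratio(22, 23))  # no common factor, so it stays (22, 23)
print(simplify_ratio(20, 30))  # hypothetical sample: reduces to (2, 3)
```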

4. RATES
 A RATE is the quantity, amount or degree of
a disease or event measured over a specified
period of time
 Commonly used rates in the health sector
are:
 Birth Rate
 Death Rate
 Infant Mortality Rate (IMR)
 Incidence Rate
 Prevalence Rate

5. FIGURES
 The most frequently used figures for presenting
data include:
1. Bar charts
2. Pie charts
These two are for Categorical data
3. Histograms
4. Line graphs
5. Scatter diagrams
6. Maps
These four are for Numerical data

1. Bar Chart
 Health personnel from 148 different rural health
institutions were asked the following question:
How often have you run out of drugs for the
treatment of malaria in the past two years? This
was a closed question with the following possible
answers: never, 1 to 2 times (rarely), 3 to 5 times
(occasionally), more than 5 times (frequently).
 The number of responses in each category were
totalled to give the following frequency distribution:

Categories    Number  %
Never         47      32
Rarely        71      48
Occasionally  24      16
Frequently    6       4
Total         148     100
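
As a rough illustration, the frequency distribution above can be turned into a text bar chart in Python. In practice a spreadsheet or plotting package would normally be used; the function name text_bar_chart is hypothetical.

```python
# Counts from the drug-shortage frequency distribution above (n = 148).
counts = {"Never": 47, "Rarely": 71, "Occasionally": 24, "Frequently": 6}

def text_bar_chart(counts, scale=1):
    """Return one line per category: label, a bar of '#' marks, and the percentage."""
    total = sum(counts.values())
    lines = []
    for category, n in counts.items():
        pct = round(100 * n / total)
        lines.append(f"{category:<13}{'#' * (pct // scale)} {pct}%")
    return lines

chart = text_bar_chart(counts)
print("\n".join(chart))
```

Each bar is separate and its length is proportional to the relative frequency of its category.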

Relative frequency of shortage of anti-malaria
drugs in rural health institutions (n=148)

2. Pie Chart
 A pie chart can be used for the same set of
data, providing the reader with a quick
overview of the data presented in a different
form. A pie chart illustrates the relative
frequency of a number of items. All the
segments of the pie chart should add up to
100%.

3. Histograms
 Numerical data are often presented in
histograms, which are very similar to the bar
charts which are used for categorical data.
 An important difference however is that in a
histogram the ‘bars’ are connected (as long
as there is no gap between the data),
whereas in a bar chart the bars are not
connected, as the different categories are
distinct entities.

Distribution of clinics according to number of
patients treated for malaria in one month.
Number of patients  Number of clinics  Relative frequency
0 - 19              25                 31%
20 - 39             3                  4%
40 - 59             5                  6%
60 - 79             11                 14%
80 - 99             19                 24%
100 - 119           10                 12%
120 - 139           4                  5%
140 - 159           3                  4%
Total               80                 100%

Percentage of clinics treating different numbers
of malaria patients in one month (n=80).

4. Line graphs
 A line graph is particularly useful for
numerical data if you wish to show a trend
over time.

Daily and weekly summaries of malaria cases in health
centres in District X.
Day 1 9 cases
Day 2 12
Day 3 11
Day 4 13
Day 5 14
Day 6 13
Day 7 16
Week 1 88 cases
Day 8 16 cases
Day 9 16
Day 10 18
Day 11 19
Day 12 16
Day 13 21
Day 14 25
Week 2 131 cases
Day 15 28
Day 16 28
Day 17 28
Day 18 32
Day 19 21
Day 20 19
Week 3 168 cases

Daily number of malaria patients at
the health centres in District X.

5. Scatter diagrams
 Scatter diagrams are useful for showing
information on two variables which are
possibly related
 An example is the relationship between child
weight and annual family income

Weight of five-year-olds according
to annual family income

6. MAPS
 In addition to the figures, the use of maps may be
considered to present information.
 For instance, the area where a study was carried out can
be shown in a map.
 If the study explored the epidemiology of cholera, a map
could be produced showing the geographical distribution of
cholera cases, together with the distribution of protected
water sources
 If the study related to vaccination coverage, a map could be
developed to indicate the clinic sites and the vaccination
coverage among under-fives in each village.

7. MEASURES OF CENTRAL
TENDENCY
 If one wants to further summarise a set of
observations, it is often helpful to use a
measure which can be expressed in a single
number.
 First of all, one would like to have a measure
for the centre of the distribution.
 The three measures used for this purpose are
the MEAN,
the MEDIAN and
the MODE.

The Mean
 The MEAN (or arithmetic mean) is also
known as the AVERAGE.
 It is calculated by totalling the results of all
the observations and dividing by the total
number of observations.
 Note that the mean can only be calculated for
numerical data.
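
A minimal Python sketch of the calculation, using the daily malaria cases for days 1 to 7 from the District X summary in this module. The function name mean is chosen for illustration.

```python
# Daily malaria cases for days 1 to 7 (District X summary in this module).
week1 = [9, 12, 11, 13, 14, 13, 16]

def mean(observations):
    """Arithmetic mean: total of all observations divided by their number."""
    return sum(observations) / len(observations)

print(sum(week1))             # total of the observations: 88
print(round(mean(week1), 1))  # 88 / 7, rounded to one decimal: 12.6
```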

The Median
 The MEDIAN is the value that divides a
distribution into two equal halves.
 The median is useful when some
measurements are much bigger or much
smaller than the rest.
 The mean of such data will be biased toward
these extreme values.
 Thus the mean is not a good measure of the
centre of the distribution in this case

The Median (cont.)
 The median is not influenced by extreme values.
 The median value, also called the central or
halfway value, is obtained in the following way:
1. List the observations in order of magnitude (from
the lowest to the highest value or vice versa).
2. Count the number of observations (n).
3. If n is odd, the median is the value of observation
number (n + 1) / 2. If n is even, it is the average of
the two middle observations, numbers n / 2 and
(n / 2) + 1.
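
The steps above can be sketched in Python. The first list is the week 1 malaria data from this module; the second list contains a hypothetical extreme value, added only to show that the median does not move while the mean does.

```python
def median(observations):
    """Median: sort the observations, count them, then take the middle."""
    ordered = sorted(observations)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # observation number (n + 1) / 2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle values

week1 = [9, 12, 11, 13, 14, 13, 16]
print(median(week1))  # 13

# Hypothetical extreme value: 16 replaced by 160.
with_outlier = [9, 12, 11, 13, 14, 13, 160]
print(median(with_outlier))                              # still 13
print(round(sum(with_outlier) / len(with_outlier), 1))   # the mean rises to 33.1
```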

The Mode
 The MODE is the most frequently occurring
value in a set of observations.
 The mode is not very useful for numerical
data that are continuous. It is most useful for
numerical data that have been grouped.
 The mode can also be used for categorical
data, whether they are nominal or ordinal.
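
A short Python sketch for finding the mode of numerical or categorical data. The function name modes and the knowledge ratings are hypothetical; the function returns a list because more than one value can share the highest frequency.

```python
from collections import Counter

def modes(observations):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(observations)
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]

print(modes([9, 12, 11, 13, 14, 13, 16]))          # week 1 malaria cases: [13]
print(modes(["good", "poor", "average", "good"]))  # hypothetical ordinal data: ['good']
```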

SUMMARY
 In summary, the mean, the median and the
mode are all measures of central tendency.
 The mean is the most widely used because it
uses the most information: the value of each
observation is taken into account in its
calculation.
 However, the mean is strongly affected by
values far from the centre of the distribution,
while the median and the mode are not.
 The calculation of the mean forms the
beginning of more complex statistical
procedures to describe and analyse data.

Figure 22.6 shows a distribution curve in which the mean,
the median and the mode have different values.
