RMBS M1 Lecture 1a.pptx

PROF DR WAQAR AHMED AWAN
PhD In Rehabilitation Sciences
RESEARCH METHODOLOGY &
BIOSTATISTICS

STATISTICS
 In the investigation of most clinical research questions, some form of
quantitative data will be collected.
 Initially these data exist in raw form, which means that they are nothing
more than a compilation of numbers representing empirical observations
from a group of individuals.
 For these data to be useful, they must be organized, summarized, and
analyzed, so that their meaning can be communicated.
 These are the functions of the branch of mathematics called statistics.

BIOSTATISTICS
 Is the application of statistics to a wide range of topics in biology.
 It encompasses the design of biological experiments, especially in
medicine, pharmacy, agriculture and fishery; the collection,
summarization, and analysis of data from those experiments; and
the interpretation of, and inference from, the results.
 A major branch is medical biostatistics, which is exclusively
concerned with medicine and health.

RESEARCH METHOD
 Is a systematic plan for conducting research.
 Researchers draw on a variety of both qualitative and quantitative
research methods, including experiments, survey research, participant
observation, and secondary data.
 Quantitative methods aim to classify features, count them, and create
statistical models to test hypotheses and explain observations.
 Qualitative methods aim for a complete, detailed description of
observations, including the context of events and circumstances.

DESCRIPTIVE STATISTICS
 Descriptive statistics are used to characterize the shape, central
tendency, and variability within a set of data, often with the intent to
describe a population.
 Measures of population characteristics are called parameters.
 A descriptive index computed from sample data is called a statistic.

Distribution
• The total set of scores for a particular variable
is called a distribution.
• This table presents a set of hypothetical scores
of 48 therapists on a test of attitudes toward
working with geriatric clients.
• For this example, a maximum score of 20
indicates an overall positive attitude; zero
indicates a strong negative bias.
• The total number of scores in the distribution is
given the symbol n.
• In this sample, n= 48.

Frequency Distribution
• A frequency distribution is a table of rank
ordered scores that shows the number of
times each value occurred, or its frequency
• The first two columns in Table 17.1B show
the frequency distribution for the attitude
scores.
• We can see the lowest and highest scores,
where the scores tend to cluster, and which
scores occurred most often.
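
A minimal sketch of how such a frequency distribution could be tabulated. The scores below are hypothetical (they are not the data from Table 17.1), and the function name `frequency_distribution` is an illustrative choice.

```python
from collections import Counter

# Hypothetical attitude scores on the 0-20 scale; NOT the data from Table 17.1.
scores = [12, 15, 9, 15, 18, 12, 15, 7, 12, 15]


def frequency_distribution(values):
    """Return (score, frequency) pairs in rank order, highest score first."""
    counts = Counter(values)
    return sorted(counts.items(), reverse=True)


table = frequency_distribution(scores)

print("Score  Frequency")
for score, freq in table:
    print(f"{score:>5}  {freq:>9}")
```

Reading down the printed table shows the lowest and highest scores, where the scores cluster, and which score occurred most often.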

Percentages
 Sometimes frequencies are more meaningfully expressed as percentages
of the total distribution.
 We can look at the percentage represented by each score in the distribution, or at the
cumulative percentage obtained by adding the percentage value for
each score to all percentages that fall below that score.
 For example, it may be useful to know that 18.8% of the sample had a
score of 15 or that 56.3% of the sample had scores of 15 and below.
 Percentages are useful for describing distributions because they are
independent of sample size.
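
The percentage and cumulative percentage columns can be sketched as follows. The data are hypothetical and `percentage_table` is an illustrative name, not something defined in the lecture.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical attitude scores; NOT the data from Table 17.1.
scores = [12, 15, 9, 15, 18, 12, 15, 7, 12, 15]


def percentage_table(values):
    """Rows of (score, frequency, percent, cumulative percent), lowest score first."""
    counts = Counter(values)
    n = len(values)
    ordered = sorted(counts)  # ascending, so cumulation runs from the lowest score up
    cumulative = accumulate(counts[s] for s in ordered)
    return [
        (s, counts[s], 100 * counts[s] / n, 100 * cf / n)
        for s, cf in zip(ordered, cumulative)
    ]


rows = percentage_table(scores)
for score, freq, pct, cum_pct in rows:
    print(f"{score:>5} {freq:>4} {pct:>6.1f}% {cum_pct:>6.1f}%")
```

The cumulative column always ends at 100%, whatever the sample size.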

For example
 Suppose we tested another sample of 150 therapists and
found that 84 individuals obtained a score of 15 or below.
 Although more people in this second sample obtained such
scores than in the first sample, both groups represent about the same
percentage of the total sample (56%).

Grouped Frequency
Distributions
• Researchers will often find that very few subjects, if any,
obtain exactly the same score.
• Consider a hypothetical sample of 30 patients for whom
we obtained measurements of shoulder abduction
range of motion, shown in Table 17.2A.
• Obviously, creating a frequency distribution is a useless
process if almost every score has a frequency of one.
• In this situation, a grouped frequency distribution can be
constructed by grouping the scores into classes, or
intervals, where each class represents a unique range
of scores within the distribution.
• Frequencies are then assigned to each interval.
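
A rough sketch of grouping scores into classes, assuming integer scores and a class width of 10 degrees. The ROM values are hypothetical (not Table 17.2A) and `grouped_frequency` is an illustrative name.

```python
from collections import Counter

# Hypothetical shoulder abduction ROM scores in degrees; NOT Table 17.2A.
rom = [60, 68, 72, 77, 77, 80, 82, 84, 85, 86, 93, 101, 118, 134]


def grouped_frequency(values, width=10):
    """Count integer scores per class; each class covers lower .. lower + width - 1."""
    counts = Counter((v // width) * width for v in values)
    lowers = range(min(counts), max(counts) + width, width)
    # Empty classes are kept so the classes stay exhaustive over the observed range.
    return [(f"{lo}-{lo + width - 1}", counts[lo]) for lo in lowers]


classes = grouped_frequency(rom)
for label, freq in classes:
    print(f"{label:>8}  {freq}")
```

Because each score maps to exactly one lower bound, the classes cannot overlap.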

• The classes represent ranges of 10 degrees.
• The classes are mutually exclusive (no overlap) and
exhaustive within the range of scores obtained.
• The choice of the number of classes to be used and the
range within each class is an arbitrary decision.
• It depends on the overall range of scores, the number of
observations, and how much detail is relevant for the
intended audience.
• Although information is inherently lost in grouped data, this
approach is often the only feasible way to present
comprehensible data when large amounts of information are

Graphing Frequency Distributions
 Graphic representation of data often communicates information
about trends and general characteristics of distributions more
clearly than a tabular frequency distribution.
 The most common methods of graphing frequency distributions are
 Stem-and-leaf Plot,
 Histogram, and
 Frequency Polygon.

Stem-and-Leaf Plot
 The stem-and-leaf plot is a refined grouped frequency distribution that is
most useful for presenting the pattern of distribution of a continuous
variable.
 The pattern is derived by separating each score into two parts:
 the leaf consists of the last or rightmost single digit of each score, and
 the stem consists of the remaining leftmost digits.
 Consider a stem-and-leaf plot for the shoulder range of motion data.
 The scores have leftmost digits of 6 through 13. These values become the stems.

 To read the stem-and-leaf plot, we look across each row, attaching each
single leaf digit to the stem.
 Therefore, the first row represents the scores 60 and 68; the second row,
72, 77 and 77; the third row, 80, 82, 84, 85 and 86; and so on.
 This display provides a concise summary of the data, while
maintaining the integrity of the original data.
 If we compare this plot with the grouped frequency distribution, it is
clear how much more information is provided by the stem-and-leaf
plot in a small space, and
 how it provides elements of both tabular and graphic displays.
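
A small sketch of how the stems and leaves can be separated, assuming integer scores. The ten scores used are those read off the first three rows described above, and `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

# The scores read from the first three rows described in the text.
scores = [60, 68, 72, 77, 77, 80, 82, 84, 85, 86]


def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (final digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 77 -> stem 7, leaf 7
        rows[stem].append(leaf)
    return dict(rows)


plot = stem_and_leaf(scores)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>3} | {leaves}")
```

Each printed row can be read back into the original scores by attaching every leaf to its stem.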

Histogram
• A histogram is a bar graph, composed of a
series of columns, each representing one
score or class interval. Figure 17.1A is a
histogram showing the distribution of attitude
scores given in Table 17.1.
• The frequency for each score is plotted on the
Y-axis (vertical), and the measured variable, in
this case attitude score, is on the X-axis
(horizontal).
• The bars are centered over the scores.

Frequency Polygon
• A frequency polygon is a line plot, where
each point on the line represents
frequency or percentage.
• When grouped data are used, the dots in
the graph are located at the midpoint of
each class interval to represent the
frequency in that class.

Shapes Of Distribution
• Some distributions are symmetrical; that is,
each half is a mirror image of the other.
• Curves A and B in Figure 17.2 are
symmetrical.
• When every score occurs with the same frequency throughout the
distribution, the shape is described as uniform,
or rectangular, as shown in Curve A.
• Curve B represents a special case of the
symmetrical distribution called the normal
distribution.
• In statistical terminology, "normal" refers to a
specific type of bell-shaped distribution where
most of the scores fall in the middle of the

• A skewed distribution is asymmetrical.
• The degree to which the distribution deviates from
symmetry is its skewness.
• A distribution is positively skewed, or skewed to the right,
when most of the scores cluster at the low end
and only a few scores at the high end
cause the tail of the curve to point toward the
right.
• When the curve "tails off" to the left, the distribution is negatively skewed.

For Example
• If we were to plot a distribution for annual family income in the
United States, it would be positively skewed, because most
families have low to moderate incomes.
• When the curve "tails off" to the left, the distribution is negatively
skewed, or skewed to the left. We might see a negatively skewed
distribution if we plotted exam scores for an easy test, on which
relatively few students achieved a low score.

VARIABLES
 Variables: the events, characteristics, behaviors, or conditions that
researchers measure and study.
 A variable is either a result of some factor (dependent) or is itself
the factor (independent) that causes a change in another variable
e.g. treatment, range of motion, pain intensity etc.

Independent Variable
 Independent variables are the aspects (factors) of a study that a
practitioner can control or choose
 They are called independent variables because they do not depend
on other variables for change
 These variables are the cause of the outcome of a study
 Treatment is an independent variable in rehabilitation sciences

Dependent Variables
 These are the factors of a study that change as a result of a change in
an independent variable
 These are the factors that a clinician measures or observes
 Range of motion and pain rating are common dependent variables in
physical therapy.
 e.g. Treatment of wrist drop will prevent overstretching of the wrist extensors,
maintain ROM, and maintain muscle strength, etc.
 A dependent variable must be defined operationally.

Controlled or Constant Variables
 These are the factors or conditions of a study that are kept
unchanged in an experiment.
 There can be more than one controlled variable in an experiment
 Age will be a constant factor if all participants belong to the same age
group, e.g. all participants are 40 years old

Confounding Variables
 An extraneous variable in a study that correlates (directly or
inversely) with the independent variable and distorts the results
 Age can be a confounding factor in rehabilitation sciences
when studying osteoarthritis
 Limited ROMs among diabetic patients

Types of Variables with Respect to
MEASUREMENT
• Variables can be classified with respect to measurement into
– Categorical Variable
– Numerical Variable

Types of Variables with Respect to Measurement
Categorical Variable
• A categorical variable is one for which the observations
recorded result in a set of categories
• There is a distinct demarcation between the categories
• For example:
– Gender (male and female)
– Recovery from treatment (not recovered, partially recovered and
completely recovered)
• Categorical variables are often referred to as qualitative
variables

Numerical Variable
• A numerical variable is one for which the observations
are recorded as numerical values, such as age, height,
etc.
• Numerical variables have two further types: discrete
and continuous
• A numerical variable is often referred to as a quantitative
variable

Numerical Variable
• Discrete Variable
– A variable that is capable of taking a set of discrete
numerical values such as 10, 15, 1, 199, etc., but not every
possible value between two given numbers
– For example, the number of heartbeats in a fixed time
period, the number of successful operations in a hospital, or
the number of cases reported at a casualty

Numerical Variable
 Continuous Variable
• A variable that is capable of taking every possible value
between two given numbers is termed a continuous variable.
• Age, weight, length, etc. are a few examples of continuous
variables

Data take many different forms: categorical variables and numerical variables

Exercise
Variable | How it will be measured | Variable Type | Variable Subtype
Gender | Male & Female | |
Marital Status | Divorced, Widowed, Single, Married | |
Blood Pressure | In mmHg | |
Hypertension | Mild, Moderate, Severe | |
Age | In Years | |
Age Class | <40, 40-60, >60 | |
Weight | In Kg | |
Height | In Inches | |
Number of Children | Frequency | |
Number of school days | Frequency | |
Depression | Mild, Moderate & Severe | |
Anxiety | Yes & No | |
Quality of Life | Bad, average, good | |

Exercise
Variable | How it will be measured | Variable Type | Variable Subtype
Gender | Male & Female | Categorical Data | Dichotomous/binary
Marital Status | Divorced, Widowed, Single, Married | Categorical Data | Nominal
Blood Pressure | In mmHg | Numerical Data | Continuous
Hypertension | Mild, Moderate, Severe | Categorical Data | Ordinal
Age | In Years | Numerical Data | Continuous
Age Class | <40, 40-60, >60 | Categorical Data | Ordinal
Weight | In Kg | Numerical Data | Continuous
Height | In Inches | Numerical Data | Continuous
Number of Children | Frequency | Numerical Data | Discrete
Number of school days | Frequency | Numerical Data | Discrete
Depression | Mild, Moderate & Severe | Categorical Data | Ordinal
Anxiety | Yes & No | Categorical Data | Dichotomous/binary
Quality of Life | Bad, average, good | Categorical Data | Ordinal

MEASURES OF CENTRAL
TENDENCY
 Although frequency distributions enable us to order data and
identify group patterns, they do not provide a practical quantitative
summary of a group's characteristics.
 Numerical indices are needed to describe the "typical" nature of the
data and to reflect different concepts of the "center" of a
distribution.
 These indices are called measures of central tendency, or averages
 The term average can denote three different measures of central
tendency:
 The mode,
 The median, and
 The mean.

Mean
• The mean is the sum of a set of scores divided by the
number of scores, n .
• This is the value most people refer to as the "average."
• The symbol used to represent the mean of a population is
the Greek letter mu (μ), and the mean of a sample is
represented by X̄.
• The bar above the X indicates that the value is an average
score.
• The formula for calculation of the sample mean from raw
data is:
$$\bar{X} = \frac{\sum X}{n}$$
• This is read, "the mean equals the sum of X
divided by n," where X represents each
individual score in the distribution.
• For example, we can apply this formula to
the ROM scores shown in Table 17.2. In this
distribution of thirty scores, the sum of
scores is 2,848. Therefore, X̄ = 2,848/30 =
94.9.

Median
• The median of a series of observations is the value above which
there are as many scores as below it.
• It divides a rank-ordered distribution into two equal halves.
• When a distribution contains an odd number of scores, such as
4, 5, 6, 7, 8, the middle score, 6, is the median.
• With an even number of scores, the midpoint between the two
middle scores is the median, so that for the series 4, 5, 6, 7, 8, 9,
the median lies halfway between 6 and 7. Therefore, the median
equals 6.5.
• For the distribution of attitude scores given in Table, with n = 48,

Mode
• The mode is the score that occurs most frequently in a
distribution.
• It is most easily determined by inspection of a frequency
distribution.
• When class intervals are used, the mode is taken as
the midpoint of the interval with the largest frequency.
• When more than one score occurs with the highest
frequency, a distribution is considered bimodal (with two
modes) or multimodal (with more than two modes).
• Many distributions of continuous variables do not have a
mode.
• The mode has only limited application as a measure of
central tendency for continuous data, but can be useful in
the assessment of categorical variables.
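
The three averages described above can be checked with Python's standard `statistics` module. This is only a sketch: the two series are the ones used in the median examples, and the bimodal list is a hypothetical illustration.

```python
import statistics

odd_series = [4, 5, 6, 7, 8]
even_series = [4, 5, 6, 7, 8, 9]

mean_odd = statistics.mean(odd_series)         # sum of scores divided by n
median_odd = statistics.median(odd_series)     # the single middle score
median_even = statistics.median(even_series)   # midpoint of the two middle scores

# multimode returns every score tied for the highest frequency.
bimodal = [3, 5, 5, 7, 7, 9]  # hypothetical data with two modes
modes = statistics.multimode(bimodal)

print(mean_odd, median_odd, median_even, modes)
```

`multimode` is used instead of `mode` so that a bimodal or multimodal distribution reports all of its modes rather than just one.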

Advantage Of Median
 The advantage of the median as a measure of central tendency is
that it is unaffected by the value of extreme scores.
 It is an index of average position in a distribution, so it is a useful
measure for describing skewed distributions.
 For instance, the average cost of a house is usually cited in terms
of the median, because the distribution tends to be skewed to the
right.

Comparing Measures of Central Tendency
 All three measures of central tendency can be applied to variables
on the interval or ratio scales, although the mean is most useful.
 For data on the nominal scale, only the mode is meaningful.
 If data are ordinal, both the median and mode can be applied.
 The mean is considered the most stable; that is, if we were to
repeatedly draw random samples from a population, the means of
those samples would fluctuate less than the mode or median.

• We can also consider the utility of the three measures of
central tendency for describing distributions of different
shapes.
• With uniform and normal distributions, any of the three
averages can be applied with validity.
• With skewed distributions, however, the mean is limited as
a descriptive measure because, unlike the median and
mode, it is affected by the quantitative value of every score
in a distribution and can be biased by extreme scores.
• For instance, in the previous example of ROM scores, if
the first subject obtained a score of 20 instead of 60,
the mean would decrease from 94.9 to 93.6. The
median and mode would be unaffected by this change.
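
The arithmetic in this example can be reproduced from the reported sum alone. The five-score sample used to illustrate the median is hypothetical, since the individual ROM scores are not listed here.

```python
import statistics

n = 30
total = 2848  # sum of the ROM scores reported in the text

mean_original = total / n
# Replacing one score of 60 with 20 lowers the sum by 40.
mean_changed = (total - 60 + 20) / n

print(round(mean_original, 1), round(mean_changed, 1))

# Hypothetical small sample: changing only the lowest score leaves the median alone.
sample = [60, 85, 90, 95, 100]
altered = [20, 85, 90, 95, 100]
median_unchanged = statistics.median(sample) == statistics.median(altered)
```

The mean moves because every score enters the sum, whereas the median depends only on which score sits in the middle position.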

• The curves in Figure illustrate how measures of
central tendency are affected by skewness.
• The median will typically fall between the mode
and the mean in a skewed curve, and the mean
will be pulled toward the tail.
• Because of these properties, the choice of which
index to report with skewed distributions
depends on what facet of information is
appropriate to the analysis.
• It is often reasonable to report all three values, to

MEASURES OF VARIABILITY
 The shape and central tendency of a distribution are useful but incomplete
descriptors of a sample.
 If we were to describe these two distributions using measures of central
tendency only, they would appear identical; however, a careful glance
reveals that the scores for Group B are more widely scattered than those
for Group A.
 This difference in variability, or dispersion of scores, is an essential element
in data analysis.
 The description of a sample is not complete unless we can characterize the
differences that exist among the scores as well as the central tendency of
the data.

Range
• The simplest measure of variability is the
range, which is the difference between the
highest and lowest values in a distribution.
• For the test scores reported in Table, the
range for Group A is 88 - 78 = 10, and for
Group B, 98 - 65 = 33.
• These values suggest that the first group was
more homogeneous.
• Although the range is a relatively simple
statistical measure, its applicability is limited
because it is determined using only the two
extreme scores in the distribution.
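
A minimal sketch of the range calculation. The two groups below are hypothetical scores chosen only so that their extremes match the values quoted above, and `score_range` is an illustrative name.

```python
def score_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


# Hypothetical scores; only the extremes (78/88 and 65/98) come from the text.
group_a = [78, 81, 83, 84, 85, 88]
group_b = [65, 72, 83, 84, 95, 98]

range_a = score_range(group_a)
range_b = score_range(group_b)

# One aberrant score changes the range even though the other scores are identical.
range_a_with_outlier = score_range(group_a + [40])

print(range_a, range_b, range_a_with_outlier)
```

Only the minimum and maximum enter the calculation, which is why the scores in between have no effect on the result.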

• It reflects nothing about the dispersion of scores between the two
extremes.
• One aberrant extreme score can greatly increase the range, even
though the variability within the rest of the data set is unchanged.
• In addition, the range of scores tends to increase with larger
samples.
• Therefore, although it is easily computed, the range is usually
employed only as a rough descriptive measure, and is typically

Percentile
 Percentiles are used to describe a score's position within a
distribution.
 Percentiles divide data into 100 equal portions.
 A particular score is located in one of these portions, which
represents its position relative to all other scores.
 For example, if a student taking a college entrance examination scores
in the 92nd percentile (P92), that individual's score was higher than
92% of those who took the test.
 Percentiles are helpful for converting actual scores into
comparative scores or for providing a reference point for
interpreting a particular score.
 For instance, a child who scores in the 20th percentile for weight in his
age group can be evaluated relative to his peer group, rather than

Quartiles
 Quartiles divide a distribution into four equal parts, or quarters.
 Therefore, three quartiles exist for any data set.
 Quartiles Q1, Q2, and Q3 correspond to percentiles at 25%, 50%,
and 75% of the distribution (P25, P50, P75).
 The score at the 50th percentile or Q2 is the median.
 The distance between the first and third quartiles, Q3 - Q1, is
called the interquartile range, which represents the boundaries of
the middle 50% of the distribution.
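
Quartiles and the interquartile range can be sketched with `statistics.quantiles` from the Python standard library. The scores are hypothetical, and the function's default interpolation rule is only one of several conventions, so hand-calculated or textbook quartiles may differ slightly.

```python
import statistics

# Hypothetical ROM scores in degrees.
scores = [60, 68, 72, 77, 77, 80, 82, 84, 85, 86]

# n=4 requests the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # width of the middle 50% of the distribution

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```

Q2 coincides with the median, which gives a quick consistency check on any quartile routine.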

Box Plot
• A box plot graph, also called a box-and-
whisker plot, (Figure) is a useful way to
demonstrate visually the spread of scores in
a distribution, including the median and
interquartile range.
• Box plots may be drawn with the
"whiskers" representing highest and lowest
scores.
• The whiskers may also be drawn to
represent the 90th and 10th percentiles, and

VARIANCE
• Measures of range have limited application as indices of
variability because they are not influenced by every score
in a distribution and they are sensitive to extreme scores.
• Variance is the sum of the squared differences
between each data point and the mean, divided by the
number of data values.
• Variance reflects the variation within a full set of scores.
• Variance is small if scores are close together and large if
they are spread out.
• A measure of variability should also be objective so that we can compare
samples of different sizes and determine whether one is more variable than
another
• Obviously, samples with larger deviation scores will be

Standard Deviation
• The limitation of variance as a descriptive measure of a sample's
variability is that it is calculated using the squares of the deviation
scores.
• It is generally not useful to describe sample variability in terms of squared
units
• Therefore, to bring the index back into the original units of measurement,
we take the positive square root of the variance.
• This value is called the standard deviation, symbolized by s.
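
The steps from deviation scores to standard deviation can be sketched as follows, using hypothetical data. The text defines variance with the number of data values as the divisor. The n − 1 divisor shown alongside is an addition not stated on these slides, included because it is the usual estimate computed from sample data.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(data)
mean = sum(data) / n

# Squared deviation of each score from the mean, then their sum.
squared_deviations = [(x - mean) ** 2 for x in data]
sum_of_squares = sum(squared_deviations)

variance_n = sum_of_squares / n                 # divisor as defined in the text
variance_n_minus_1 = sum_of_squares / (n - 1)   # assumption: usual sample estimate

# The positive square root returns the index to the original units.
sd_n = math.sqrt(variance_n)
sd_sample = math.sqrt(variance_n_minus_1)

print(variance_n, sd_n, round(variance_n_minus_1, 2), round(sd_sample, 2))
```

The two divisors give noticeably different answers for small samples and converge as n grows.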

• The standard deviation of sample data is
usually reported along with the mean so
that the data are characterized according
to both central tendency and variability.
• A mean may be expressed as X̄ = 83.63 ±
12.22, which tells us that the average of
the deviations on either side of the mean is
12.22.
• An error bar graph shows these values for
both groups, illustrating the difference in

Coefficient of Variation
• The coefficient of variation (CV) is another measure of variability that can
be used to describe data measured on the interval or ratio scale.
• It is the ratio of the standard deviation to the mean, expressed as a
percentage:
$$CV = \frac{s}{\bar{X}} \times 100$$
• There are two major advantages to this index.
• First, it is independent of units of measurement because units will
mathematically cancel out. Therefore, it is a practical statistic for comparing
distributions recorded in different units.
• Second, the coefficient of variation expresses the standard deviation as a
proportion of the mean, thereby accounting for differences in the
magnitude of the mean.
• The coefficient of variation is, therefore, a measure of relative variation,
most meaningful when comparing two distributions.
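
A short sketch of why the coefficient of variation is independent of units. The heights are hypothetical, and `coefficient_of_variation` is an illustrative helper that uses the sample standard deviation from the `statistics` module.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical heights recorded in centimetres, then converted to inches.
heights_cm = [150.0, 160.0, 165.0, 170.0, 180.0]
heights_in = [h / 2.54 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_in = coefficient_of_variation(heights_in)

print(round(cv_cm, 2), round(cv_in, 2))
```

The conversion factor scales the standard deviation and the mean equally, so it cancels in the ratio.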

Example
 Consider a study of normal values of lumbar spine range of
motion, in which data were recorded in both degrees
and inches of excursion.
 The mean ranges for 20- to 29-year-olds were X̄ = 41.2 ± 9.6
degrees and X̄ = 3.7 ± 0.72 inches,
respectively.
 The absolute values of the standard deviations for
these two measurements suggest that the measure of
inches, using a tape measure, was much less variable.
• However, because the means and units are
substantially different, we would
expect the standard deviations to
be different as well.
• By calculating the coefficient of
variation, we get a better idea of
the relative variation of these two
measurements:
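
As a worked sketch, dividing each standard deviation quoted above by its mean gives the following (rounded to one decimal place):

```latex
CV_{\text{degrees}} = \frac{9.6}{41.2} \times 100 \approx 23.3\%
\qquad
CV_{\text{inches}} = \frac{0.72}{3.7} \times 100 \approx 19.5\%
```

On this relative scale the two methods are much closer in variability than the raw standard deviations (9.6 versus 0.72) suggest.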
