4) Central tendency and dispersion biostatistics

Measures of central tendency
Measures of central tendency are summary statistics used to indicate the central location of
a group of data values.
A measure of central tendency may also be called a center or location of the distribution.
1. Objectives of central tendency:
To facilitate comparison.
To describe the characteristics of the entire group.
To help in decision making.
To learn about the universe (population) from a sample.

2. Requisites for an ideal measure of central tendency:
It should be rigidly defined.
It should be simple to understand.
It should be easy to calculate.
It should be suitable for further mathematical treatment.
It should be least affected by fluctuation of sampling.
It should not be affected by extreme observations.
3. Types of central tendency:
Mean (Mathematical average)
Median (Positional average)
Mode (Positional average)

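A. Arithmetic mean:
Arithmetic mean (A.M) of a set of data may be defined as the sum of the observations divided
by the number of observations.
The mean is the most commonly used measure of central tendency.
$\bar{X} = \frac{\sum X}{n}$
1. Individual series
(a) $\bar{X} = \frac{\sum X}{n}$ (Direct method)
(b) $\bar{X} = A + \frac{\sum d}{n}$ (Shortcut method)
Where, A = assumed mean and d = X - A

2. Discrete series
(a) $\bar{X} = \frac{\sum fX}{N}$ (Direct method)
(b) $\bar{X} = A + \frac{\sum fd}{N}$ (Shortcut method)
Where, A = assumed mean
d = X - A and N = total frequency
3. Continuous series
(a) $\bar{X} = \frac{\sum fm}{N}$ (Direct method)
Where, m = mid value
(b) $\bar{X} = A + \frac{\sum fd}{N}$ (Shortcut method)
Where, A = assumed mean
d = m - A and N = total frequency
(c) $\bar{X} = A + \frac{\sum fd'}{N} \times h$ (Step deviation method)
Where, $d' = \frac{m - A}{h}$
h = class size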
Example:
Q. The health expenditure of 4 families in rupees is given
below. Calculate the arithmetic mean.
Family                    A      B      C      D
Health expenditure (Rs)   3000   4000   1500   3500
Solution:
Family   Health expenditure (Rs) (X)
A        3000
B        4000
C        1500
D        3500
Total    $\sum X = 12000$
Here, n = 4 and $\sum X = 12000$
A.M = $\bar{X} = \frac{\sum X}{n} = \frac{12000}{4} = 3000$
Hence, the mean health expenditure is Rs. 3000.
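The direct and shortcut methods for an individual series can be sketched in Python as follows. This is only an illustrative sketch: the function names and the assumed mean of 3500 are choices made here, not part of the original slides.

```python
def mean_direct(values):
    """Direct method: sum of the observations divided by their number."""
    return sum(values) / len(values)


def mean_shortcut(values, assumed_mean):
    """Shortcut method: A + (sum of d) / n, where d = X - A."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


# Health expenditure (Rs) of families A to D from the example above
expenditure = [3000, 4000, 1500, 3500]

print(mean_direct(expenditure))          # 3000.0
print(mean_shortcut(expenditure, 3500))  # 3000.0 (assumed mean A = 3500 is arbitrary)
```

Both methods give the same result for any choice of assumed mean; the shortcut method only reduces the size of the numbers handled by hand.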

Combined mean
If $\bar{X}_1$ and $\bar{X}_2$ are the means of two different groups of sizes $N_1$ and
$N_2$, the combined mean is given by:
$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$

Example:
Q. The mean monthly salary of 10 lady doctors is Rs. 400 and that of 20 male doctors
is Rs. 600. Calculate the mean monthly salary of all doctors taken together.
Solution:
Here, $N_1 = 10$, $N_2 = 20$, $\bar{X}_1 = 400$, $\bar{X}_2 = 600$
Hence,
$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$
$= \frac{10 \times 400 + 20 \times 600}{10 + 20}$
$= \frac{4000 + 12000}{30}$
$= 533.33$
Therefore, the mean monthly salary = Rs. 533.33
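As a quick check of the combined mean formula, here is a minimal Python sketch. The function name `combined_mean` is hypothetical, and the function is generalised to any number of groups, which goes slightly beyond the two-group formula given above.

```python
def combined_mean(means, sizes):
    """Combined mean: sum of (group size x group mean) divided by total size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)


# 10 lady doctors with mean Rs. 400 and 20 male doctors with mean Rs. 600
print(round(combined_mean([400, 600], [10, 20]), 2))  # 533.33
```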

Weighted arithmetic mean
The arithmetic average discussed above is the simple arithmetic average, in which all the items are
assumed to be equally important in the distribution. In practice, this may not be so: some
items in a distribution may be more important than others. In such cases,
proper weight should be given to the various items. The weighted
arithmetic average takes these weights into account.
Let $w_1, w_2, w_3, \ldots, w_n$ be the weights given to the variate values $x_1, x_2, x_3, \ldots, x_n$
respectively.
Then their weighted arithmetic mean, denoted by $\bar{x}_w$,
is defined by
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wx}{\sum w}$

Example:
Q. A contractor employs three types of workers: male, female and children. He pays a male
worker Rs. 10 per day, a female worker Rs. 8 per day and a child worker
Rs. 3 per day. If the numbers of male, female and child workers employed are 20, 15 and
5 respectively, what is the average wage?
Solution:
Here, the suitable average is the weighted mean.
We have,
$\bar{x}_w = \frac{\sum WX}{\sum W}$
Calculation of weighted average
Wages per day (X)   No. of workers (W)   WX
10                  20                   200
8                   15                   120
3                   5                    15
                    $\sum W = 40$        $\sum WX = 335$
$\bar{x}_w = \frac{335}{40} = 8.375 \approx 8.38$
Hence, the average wage is Rs. 8.38
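The weighted mean calculation above can be reproduced with a few lines of Python. This is a sketch only; `weighted_mean` is a name chosen here for illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum of (w * x) divided by sum of w."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


wages = [10, 8, 3]      # Rs. per day for male, female and child workers
workers = [20, 15, 5]   # number of workers of each type (the weights)

print(weighted_mean(wages, workers))  # 8.375, i.e. about Rs. 8.38
```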

Geometric mean:
The geometric mean of n positive variate values is the nth
root of their product.
$G = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$
It is used for rates, ratios, percentage variate values and exponentially expressed values.

Harmonic mean:
Harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of a set of non-zero
variate values. Harmonic mean is used for rates and ratios type of variables.
$H = \frac{n}{\sum \frac{1}{x}}$
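The geometric and harmonic means can be sketched in Python as below. The data set `[2, 4, 8]` is hypothetical and chosen only because its geometric mean is a round number; the geometric mean is computed through logarithms, which is equivalent to the nth root of the product for positive values.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values (computed via logarithms)."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


def harmonic_mean(values):
    """n divided by the sum of the reciprocals of n non-zero values."""
    return len(values) / sum(1 / x for x in values)


data = [2, 4, 8]  # hypothetical positive values
am = sum(data) / len(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)

print(round(am, 4), round(gm, 4), round(hm, 4))
```

For positive data the three values come out in the order A.M ≥ G.M ≥ H.M.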

Merits And Demerits Of Arithmetic Mean
1. MERITS:
It is rigidly defined.
It is based on all observations.
It is simple to understand and easy to calculate.
It is suitable for further mathematical treatment.
It is least affected by fluctuation of sampling.
2. DEMERITS:
It is very much affected by extreme observations.
It cannot be computed in case of open-end classes.
It sometimes gives fallacious conclusions.
It cannot be determined by inspection or by graphical method.
It cannot be used when dealing with qualitative characteristics which cannot be
measured quantitatively.

B. Median:
The median is the value which divides the distribution into two equal
parts, provided the observations are arranged in order
of magnitude.
If the number of observations in a series is odd, then the median is
the middle value, and if the number of observations is even, then
the median is the midpoint between the two middle values.
A. For individual series:
Arrange the data in ascending or descending order
of magnitude and apply the formula,
Median = size of the $\left(\frac{n+1}{2}\right)$th item

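B. For discrete series:
Arrange the data in ascending or descending order of magnitude.
Obtain the c.f. (cumulative frequency).
Apply the formula,
M.d. = size of the $\left(\frac{N+1}{2}\right)$th item
Now look at the c.f. column and find the value whose c.f. is either
equal to or just greater than $\frac{N+1}{2}$; this gives the value of the
median.
C. For continuous series:
Median = $L + \frac{\frac{N}{2} - c.f.}{f} \times h$
Where, L = lower limit of the median class,
c.f. = cumulative frequency of the class preceding the median class,
f = frequency of the median class and h = class size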
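The continuous-series formula can be sketched in Python as below. The class intervals and frequencies are hypothetical, and the function assumes classes of equal size listed in ascending order; the median class is taken as the first class whose cumulative frequency reaches N/2.

```python
def grouped_median(lower_limits, frequencies, class_size):
    """Median = L + ((N/2 - c.f.) / f) * h for a continuous series."""
    half = sum(frequencies) / 2
    cumulative = 0  # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * class_size
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with these frequencies
lower_limits = [0, 10, 20, 30]
frequencies = [5, 8, 12, 5]

print(round(grouped_median(lower_limits, frequencies, 10), 2))  # 21.67
```

Here N = 30, so N/2 = 15 falls in the class 20-30 (c.f. = 13, f = 12), giving 20 + (2/12) x 10.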

1. Merits:
Easy to understand and simple to compute.
Is rigidly defined.
Not influenced by extreme values.
Can be computed even for open-end classes.
Median is the only average to be used while
dealing with qualitative characteristics such as intelligence, beauty
etc.
2. Demerits:
Arrangement of data is necessary.
Not based on all the observations.
Not suitable for further mathematical treatment.
Exact value cannot be determined when the number of observations is even.

C. Mode:
Mode is the variate value which repeats the maximum number of times.
It is used in business for forecasting rates of goods, but is rarely used in
medical sciences.
In case of individual and discrete series the mode can easily be found by
inspection, but in case of continuous series we use the following formula
to calculate the mode.
Mode = $L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h$
Where, L = lower limit of modal class,
$\Delta_1 = f_1 - f_0$,
$\Delta_2 = f_1 - f_2$,
$f_1$ = maximum frequency,
$f_2$ = frequency following the modal class,
$f_0$ = frequency preceding the modal class,
h = size of modal class
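A minimal Python sketch of the mode formula for a continuous series follows. The class data are hypothetical, and two assumptions are made here that the slides do not state: when several classes share the maximum frequency the first one is used, and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(lower_limits, frequencies, class_size):
    """Mode = L + (d1 / (d1 + d2)) * h, with d1 = f1 - f0 and d2 = f1 - f2."""
    i = frequencies.index(max(frequencies))  # modal class (first one if tied)
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0                      # preceding class
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0   # following class
    d1, d2 = f1 - f0, f1 - f2
    # Note: d1 + d2 is zero when f0 = f1 = f2; the formula is undefined there.
    return lower_limits[i] + d1 / (d1 + d2) * class_size


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with these frequencies
lower_limits = [0, 10, 20, 30]
frequencies = [5, 8, 12, 5]

print(round(grouped_mode(lower_limits, frequencies, 10), 2))  # 23.64
```

The modal class is 20-30, with f1 = 12, f0 = 8 and f2 = 5, so the mode is 20 + (4/11) x 10.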

Merits:
It is easy to calculate and simple to understand.
It is not affected by extreme observations.
It can be calculated for open end classes.
It can be obtained by inspection or by graph.
Demerits :
Mode is not rigidly defined.
Mode is not based on all observations.
Mode is not suitable for further mathematical treatment.
Mode is affected by fluctuation of sampling.

Relation between various measures of central tendency:
1. Mode = 3 Median – 2 Mean (empirical relationship)
2. A.M ≥ G.M ≥ H.M

Q. The mean weight of 150 students in a certain class is 60 kg. The mean
weight of boys in the class is 70 kg and that of girls is 55 kg. Find the
no. of boys and girls in the class.
Solution:
Here, let $N_1$ = no. of boys and $N_2$ = no. of girls.
$\bar{X}_{12} = 60$
$N_1 + N_2 = 150$, so $N_1 = 150 - N_2$ ……….. ( i )
$\bar{X}_1 = 70$, $\bar{X}_2 = 55$
$N_1 = ?$, $N_2 = ?$
We have,
$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$
Or, $60 = \frac{70 N_1 + 55 N_2}{150}$
Or, $9000 = 70 N_1 + 55 N_2$ ……………. ( ii )
Putting $N_1 = 150 - N_2$ in eqn ( ii ),
$9000 = 70 (150 - N_2) + 55 N_2$
Or, $9000 = 10500 - 70 N_2 + 55 N_2$
Or, $15 N_2 = 10500 - 9000$
Or, $15 N_2 = 1500$
∴ $N_2 = 100$
Putting the value of $N_2$ in eqn ( i ),
$N_1 + 100 = 150$
∴ $N_1 = 50$
Hence, $N_1 = 50$ (boys) and $N_2 = 100$ (girls).

Selection of an average:
No single average is suitable for all conditions. The selection of an
average depends upon the nature of the data.
1. Mean, median and mode can all be used for symmetrical data.
2. Mean is suggested when:
i) The average of quantitative data is to be calculated, but it should
not be used under the following conditions:
a. When the data are highly skewed.
b. When the distribution has open-end classes.
c. When the distribution has extreme observations.
3. Median is used for:
i) Qualitative data such as intelligence, beauty etc.
ii) Open-end classes.
iii) Highly skewed distributions.
4. Mode: It is useful for finding the most repeated value, particularly in business,
and for highly skewed distributions.

5. Geometric Mean:
i) It is used to calculate average rates, ratios and percentages.
ii) It is used in the construction of index numbers.
6. Harmonic Mean:
i) It is used in computing averages related to rates and ratios where
time factor is available.

Numerical problems:
1. In a class of 50 students 10 have failed and their average of marks is
2.5 .The total marks secured by the entire class were 281. Find the
average marks of those who have passed. (Ans=6.4)
2. The mean age of a combined group of men and women is 30 years.
If the mean age of the group of men is 32 and that of the group of women is
27, find the percentage of men and women in the group. (Ans = 60% men and 40% women)
3. Mean of 100 items was 50. Later on it was found that two items
were misread as 92 and 8 instead of 192 and 88. Find the correct
mean. (Ans=51.8)
4. Arithmetic mean of 98 items is 50. Two items 60 and 70 were left out
at the time of calculation. What is the correct mean of all the items?
(Ans=50.3)

Measures of position (partition values):
Partition values are those variate values which divide the total
number of observations into a number of equal parts. The number of
parts may be four, ten, hundred etc.
1. Quartiles
Quartiles are those positional values which divide the ordered series
into four equal parts, so there are 3 quartiles.
Lowest value      Q1      Q2 (= Md)      Q3      Highest value
0%                25%     50%            75%     100%
Fig: Presentation of data by quartiles

a. First quartile (lower quartile):
Q1 has 25% of the observations before it and 75% of the observations after it.
b. Second quartile (median):
Q2 coincides with the median. The median divides the total number of
observations into two halves.
c. Third quartile (upper quartile):
Q3 has 75% of the observations before it and 25% of the observations after it.

a. For individual and discrete series:
$Q_i$ = value of the $\left(\frac{i(N+1)}{4}\right)$th item
Where, i = 1, 2, 3
b. For continuous series:
$Q_i = L + \frac{\frac{iN}{4} - c.f.}{f} \times h$
Where, i = 1, 2, 3

2. Deciles:
Deciles are those positional values which divide the ordered
series into 10 equal parts. So there are 9 deciles.
a. For individual and discrete series:
$D_j$ = value of the $\left(\frac{j(N+1)}{10}\right)$th item
Where, j = 1, 2, 3, ……, 9
b. For continuous series:
$D_j = L + \frac{\frac{jN}{10} - c.f.}{f} \times h$
Where, j = 1, 2, 3, ……, 9

3. Percentiles:
Percentiles are those positional values which divide the ordered series
into 100 equal parts. So, there are 99 percentiles.
a. For individual and discrete series:
$P_k$ = value of the $\left(\frac{k(N+1)}{100}\right)$th item
Where, k = 1, 2, 3, ……, 99
b. For continuous series:
$P_k = L + \frac{\frac{kN}{100} - c.f.}{f} \times h$
Where, k = 1, 2, 3, ……, 99
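The positional formulas for quartiles, deciles and percentiles of an individual series share one pattern, which the following Python sketch illustrates. The function name `partition_value` is hypothetical. Two choices are assumptions made here rather than rules stated in the slides: a fractional position is handled by linear interpolation between the neighbouring items, and a position outside the series is clamped to the first or last item.

```python
def partition_value(values, i, parts):
    """Value of the (i * (N + 1) / parts)-th item of the ordered series.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / parts          # 1-based position in the ordered series
    position = min(max(position, 1), n)     # assumption: clamp to the valid range
    lower = int(position)                   # whole part of the position
    fraction = position - lower
    if lower >= n:
        return ordered[-1]
    # assumption: interpolate linearly between the two neighbouring items
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [2, 4, 6, 8, 10, 12, 14]
print(partition_value(data, 1, 4))    # Q1: 2nd item
print(partition_value(data, 3, 4))    # Q3: 6th item
print(partition_value(data, 5, 10))   # D5: 4th item, same as the median
```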

Measures of dispersion (variability)
Averages give us an idea of the concentration of the items around the
central part of the distribution. But averages do not give a clear
picture of the distribution, because two distributions with the same
averages may differ in the scatter of the items about the central value.
Series   Items                    $\bar{X}$   Mo   Md
A        25 26 27 27 27 28 29     27          27   27
B        0  10 18 27 27 27 80     27          27   27
From the above table, we see that the mean, median and mode of the two
series A and B are the same. From these results alone we cannot say that the
two series A and B are similar, because the differences of the items from
the average are larger in B than in A. In series A, the items are
concentrated more closely around the central value, whereas the scatter of the
items about the central value is greater in series B. Hence, though the two
series A and B have the same averages, they cannot be said to be similar because
they are differently constituted.
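The comparison of series A and B can be verified with Python's standard `statistics` module. This sketch adds the range (largest item minus smallest item) as a simple indicator of spread; the helper name `summarize` is chosen here for illustration.

```python
import statistics

series_a = [25, 26, 27, 27, 27, 28, 29]
series_b = [0, 10, 18, 27, 27, 27, 80]


def summarize(series):
    """Return (mean, median, mode, range) for a list of numbers."""
    return (
        statistics.mean(series),
        statistics.median(series),
        statistics.mode(series),
        max(series) - min(series),
    )


print("A:", summarize(series_a))
print("B:", summarize(series_b))
```

Both series have mean, median and mode equal to 27, but the range is 4 for series A and 80 for series B.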

Definition:
Dispersion is the scatter of the items about the central value; a measure of
dispersion measures the variation of the items from the central value.
Main objectives of measuring variability are:
1. To determine the reliability of an average.
2. To compare two or more series with regard to their variability.
3. To facilitate the use of other statistical measures.

Absolute Measure Of Dispersion:
A measure of dispersion is said to be absolute if it is
expressed in terms of the original units of the data.
Relative Measure Of Dispersion:
A measure of dispersion is said to be relative if it is independent of
units of the data.

Requisites of ideal measure of dispersion:
It should be rigidly defined.
It should be simple to understand and easy to calculate.
It should be based on all observations.
It should be suitable for further mathematical treatment.
It should be least affected by fluctuation of sampling.
It should not be affected by extreme values.
Methods of measuring dispersion:
1. Range
2. Quartile deviation (Semi-interquartile range)
3. Mean deviation(Average deviation)
4. Standard deviation

Range:
Range is the simplest measure of dispersion. Range is defined as the
difference between the largest item and the smallest item in a set of
observations.
Range = Largest item – Smallest item
= L – S
Coefficient of Range:
Coefficient of range is the relative measure corresponding to range.
It can be used to compare two distributions with different units.
Coefficient of Range = $\frac{L - S}{L + S}$

Merits:
It is rigidly defined.
It is simple to understand and easy to calculate.
Variation can be understood in short time.
Demerits:
It is not based on all observations.
It is affected by fluctuation of sampling.
It is affected by extreme values.
It is not suitable for further mathematical treatment.
It cannot be calculated in case of open-end classes.

Quartile deviation (Semi-interquartile range)
Interquartile range = $Q_3 - Q_1$
Quartile deviation (Q.D) = $\frac{1}{2}(Q_3 - Q_1)$
Coefficient of Q.D = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
Merits
• Rigidly defined
• Not affected by extreme values
• Can be calculated in open-end class distribution
• It is better measure of dispersion in comparison to range as
it is based on 50% of central values.

Demerits:
• It is not based on all observations.
• It is affected by fluctuation of sampling.
• It is not suitable for further mathematical treatment.
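Example: Find the Q.D of the given data.
2, 4, 6, 8, 10, 12, 14
Here, N = 7.
We have, $Q_1$ = value of the $\left(\frac{N+1}{4}\right)$th item
= $\left(\frac{7+1}{4}\right)$th item
= 2nd item
= 4
$Q_3$ = value of the $3\left(\frac{N+1}{4}\right)$th item
= 6th item
= 12
Quartile deviation (Q.D) = $\frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(12 - 4)$
= 4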

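Mean deviation (M.D)
Mean deviation from an average A (A.M, median or mode) is the mean of the absolute deviations of the items from A.
For individual series:
M.D = $\frac{\sum |X - A|}{n}$
For discrete and continuous series:
M.D = $\frac{\sum f|X - A|}{N}$
Where, A = mean, median or mode (X = mid value in a continuous series)
Coefficient of mean deviation from mean = $\frac{\text{mean deviation from mean}}{\text{mean}}$
Coefficient of mean deviation from median = $\frac{\text{mean deviation from median}}{\text{median}}$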
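A short Python sketch of the mean deviation for an individual series is given below, taking the mean deviation as the average of the absolute deviations from a chosen average. The data set and the function names are hypothetical and used only for illustration.

```python
import statistics


def mean_deviation(values, about):
    """Mean of the absolute deviations of the values from the average `about`."""
    return sum(abs(x - about) for x in values) / len(values)


def coefficient_of_mean_deviation(values, about):
    """Mean deviation divided by the average it is measured from."""
    return mean_deviation(values, about) / about


data = [2, 4, 6, 8, 10, 12, 14]      # hypothetical individual series
mean = statistics.mean(data)          # 8
median = statistics.median(data)      # 8

print(round(mean_deviation(data, mean), 4))
print(round(coefficient_of_mean_deviation(data, median), 4))
```

For this series the absolute deviations from 8 add up to 24, so the mean deviation is 24/7 and the coefficient is 3/7.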

Merits:
1. Based on all observation.
2. It is easy to understand and calculate.
3. It is less affected by extreme items as compared to other
dispersions.
Demerits:
1. Cannot be computed in open end classes.
2. It is less reliable because of ignoring the sign.
3. It is not suitable for further mathematical treatment.
4. Not suitable measure when mode is ill defined.

Standard Deviation (Root-Mean square deviation)
It is said to be the best measure of dispersion as it satisfies most of the
requisites of a good measure of dispersion.
Definition:
It is defined as the positive square root of the mean of the squares of the
deviations taken from the arithmetic mean. It is denoted by $\sigma$.
1. For individual series:
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Where, n = total no. of observations

2. For discrete series:
$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{N}}$
Where, N = total frequency
3. For continuous series:
$\sigma = \sqrt{\frac{\sum f(m - \bar{x})^2}{N}}$
Where, N = total frequency and m = mid value
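The standard deviation formulas can be sketched in one Python function, treating an individual series as a discrete series in which every frequency is 1. The function name `std_dev` and the sample data are assumptions made for illustration; for a continuous series the mid values of the classes would be passed as `values`.

```python
import math


def std_dev(values, frequencies=None):
    """Root-mean-square deviation from the arithmetic mean (divisor N)."""
    if frequencies is None:
        frequencies = [1] * len(values)   # individual series: each item counted once
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / total
    return math.sqrt(variance)


print(std_dev([2, 4, 6, 8, 10, 12, 14]))          # 4.0
print(round(std_dev([1, 2, 3], [1, 2, 1]), 4))    # discrete series with frequencies
```

In the first call the mean is 8 and the squared deviations add up to 112, so the variance is 112/7 = 16 and the standard deviation is 4.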

Merits:
It is rigidly defined.
It is based on all observations.
It is least affected by fluctuation of sampling.
It is suitable for further mathematical treatment.
It helps in calculating standard error.
Demerits:
It is difficult to compute.
It is very much affected by extreme values.
It cannot be calculated for open end classes.

Variance:
The square of the standard deviation is called variance.
$\sigma^2 = \frac{\sum f(x - \bar{x})^2}{N}$
Coefficient of variation:
Coefficient of variation is a relative measure of dispersion.
It is the ratio of the standard deviation to the mean, expressed as a percentage.
Coefficient of variation (C.V) = $\frac{\sigma}{\bar{X}} \times 100$
It is independent of units. So, two distributions can be easily compared
with the help of the coefficient of variation.
The lower the C.V, the greater the uniformity, consistency etc.
The higher the C.V, the lower the uniformity, consistency etc.

Example:
Q. In two series of adults aged 21 years and children 3 months old, the following values were obtained
for height. Find which series shows greater variation.
Persons    Mean height   SD
Adults     160 cm        10 cm
Children   60 cm         5 cm
Solution:
C.V = $\frac{\sigma}{\bar{X}} \times 100$
C.V of adults = $\frac{10}{160} \times 100 = 6.25\%$
C.V of children = $\frac{5}{60} \times 100 = 8.33\%$
Hence, C.V of children > C.V of adults, i.e. the height of children shows greater variation than
the height of adults.

Q. Suppose two groups of human males yield the following information.
              Group A     Group B
Age           24 years    15 years
Mean weight   145 lbs     80 lbs
Variance      100 lbs²    100 lbs²
Find which is more variable, the weight of the 24-year-olds or the weight of the 15-year-olds.
Solution:
For group A: $\bar{X}_1 = 145$ lbs, $\sigma_1^2 = 100$, so $\sigma_1 = 10$
For group B: $\bar{X}_2 = 80$ lbs, $\sigma_2^2 = 100$, so $\sigma_2 = 10$
C.V of A = $\frac{\sigma_1}{\bar{X}_1} \times 100 = \frac{10}{145} \times 100 = 6.9\%$
C.V of B = $\frac{\sigma_2}{\bar{X}_2} \times 100 = \frac{10}{80} \times 100 = 12.5\%$

Here, the C.V of B is greater than the C.V of A, i.e. the weight of the 15-year-olds shows more
variation than the weight of the 24-year-olds.
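The two coefficient of variation examples can be checked with a small Python sketch. The function name is a choice made here; the standard deviation of each weight group is obtained as the square root of the given variance.

```python
import math


def coefficient_of_variation(sd, mean):
    """C.V = (standard deviation / mean) x 100, expressed as a percentage."""
    return sd / mean * 100


# Heights: adults (mean 160 cm, SD 10 cm) and children (mean 60 cm, SD 5 cm)
cv_adults = coefficient_of_variation(10, 160)
cv_children = coefficient_of_variation(5, 60)

# Weights: both groups have variance 100, so SD = sqrt(100) = 10
cv_a = coefficient_of_variation(math.sqrt(100), 145)
cv_b = coefficient_of_variation(math.sqrt(100), 80)

print(round(cv_adults, 2), round(cv_children, 2))  # 6.25 8.33
print(round(cv_a, 1), round(cv_b, 1))              # 6.9 12.5
```

Although both weight groups have the same standard deviation, the group with the smaller mean has the larger C.V, which is why a relative measure is needed for the comparison.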
