NURSING Dream ● Discover ● Deliver
Lemma Derseh (BSc., MPH)
University of Gondar
College of medicine and health science
Department of Epidemiology and
Biostatistics
Descriptive statistics
Statistical methods (branches of statistics)
Biostatistics
• Descriptive statistics: collection, organizing, summarizing and presenting of data
• Inferential statistics: making inferences, hypothesis testing, determining relationships and making predictions
Lemma Derseh, Department of Epidemiology and Biostatistics, University of Gondar
Descriptive Statistics
1. Involves
– Collecting data
– Presenting data
– Characterizing data
2. Purpose
– Describe data, e.g. x̄ = 74.5, S² = 213
[Figure: bar chart of class size (axis 0 to 100) for the 1st to 4th batch of one department]
Descriptive statistics cont…
Types of descriptive statistics
• Tables/charts/graphs (pictorial measures)
• Measures of central tendency (numerical summary measures)
• Measures of variability (numerical summary measures)
Tables/charts/graphs
The methods of describing data differ depending on the type of the data itself (i.e. numerical or categorical).
• Tables are used for categorical variables or categorized numerical data
• Tables show:
– Frequency (for nominal and ordinal data)
– Relative frequency (for nominal and ordinal data)
– Cumulative frequency (for ordinal data)
Describing categorical variables … cont
• Frequency is the number of observations in each category
• The relative frequency of a class is the proportion or percentage of the data that falls in that class
E.g. 1: The blood types of 30 patients were recorded as follows:
A AB B B A O O AB AB B O A A B B A AB A O AB
B AB AB O A AB AB O A O
Construct a frequency table for these data.
Type    Frequency   Relative frequency
A       8           0.267
B       6           0.200
AB      9           0.300
O       7           0.233
Total   30          1.000
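The tallying behind this table can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the 30 blood types are copied from the example.

```python
from collections import Counter

# Blood types of the 30 patients, copied from the example.
blood_types = (
    "A AB B B A O O AB AB B O A A B B A AB A O AB "
    "B AB AB O A AB AB O A O"
).split()

counts = Counter(blood_types)   # frequency of each category
n = len(blood_types)            # total number of observations

# Relative frequency = class frequency / total number of observations.
table = {t: (counts[t], round(counts[t] / n, 3)) for t in ["A", "B", "AB", "O"]}

for blood_type, (freq, rel_freq) in table.items():
    print(f"{blood_type:<4}{freq:>3}{rel_freq:>8.3f}")
```

The relative frequencies add up to 1 (apart from rounding), which is a quick check that no observation was missed.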
Distribution of birth weight of newborns between 1976 and 1996 at TAH
BWT        Freq.   Rel. freq. (%)   Cum. freq.   Cum. rel. freq. (%)
Very low   43      0.4              43           0.4
Low        793     8.0              836          8.4
Normal     8870    88.9             9706         97.3
Big        268     2.7              9974         100
Total      9974    100
• The cumulative frequency of a class is the sum of the frequency for that class and all the previous classes.
• Cumulative relative frequency is relevant for ordinal data, where the categories have a natural order. Consider, for example, the variable birth weight with levels ‘Very low’, ‘Low’, ‘Normal’ and ‘Big’.
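As a rough sketch, the cumulative columns of the birth weight table can be reproduced with a running sum. The frequencies are taken from the table; the variable names are illustrative.

```python
from itertools import accumulate

# Ordered categories and their frequencies, from the birth weight table.
levels = ["Very low", "Low", "Normal", "Big"]
freqs = [43, 793, 8870, 268]

total = sum(freqs)
cum_freqs = list(accumulate(freqs))                  # running total of frequencies
rel_pct = [round(100 * f / total, 1) for f in freqs]
cum_rel_pct = [round(100 * c / total, 1) for c in cum_freqs]

for row in zip(levels, freqs, rel_pct, cum_freqs, cum_rel_pct):
    print(row)
```

The running sum only makes sense because the categories are listed in their natural order; for a nominal variable such as blood type the cumulative column would depend on an arbitrary ordering.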
Charts
• Charts are used only for categorical variables
• Bar charts: the successive bars are separated (not continuous)
• Pie charts: each sector of a circle represents a category of the data
Charts cont…
Bar chart
• Bar charts display the frequency distribution for nominal or ordinal data.
• The various categories into which the observations fall are represented along the horizontal axis.
Fig. 1. Bar chart for blood type of 30 patients
Pie chart
• A pie chart displays the frequency of nominal or ordinal variables.
• The categories of the variable are represented by the sectors of a circle.
• The area of each sector is proportional to the frequency of the corresponding category of the variable.
Fig. 3. Pie chart showing the frequency distribution of the variable blood group
Categorizing numeric data
• To organize and present numeric data using tables or graphs, we first group the data set into classes. This involves:
– Number of classes: the number of categories the table will have
– Class limits (lower and upper): the range of values for each class
– Class boundaries (lower and upper): the continuous range of a class, obtained by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit (for whole-number data; use 0.05 for data recorded to one decimal place)
– Class mark: the average of the lower and upper class limits
Sturges' rule
• Select a set of continuous, non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals.
$$K = 1 + 3.322\,\log_{10} n, \qquad W = \frac{L - S}{K}$$
– where K = number of class intervals
– n = number of observations
– W = width of the class interval
– L = the largest value
– S = the smallest value
Sturges' rule cont…
• For data sets with integer values, subtract 0.5 from the lower class limits and add 0.5 to the upper class limits to find the class boundaries.
• The answer obtained by applying Sturges' rule should not be regarded as final, but should be considered as a guide only.
• The number of class intervals specified by the rule should be increased or decreased for convenience and clear presentation.
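A minimal sketch of the rule in Python, using the figures from the blood lead example in these slides (n = 88, largest value 39, smallest value 20). The function name `sturges` is made up for illustration, and rounding K to the nearest whole number is one possible convention.

```python
import math

def sturges(n, largest, smallest):
    """Return (K, K rounded to a whole number, class width W)."""
    k = 1 + 3.322 * math.log10(n)          # K = 1 + 3.322 log10(n)
    k_rounded = round(k)                    # assumption: round to nearest integer
    width = (largest - smallest) / k_rounded
    return k, k_rounded, width

k, k_rounded, width = sturges(88, 39, 20)
print(round(k, 2), k_rounded, round(width, 1))
```

In line with the caution above, the width of about 2.7 is then rounded to the convenient value 3 rather than used as it stands.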
Example 1
• The blood lead levels (in μg/dl) measured for a sample of 88 individuals living in a region are given below (on the original slide, values for females are shown in blue and values for males in black):
20,21,22,22,23,23,23,24,24,24,24,25,25,25,25,25,26,26,26,26,26,27,
27,27,27,27,27,28,28,28,28,28,28,28,28,29,29,29,29,29,30,30,30,30,
30,30,30,30,30,31,31,31,31,31,31,31,32,32,32,32,32,33,33,33,33,33,
33,33,34,34,34,34,35,35,35,35,36,36,36,36,36,37,37,37,37,38,38,39
• Construct a frequency distribution for the data.
Solution:
$$K = 1 + 3.322\,\log_{10} n = 1 + 3.322\,\log_{10}(88) = 7.46 \approx 7$$
$$W = \frac{L - S}{K} = \frac{39 - 20}{7} = \frac{19}{7} = 2.7 \approx 3$$
Solution
Blood lead level
Class limit   Class boundaries   Mi   Frequency   RF      CF   RCF
20-22         19.5-22.5          21   4           4/88    4    4/88
23-25         22.5-25.5          24   12          12/88   16   16/88
26-28         25.5-28.5          27   19          19/88   35   35/88
29-31         28.5-31.5          30   21          21/88   56   56/88
32-34         31.5-34.5          33   16          16/88   72   72/88
35-37         34.5-37.5          36   13          13/88   85   85/88
38-40         37.5-40.5          39   3           3/88    88   88/88
Where:
• Mi = class mark
• RF = relative frequency
• CF = cumulative frequency
• RCF = relative cumulative frequency
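The frequency distribution for the blood lead data can be rebuilt programmatically. The sketch below assumes a class width of 3 starting at 20, as in the solution; the dictionary `value_counts` is a compact tally of the 88 listed values (how many times each level occurs), and all names are illustrative.

```python
from itertools import accumulate

# Tally of the 88 blood lead values: level -> number of occurrences.
value_counts = {20: 1, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 5, 27: 6,
                28: 8, 29: 5, 30: 9, 31: 7, 32: 5, 33: 7, 34: 4, 35: 4,
                36: 5, 37: 4, 38: 2, 39: 1}
data = [v for v, c in value_counts.items() for _ in range(c)]
n = len(data)

start, width, k = 20, 3, 7   # first lower limit, class width, number of classes
rows = []
for i in range(k):
    lower = start + i * width
    upper = lower + width - 1                        # integer class limits
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),    # subtract/add 0.5
        "mark": (lower + upper) / 2,                 # class mark Mi
        "freq": sum(lower <= x <= upper for x in data),
    })

cum = list(accumulate(r["freq"] for r in rows))      # cumulative frequency
for r, c in zip(rows, cum):
    r["rf"], r["cf"], r["rcf"] = r["freq"] / n, c, c / n
    print(r)
```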
Graphs
• Some examples are:
– Histogram
– Frequency polygon
– Cumulative relative frequency curve, etc.
Histograms
• Histograms are frequency distributions with continuous class intervals that have been turned into graphs.
• The area of each column is proportional to the number of observations in that interval.
Example
The distribution of the blood lead level of 88 individuals
Blood lead level   No. of individuals
19.5-22.5          4
22.5-25.5          12
25.5-28.5          19
28.5-31.5          21
31.5-34.5          16
34.5-37.5          13
37.5-40.5          3
[Figure: histogram of blood lead level; the horizontal axis is marked at the class boundaries 19.5, 22.5, 25.5, 28.5, 31.5, 34.5, 37.5 and 40.5]
Frequency polygons
• Instead of drawing a bar for each class interval, a single point is sometimes drawn at the mid-point of each class interval and consecutive points are joined by straight lines.
• Graphs drawn in this way are called frequency polygons (line graphs).
• Frequency polygons are superior to histograms for comparing two or more sets of data.
[Figure: frequency polygon for the blood lead level of study participants]
[Figure: frequency polygons of blood lead level for males and females]
Cumulative frequency curve (ogive)
• The horizontal axis displays the different categories/intervals.
• The vertical axis displays the cumulative (relative) frequency.
• A point is placed at the true upper limit (upper class boundary) of each interval; its height represents the cumulative relative frequency associated with that interval. The points are then connected by straight lines.
• Like frequency polygons, cumulative frequency curves may be used to compare sets of data.
• A cumulative frequency curve can also be used to obtain percentiles of a set of data.
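Reading a percentile off the ogive amounts to linear interpolation between the plotted points. The sketch below does this for the blood lead table, using the class boundaries and cumulative frequencies given earlier; the function name is hypothetical and the starting point (lowest boundary, cumulative frequency 0) is included explicitly.

```python
# Points of the ogive: class boundaries and cumulative frequencies.
boundaries = [19.5, 22.5, 25.5, 28.5, 31.5, 34.5, 37.5, 40.5]
cum_freq = [0, 4, 16, 35, 56, 72, 85, 88]

def percentile_from_ogive(p, boundaries, cum_freq):
    """Value below which p percent of observations fall, by interpolation."""
    target = p / 100 * cum_freq[-1]          # cumulative count to reach
    for i in range(1, len(boundaries)):
        if cum_freq[i] >= target:
            lo_x, hi_x = boundaries[i - 1], boundaries[i]
            lo_c, hi_c = cum_freq[i - 1], cum_freq[i]
            if hi_c == lo_c:                 # empty class: no slope to follow
                return lo_x
            return lo_x + (target - lo_c) / (hi_c - lo_c) * (hi_x - lo_x)
    return boundaries[-1]

p25 = percentile_from_ogive(25, boundaries, cum_freq)
p50 = percentile_from_ogive(50, boundaries, cum_freq)
p75 = percentile_from_ogive(75, boundaries, cum_freq)
print(round(p25, 2), round(p50, 2), round(p75, 2))
```

The 50th percentile obtained this way is the grouped median, since both rely on the same assumption that values are spread evenly within a class.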
Cumulative frequency curve cont…
[Figure: cumulative relative frequency curve for the blood lead level of study participants. The vertical axis shows the cumulative frequency (proportion of individuals). The graph begins at the lower boundary of the first class and ends at the upper boundary of the last class.]
Box plots
• A visual display called a box (box-and-whisker) plot can be used to convey a fair amount of information about the distribution of a set of data.
• It is used as an exploratory data analysis tool.
• The box spans the distance between the first and the third quartiles.
• The median is marked as a line within the box.
• The end lines show the minimum and maximum values.
Box plots cont…
A box plot displays the five-number summary:
– The minimum entry
– Q1
– Q2 (median)
– Q3
– The maximum entry
The quartiles are the values which divide the ordered distribution into four parts such that there is an equal number of observations in each part. Their positions in the ordered data are:
Q1 = [(n+1)/4]th observation
Q2 = [2(n+1)/4]th observation
Q3 = [3(n+1)/4]th observation
Example: Use the following age data of 15 patients to draw
a box-and-whisker plot.
35 35 36 37 37 38 42 43 43 44 45 48 48 51 55
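The five-number summary needed for this plot can be computed with the [q(n+1)/4]th-observation rule. The sketch below is illustrative: the function name is made up, and interpolating between neighbouring values when the position is not a whole number is an assumption, since the slides only cover the whole-number case.

```python
def quartile(sorted_values, q):
    """q-th quartile (q = 1, 2, 3) by the [q(n+1)/4]th-observation rule."""
    n = len(sorted_values)
    position = q * (n + 1) / 4           # 1-based position in the ordered data
    below = int(position)                # whole part of the position
    fraction = position - below
    if below < 1:
        return sorted_values[0]
    if below >= n or fraction == 0:
        return sorted_values[min(below, n) - 1]
    # Assumption: interpolate linearly when the position is fractional.
    return sorted_values[below - 1] + fraction * (
        sorted_values[below] - sorted_values[below - 1]
    )

ages = sorted([35, 35, 36, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55])
five_number = (ages[0], quartile(ages, 1), quartile(ages, 2),
               quartile(ages, 3), ages[-1])
iqr = five_number[3] - five_number[1]
print(five_number, iqr)
```

With n = 15 the positions are the 4th, 8th and 12th observations, so no interpolation is needed for this example.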
[Figure: box-and-whisker plot of the age of 15 patients, with Min, Q1, Q2, Q3 and Max marked. Notice the distribution of the data in each quarter (the distance between quartiles).]
[Figure: box plots showing the distribution of blood lead level of individuals by sex]
Measures of central tendency
• It is often useful to summarize, in a single number or statistic, the general location of the data or the point at which the data tend to cluster.
• Such statistics are called measures of location or measures of central tendency.
• The measures described here are the mean, the median and the mode.
Arithmetic mean
• The arithmetic mean, usually abbreviated to ‘mean’, is the sum of the observations divided by the number of observations.
Arithmetic mean
a) Ungrouped data
If $x_1, x_2, \ldots, x_n$ are n observed values of a sample, then
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean, if the x's are population observations:
$$\mu = \frac{\sum x}{N}$$
Example: Blood lead level for the sample of 88 individuals
$$\bar{x} = \frac{\sum_{i=1}^{88} x_i}{88} = \frac{20 + 21 + 22 + \cdots + 39}{88} = 29.92$$
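A quick way to check the ungrouped mean is to sum the 88 values directly. In this sketch `value_counts` is a compact tally of the listed blood lead values (level -> number of occurrences); the names are illustrative.

```python
# Tally of the 88 blood lead values: level -> number of occurrences.
value_counts = {20: 1, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 5, 27: 6,
                28: 8, 29: 5, 30: 9, 31: 7, 32: 5, 33: 7, 34: 4, 35: 4,
                36: 5, 37: 4, 38: 2, 39: 1}
data = [v for v, c in value_counts.items() for _ in range(c)]

n = len(data)
total = sum(data)
mean = total / n          # sum of observations / number of observations
print(n, total, round(mean, 2))
```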
Arithmetic mean cont…
b) Grouped data
• In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:
$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$
where
k = the number of class intervals
m_i = the mid-point of the ith class interval
f_i = the frequency of the ith class interval
Example: Arithmetic mean for the grouped blood lead level data
Blood lead level (CB)   Class mark (Mi)   Frequency
19.5-22.5               21                4
22.5-25.5               24                12
25.5-28.5               27                19
28.5-31.5               30                21
31.5-34.5               33                16
34.5-37.5               36                13
37.5-40.5               39                3
$$\bar{x} = \frac{\sum_{i=1}^{7} m_i f_i}{\sum_{i=1}^{7} f_i} = \frac{(21 \times 4) + (24 \times 12) + \cdots + (39 \times 3)}{4 + 12 + \cdots + 3} = \frac{2628}{88} = 29.86$$
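The grouped calculation is a weighted average of the class marks, which can be sketched as follows (class marks and frequencies from the table; variable names are illustrative).

```python
# Class marks and frequencies from the grouped blood lead table.
class_marks = [21, 24, 27, 30, 33, 36, 39]
frequencies = [4, 12, 19, 21, 16, 13, 3]

weighted_sum = sum(m * f for m, f in zip(class_marks, frequencies))
n = sum(frequencies)
grouped_mean = weighted_sum / n     # sum(m_i * f_i) / sum(f_i)
print(weighted_sum, n, round(grouped_mean, 2))
```

The result (29.86) differs slightly from the ungrouped mean (29.92) because every value is treated as if it sat at the mid-point of its class.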
Properties of the arithmetic mean
• The mean can be used as a summary measure for both discrete and continuous data; in general, however, it is not appropriate for either nominal or ordinal data.
• For a given set of data there is one and only one arithmetic mean.
• The algebraic sum of the deviations of the given values from their arithmetic mean is always zero.
• The arithmetic mean is greatly affected by extreme values.
• In grouped data, if any class interval is open-ended, the arithmetic mean cannot be calculated.
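The zero-sum property of the deviations follows directly from the definition of the mean. A short derivation in the notation of these slides:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$

For instance, the ten numbers 19, 20, 20, 21, 22, 24, 27, 27, 27, 34 used elsewhere in these slides have mean 241/10 = 24.1; the negative deviations add up to −18.6 and the positive deviations to +18.6, so they cancel exactly.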
Median
• With the observations arranged in increasing or decreasing order, the median is defined as the middle observation.
Ungrouped data
• If the number of observations is odd, the median is the [(n+1)/2]th observation.
• If the number of observations is even, the median is the average of the two middle values, i.e. the (n/2)th and [(n/2)+1]th observations.
• Example, where n is even: 19, 20, 20, 21, 22, 24, 27, 27, 27, 34
• Then the median = (22 + 24)/2 = 23
• The ungrouped median for the blood lead level data is the average of the 44th and 45th observations, which is (30 + 30)/2 = 30
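The odd/even rule can be written as a small function. This is a sketch with made-up names; `value_counts` is a compact tally of the 88 blood lead values listed earlier.

```python
def median(values):
    """Middle observation, or the average of the two middle ones if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the [(n+1)/2]th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)th and [(n/2)+1]th

ten_numbers = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]

# Tally of the 88 blood lead values: level -> number of occurrences.
value_counts = {20: 1, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 5, 27: 6,
                28: 8, 29: 5, 30: 9, 31: 7, 32: 5, 33: 7, 34: 4, 35: 4,
                36: 5, 37: 4, 38: 2, 39: 1}
blood_lead = [v for v, c in value_counts.items() for _ in range(c)]

print(median(ten_numbers), median(blood_lead))
```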
Median cont…
Grouped data
• In calculating the median from grouped data, we assume that the values within a class interval are evenly distributed through the interval.
– The first step is to locate the class interval that contains the median.
– Find n/2 and identify the first class interval whose cumulative frequency is greater than or equal to n/2.
(Note: the median class is the class with the smallest cumulative frequency that is ≥ n/2; later classes also have cumulative frequencies ≥ n/2 but do not contain the median.)
Median for grouped data cont…
To find a unique median value, use the following interpolation formula:
$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$$
where
• L_m = lower true class boundary of the interval containing the median
• F_c = cumulative frequency of the interval just below the median class interval
• f_m = frequency of the interval containing the median
• W = class interval width
• n = total number of observations
Example
Using the data on the blood lead level of 88 individuals, the grouped median is:
$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W = 28.5 + \left(\frac{44 - 35}{21}\right) \times 3 = 29.79$$
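The grouped-median steps (find the median class, then interpolate) can be sketched as a single function. The function name and argument layout are made up for illustration; the class boundaries and frequencies are those of the blood lead table.

```python
def grouped_median(class_boundaries, frequencies):
    """Median of grouped data by linear interpolation within the median class."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cum_before = 0                            # Fc: cumulative frequency so far
    for (lower, upper), f in zip(class_boundaries, frequencies):
        if cum_before + f >= half:            # first class reaching n/2
            width = upper - lower             # W
            return lower + (half - cum_before) / f * width
        cum_before += f

class_boundaries = [(19.5, 22.5), (22.5, 25.5), (25.5, 28.5), (28.5, 31.5),
                    (31.5, 34.5), (34.5, 37.5), (37.5, 40.5)]
frequencies = [4, 12, 19, 21, 16, 13, 3]

result = grouped_median(class_boundaries, frequencies)
print(round(result, 2))
```

Here the loop stops at the class 28.5-31.5, where the cumulative frequency first reaches 44, with 35 observations below it and 21 inside it.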
Properties of the median
• The median can be used as a summary measure for ordinal, discrete and continuous data; in general, however, it is not appropriate for nominal data.
• There is only one median for a given set of data.
• The median is a positional average and hence is not drastically affected by extreme values (it is robust, or resistant, to extreme values).
• The median can be calculated even in the case of open-ended intervals.
• It is not a good representative of the data if the number of items is small.
Mode
• Any observation of a variable at which the distribution reaches a peak is called a mode.
• Most distributions encountered in practice have one peak and are described as unimodal.
• E.g. consider the example of ten numbers:
19 21 20 20 34 22 24 27 27 27
In the above data set, the mode is 27.
• The mode of grouped data usually refers to the modal class (the class interval with the highest frequency).
• If a single value for the mode of grouped data must be specified, it is taken as the mid-point of the modal class interval.
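Both cases can be illustrated briefly in Python. `statistics.multimode` (Python 3.8 or later) returns every most-frequent value, which also covers data sets with more than one mode; the grouped example uses the class marks and frequencies of the blood lead table, and the variable names are illustrative.

```python
from statistics import multimode

# Ungrouped data: the ten numbers from the example.
numbers = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
modes = multimode(numbers)            # list of all most frequent values

# Grouped data: the modal class is the one with the highest frequency,
# and its mid-point (class mark) is reported as the mode.
class_marks = [21, 24, 27, 30, 33, 36, 39]
frequencies = [4, 12, 19, 21, 16, 13, 3]
modal_index = frequencies.index(max(frequencies))
modal_class_mark = class_marks[modal_index]

print(modes, modal_class_mark)
```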
Properties of the mode
• The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.
• It is not affected by extreme values.
• It can be calculated for distributions with open-ended classes.
• Sometimes its value is not unique.
• The main drawback of the mode is that it may not exist.
Measures of variability (dispersion)
• To fully understand the nature of the distribution of a data set, both measures of location and measures of dispersion are important.
• Some measures of variability are: range, inter-quartile range, variance, standard deviation and the coefficient of variation.
Range:
• The range is the difference between the largest and the smallest observations in the data set.
• Because it is determined by only the two extreme observations, the range is of limited use: it tells us nothing about how the data between the extremes are spread.
Example 1: We use the data set of 10 numbers:
19, 21, 20, 20, 34, 22, 24, 27, 27, 27
The range = 34 – 19 = 15
Quartiles, inter-quartile range and percentiles
• The inter-quartile range (IQR) is the difference between the third and the first quartiles:
IQR = Q3 – Q1
• Example: Consider the age data of 15 patients to find the IQR
35 35 36 37 37 38 42 43 43 44 45 48 48 51 55
Q1 = 37 (4th value), Q2 = 43 (8th value), Q3 = 48 (12th value)
• IQR = 48 – 37 = 11
• Percentiles divide the ordered data into 100 parts with an equal number of observations in each part.
• It follows that the 25th percentile is the first quartile, the 50th percentile is the median and the 75th percentile is the third quartile.
Variance
• A good measure of dispersion should make use of all the data.
• Intuitively, a good measure could be derived by combining, in some way, the deviations of each observation from the mean.
• The variance achieves this by averaging the squares of the deviations from the mean.
Variance cont…
• The population variance of a population data set of N entries is
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
• The sample variance of the set $x_1, x_2, \ldots, x_n$ of n observations with mean $\bar{x}$ is
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
• Note: The sum of the deviations from the mean is zero, so the deviations are squared before they are added and averaged to give the variance.
Standard deviation
• Because it is based on squared deviations, the variance is limited as a descriptive statistic: it is not in the same units as the observations.
• By taking the square root of the variance, we obtain a measure of dispersion in the original units.
• It is usually denoted by s.d. or simply s (written S here), and the formula is given by:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Examples
Example 1: Let us use the age data of 15 individuals
35 35 36 37 37 38 42 43 43 44 45 48 48 51 55
$$S^2 = \frac{\sum_{i=1}^{15} (x_i - \bar{x})^2}{15 - 1} = 38.12, \quad \text{where } \bar{x} = 42.47$$
Example 2: Consider the blood lead level of the 88 individuals given before. Find its variance.
Solution
Using the class marks $m_i$ and frequencies $f_i$ of the grouped data:
$$\bar{x} = \frac{\sum_{i=1}^{7} m_i f_i}{88} = \frac{2628}{88} = 29.86$$
$$S^2 = \frac{\sum_{i=1}^{7} f_i (m_i - \bar{x})^2}{88 - 1} = 20.46$$
(Computed from the 88 individual values instead, the mean is 29.92 and the variance is approximately 19.94.)
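The two examples can be checked numerically. The sketch below computes the sample variance and standard deviation for the 15 ages, and the grouped variance for the blood lead data from its class marks and frequencies; the function names are made up for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def grouped_variance(class_marks, frequencies):
    """Sample variance when each value is replaced by its class mark."""
    n = sum(frequencies)
    mean = sum(m * f for m, f in zip(class_marks, frequencies)) / n
    squared = sum(f * (m - mean) ** 2 for m, f in zip(class_marks, frequencies))
    return squared / (n - 1)

ages = [35, 35, 36, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55]
age_var = sample_variance(ages)
age_sd = math.sqrt(age_var)           # standard deviation, in years

lead_var = grouped_variance([21, 24, 27, 30, 33, 36, 39],
                            [4, 12, 19, 21, 16, 13, 3])

print(round(age_var, 2), round(age_sd, 2), round(lead_var, 2))
```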
Coefficient of variation
• When we want to compare the variability in two sets of data, the standard deviation, which measures absolute variation, may mislead us, especially if the two data sets:
– have different units of measurement, or
– have widely different means
• The coefficient of variation (CV) gives the relative variation and is the best measure for comparing the variability in two sets of data:
$$CV = \frac{S}{\bar{x}}$$
• CV is often presented as this ratio multiplied by 100%.
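As an illustration, the CV of the two worked examples in these slides can be compared, assuming the usual definition CV = (standard deviation / mean) × 100%. The inputs are the summary figures quoted in the examples (age: mean 42.47, variance 38.12; blood lead, grouped: mean 29.86, variance 20.46); the function name is made up.

```python
import math

def coefficient_of_variation(mean, variance):
    """CV in percent: standard deviation relative to the mean."""
    return math.sqrt(variance) / mean * 100

# Summary figures from the two examples in these slides.
cv_age = coefficient_of_variation(42.47, 38.12)    # ages, in years
cv_lead = coefficient_of_variation(29.86, 20.46)   # blood lead, in μg/dl

print(round(cv_age, 1), round(cv_lead, 1))
```

Although the ages have the larger standard deviation in absolute terms, the relative variation of the two data sets turns out to be similar, which is the kind of comparison the CV is meant for when units differ.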
Mean, standard deviation and the normal distribution
• For unimodal, moderately symmetrical sets of data (i.e. approximately normally distributed data):
– about 68% of observations lie within 1 standard deviation of the mean.
– about 95% of observations lie within 2 standard deviations of the mean.
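How closely a real data set follows these percentages can be checked by counting. The sketch below does this for the 88 blood lead values; `value_counts` is a compact tally of the listed values and the helper name is made up. The data are only roughly symmetric, so the shares are expected to be near, not equal to, 68% and 95%.

```python
import statistics

# Tally of the 88 blood lead values: level -> number of occurrences.
value_counts = {20: 1, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 5, 27: 6,
                28: 8, 29: 5, 30: 9, 31: 7, 32: 5, 33: 7, 34: 4, 35: 4,
                36: 5, 37: 4, 38: 2, 39: 1}
data = [v for v, c in value_counts.items() for _ in range(c)]

mean = statistics.mean(data)
sd = statistics.stdev(data)           # sample standard deviation (n - 1)

def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

print(round(share_within(1), 3), round(share_within(2), 3))
```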
The Empirical Rule
[Figure: normal curve centred at the mean x̄, with the horizontal axis marked at x̄ − 3s, x̄ − 2s, x̄ − s, x̄, x̄ + s, x̄ + 2s and x̄ + 3s]
• 68% of the data are within 1 standard deviation of the mean (34% on each side)
• 95% of the data are within 2 standard deviations of the mean (a further 13.5% on each side)
• 99.7% of the data are within 3 standard deviations of the mean (a further 2.4% on each side, leaving 0.1% in each tail)
Choosing appropriate measures
• If the data are symmetric, with no serious outliers, use the mean and standard deviation.
• If the data are skewed and/or have serious outliers, use the median and IQR.
• If comparing variation across two variables, use the coefficient of variation if the variables are in different units and/or scales, or if the means are markedly different.
• If the scales/units and means are roughly the same, direct comparison of the standard deviations is fine.
Fig. 2(a). Symmetric distribution: Mean = Median = Mode
Fig. 2(b). Distribution skewed to the right: Mean > Median > Mode
Fig. 2(c). Distribution skewed to the left: Mean < Median < Mode

Lecture-2 (discriptive statistics).ppt

  • 1. NURSING Dream ● Discover ● Deliver Lemma Derseh (BSc., MPH) 1 University of Gondar College of medicine and health science Department of Epidemiology and Biostatistics Descriptive statistics
  • 2. NURSING Dream ● Discover ● Deliver Statistical Methods (branches of statistics) collection organizing summarizing presenting of data Descriptive Statistics making inferences hypothesis testing determining relationship making the prediction Inferential Statistics Biostatistics Lemma Derseh, Department of Epidemiology and Biostatistics, University of Gondar
  • 3. NURSING Dream ● Discover ● Deliver Descriptive Statistics 1. Involves – Collecting Data – Presenting Data – Characterizing Data 2. Purpose – Describe Data x = 74.5, S2 = 213 0 50 100 1St 2nd 3rd 4th Class size Batch (one department) Lemma Derseh, Department of Epidemiology and Biostatistics, University of Gondar
  • 4. NURSING Dream ● Discover ● Deliver Descriptive statistics cont… Types of descriptive statistics  Tables/charts/graphs …………..  Measures of central tendency  Measures of variability Lemma Derseh, Department of Epidemiology and Biostatistics, University of Gondar Numerical summary measures Pictorial measure
  • 5. NURSING Dream ● Discover ● Deliver Tables/charts/graphs  Tables are used in categorical variables or categorized numerical data  Tables:  Frequency (for nominal and ordinal data)  Relative frequency (for nominal and ordinal data)  Cumulative frequencies (for ordinal data) Lemma Derseh, Department of Epidemiology and Biostatistics, University of Gondar The methods of describing data differ depending on the type of the data itself (i.e. Numerical or Categorical).
  • 6. Describing categorical variables … cont
      - Frequency is the number of observations in each category.
      - The relative frequency of a class is the proportion or percentage of the data that falls in that class.
      E.g. 1: The blood types of 30 patients were recorded as follows:
        A AB B B A O O AB AB B O A A B B A AB A O AB B AB AB O A AB AB O A O
      Construct a frequency table for these data.

        Type    Frequency   Relative frequency
        A        8          0.267
        B        6          0.200
        AB       9          0.300
        O        7          0.233
        Total   30          1.000
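The blood-type table can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names (`blood_types`, `freq`, `rel_freq`) are assumptions made for the example and do not come from the slides.

```python
from collections import Counter

# Blood types of the 30 patients, in the order listed on the slide.
blood_types = (
    "A AB B B A O O AB AB B O A A B B A AB A O AB "
    "B AB AB O A AB AB O A O"
).split()

freq = Counter(blood_types)                  # frequency of each category
n = len(blood_types)                         # total number of observations
rel_freq = {t: freq[t] / n for t in freq}    # relative frequency (proportion)

for t in ["A", "B", "AB", "O"]:
    print(f"{t:<5} {freq[t]:>3}   {rel_freq[t]:.3f}")
print(f"Total {n:>3}   {sum(rel_freq.values()):.3f}")
```

Multiplying each relative frequency by 100 gives the percentage form used in the birth-weight table.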
  • 7. Distribution of birth weight of newborns between 1976-1996 at TAH

        BWT        Freq.   Rel. freq. (%)   Cum. freq.   Cum. rel. freq. (%)
        Very low     43        0.4               43            0.4
        Low         793        8.0              836            8.4
        Normal     8870       88.9             9706           97.3
        Big         268        2.7             9974          100
        Total      9974      100

      Cumulative relative frequency is relevant for ordinal data. Consider, for example, the variable birth weight with levels ‘Very low’, ‘Low’, ‘Normal’ and ‘Big’.
      The cumulative frequency of a class is the sum of the frequency for that class and all the previous classes.
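As a small sketch of how the cumulative columns follow from the frequency column, the running totals can be computed with `itertools.accumulate`. The list names are assumptions chosen for this illustration.

```python
from itertools import accumulate

categories = ["Very low", "Low", "Normal", "Big"]   # ordered (ordinal) levels
freq = [43, 793, 8870, 268]                         # frequencies from the table

n = sum(freq)
cum_freq = list(accumulate(freq))                   # running totals of frequency
cum_rel_pct = [100 * c / n for c in cum_freq]       # cumulative relative frequency (%)

for name, f, cf, crf in zip(categories, freq, cum_freq, cum_rel_pct):
    print(f"{name:<9} {f:>5} {cf:>5} {crf:6.1f}")
```

The order of the categories matters here, which is why cumulative frequencies are meaningful for ordinal but not for nominal data.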
  • 8. Charts
      - Charts are used only for categorical variables
      - Bar charts: the successive bars are separated (not continuous)
      - Pie charts: each sector of a circle indicates a category of data
  • 9. Charts cont… Bar chart
      - Bar charts display the frequency distribution for nominal or ordinal data.
      - The various categories into which the observations fall are represented along the horizontal axis.
  • 10. Fig. 1. Bar chart for blood type of 30 patients
  • 11. Pie chart
      - A pie chart displays the frequency of nominal or ordinal variables.
      - The various categories of the variable are represented by the sectors of the circle.
      - The area of each sector is proportional to the frequency of the corresponding category of the variable.
  • 12. Fig. 3. Pie chart showing the frequency distribution of the variable blood group
  • 13. Categorizing numeric data
      In order to present and organize numeric data using tables or graphs, we need to group the dataset using the following terms:
      - Number of classes: the number of categories the table will have
      - Class limits: the range of values for each class
         - Lower class limit
         - Upper class limit
      - Class boundaries: the continuous version of the class limits, obtained by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit (for whole-number data; for data recorded to one decimal place use 0.05)
         - Lower class boundary
         - Upper class boundary
      - Class mark: the average of the lower and upper class limits
  • 14. Sturges' rule
      - Select a set of continuous, non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals.
      $K = 1 + 3.322\,\log_{10} n$
      $W = \frac{L - S}{K}$
      where
      - K = number of class intervals
      - n = number of observations
      - W = width of the class interval
      - L = the largest value
      - S = the smallest value
  • 15. Sturges' rule cont…
      - For datasets with integer values, subtract 0.5 from the lower class limits and add 0.5 to the upper class limits to find the class boundaries.
      - The answer obtained by applying Sturges' rule should not be regarded as final, but should be considered as a guide only.
      - The number of class intervals specified by the rule should be increased or decreased for convenience and clear presentation.
  • 16. Example 1
      - The blood lead levels, measured in μg/dl, of a sample of 88 individuals living in a region are given below (on the original slide, values in blue are for females and values in black for males):
        20, 21, 22, 22, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27,
        27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 30, 30, 30, 30,
        30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33,
        33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 36, 37, 37, 37, 37, 38, 38, 39
      - Construct a frequency distribution for the data.
      Solution:
      $K = 1 + 3.322\,\log_{10} n = 1 + 3.322\,\log_{10} 88 = 7.46 \approx 7$
      $W = \frac{L - S}{K} = \frac{39 - 20}{7} = \frac{19}{7} = 2.7 \approx 3$
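The two steps of the solution can be checked numerically. The sketch below assumes a base-10 logarithm (which reproduces the slide's 7.46) and assumes that the width 2.7 is rounded up to the next whole number to obtain 3; the variable names are illustrative only.

```python
import math

n, largest, smallest = 88, 39, 20

k = 1 + 3.322 * math.log10(n)          # Sturges' rule, about 7.46
k_used = round(k)                      # number of classes used: 7
width = (largest - smallest) / k_used  # 19 / 7, about 2.7
width_used = math.ceil(width)          # assumption: round up to a whole number, 3

print(round(k, 2), k_used, round(width, 1), width_used)
```

Rounding the width up rather than down guarantees that seven classes starting at the smallest value still cover the largest value.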
  • 17. Solution: frequency distribution of blood lead level

        Class limits   Class boundaries   Mi   Frequency   RF      CF   RCF
        20-22          19.5-22.5          21    4           4/88    4    4/88
        23-25          22.5-25.5          24   12          12/88   16   16/88
        26-28          25.5-28.5          27   19          19/88   35   35/88
        29-31          28.5-31.5          30   21          21/88   56   56/88
        32-34          31.5-34.5          33   16          16/88   72   72/88
        35-37          34.5-37.5          36   13          13/88   85   85/88
        38-40          37.5-40.5          39    3           3/88   88   88/88

      Where:
      - Mi = class mark
      - RF = relative frequency
      - CF = cumulative frequency
      - RCF = relative cumulative frequency
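A minimal sketch of how this table can be built from the raw values is given below. The dictionary `counts` is just a compact way of writing the 88 observations (value: number of times it occurs); it and the other names are assumptions made for the example.

```python
# The 88 blood lead values, written as value -> number of occurrences.
counts = {20: 1, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 5, 27: 6, 28: 8, 29: 5,
          30: 9, 31: 7, 32: 5, 33: 7, 34: 4, 35: 4, 36: 5, 37: 4, 38: 2, 39: 1}
data = [v for v, c in counts.items() for _ in range(c)]

start, width, k = 20, 3, 7      # first lower limit, class width, number of classes
table, cum = [], 0
for i in range(k):
    lower = start + i * width               # lower class limit
    upper = lower + width - 1               # upper class limit
    f = sum(lower <= x <= upper for x in data)
    cum += f
    table.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),   # integer data: +/- 0.5
        "mark": (lower + upper) / 2,                # class mark Mi
        "freq": f,
        "rel_freq": f / len(data),
        "cum_freq": cum,
    })

for row in table:
    print(row["limits"], row["boundaries"], row["mark"], row["freq"], row["cum_freq"])
```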
  • 18. Graphs
      Some examples are:
      - Histogram
      - Frequency polygon
      - Cumulative relative frequency curve, etc.
  • 19. Histograms
      - Histograms are frequency distributions with continuous class intervals that have been turned into graphs.
      - The area of each column is proportional to the number of observations in that interval.
  • 20. Example: the distribution of the blood lead level of 88 individuals

        Blood lead level   No. of individuals
        19.5-22.5           4
        22.5-25.5          12
        25.5-28.5          19
        28.5-31.5          21
        31.5-34.5          16
        34.5-37.5          13
        37.5-40.5           3

      [Figure: histogram; horizontal axis: blood lead level, with class boundaries 19.5 to 40.5]
  • 21. Frequency polygons
      - Instead of drawing bars for each class interval, sometimes a single point is drawn at the mid-point of each class interval and consecutive points are joined by straight lines.
      - Graphs drawn in this way are called frequency polygons (line graphs).
  • 22. Frequency polygons cont… [Figure: frequency polygon for the blood lead level of study participants]
  • 23. [Figure: frequency polygon of blood lead level for males and females]
      Frequency polygons are superior to histograms for comparing two or more sets of data.
  • 24. Cumulative frequency curve (ogive)
      - The horizontal axis displays the different categories/intervals.
      - The vertical axis displays the cumulative (relative) frequency.
      - A point is placed at the true upper limit of each interval; its height represents the cumulative relative frequency associated with that interval. The points are then connected by straight lines.
      - Like frequency polygons, cumulative frequency curves may be used to compare sets of data.
      - A cumulative frequency curve can also be used to obtain percentiles of a set of data.
  • 25. Cumulative frequency curve cont…
      [Figure: cumulative relative frequency curve for the blood lead level of study participants; vertical axis: cumulative frequency (proportion of individuals)]
      - The graph begins at the lower boundary of the first class.
      - The graph ends at the upper boundary of the last class.
  • 26. Box plots
      - A visual picture called a box (box-and-whisker) plot can be used to convey a fair amount of information about the distribution of a set of data.
      - It is used as an exploratory data analysis tool.
      - The box shows the distance between the first and the third quartiles.
      - The median is marked as a line within the box.
      - The end lines show the minimum and maximum values respectively.
  • 27. Box plots cont…
      A box plot displays the five-number summary:
      - the minimum entry
      - Q1
      - Q2 (median)
      - Q3
      - the maximum entry
      The quartiles are the values that divide the ordered distribution into four parts such that there is an equal number of observations in each part:
      - Q1 = the [(n+1)/4]th observation
      - Q2 = the [2(n+1)/4]th observation
      - Q3 = the [3(n+1)/4]th observation
  • 28. Box plots cont…
      Example: Use the following age data of 15 patients to draw a box-and-whisker plot.
        35 35 36 37 37 38 42 43 43 44 45 48 48 51 55
      With n = 15 the quartiles are the 4th, 8th and 12th ordered observations:
        Min = 35, Q1 = 37, Q2 = 43, Q3 = 48, Max = 55
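The position rule for quartiles can be sketched in code as follows. The function name `quartile` is hypothetical, and the linear interpolation used when the position is not a whole number is an assumption; the slides only show a case where the positions (4, 8 and 12) are exact.

```python
def quartile(sorted_values, q):
    """q-th quartile (q = 1, 2, 3) taken at the 1-based position q(n+1)/4."""
    n = len(sorted_values)
    pos = q * (n + 1) / 4
    whole = int(pos)             # whole part of the position
    frac = pos - whole           # fractional part, 0 when the position is exact
    if whole < 1:
        return sorted_values[0]
    if whole >= n:
        return sorted_values[-1]
    below = sorted_values[whole - 1]
    above = sorted_values[whole]
    return below + frac * (above - below)   # assumption: linear interpolation

ages = sorted([35, 35, 36, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55])
five_number = (ages[0], quartile(ages, 1), quartile(ages, 2),
               quartile(ages, 3), ages[-1])
print(five_number)
```

These five values are exactly what the box, the median line and the two end lines of the box plot represent.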
  • 29. [Figure: illustration of a box plot using the age of 15 patients]
      Notice the distribution of data in each quarter (the distance between quartiles).
  • 30. [Figure: a box plot indicating the distribution of blood lead level of individuals by sex]
  • 31. Measures of central tendency
      - It is often useful to summarize, in a single number or statistic, the general location of the data or the point at which the data tend to cluster.
      - Such statistics are called measures of location or measures of central tendency.
      - The ones described here are the mean, the median and the mode.
      Arithmetic mean
      - The arithmetic mean, usually abbreviated to ‘mean’, is the sum of the observations divided by the number of observations.
  • 32. Arithmetic Mean
      a) Ungrouped mean
      If $x_1, x_2, \ldots, x_n$ are n observed values of a sample, then
      $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
      Population mean, if the x's are population observations:
      $\mu = \frac{\sum x}{N}$
      Example: blood lead level for the sample of 88 individuals
      $\bar{x} = \frac{\sum_{i=1}^{88} x_i}{n} = \frac{20 + 21 + 22 + \cdots + 39}{88} = 29.92$
  • 33. Arithmetic Mean cont…
      b) Grouped data
      - In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:
      $\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$
      where
      - k = the number of class intervals
      - $m_i$ = the mid-point of the ith class interval
      - $f_i$ = the frequency of the ith class interval
  • 34. Arithmetic Mean cont…
      Example: arithmetic mean for the grouped data of blood lead level

        Blood lead level (CB)   Class mark (Mi)   Frequency
        19.5-22.5               21                 4
        22.5-25.5               24                12
        25.5-28.5               27                19
        28.5-31.5               30                21
        31.5-34.5               33                16
        34.5-37.5               36                13
        37.5-40.5               39                 3

      $\bar{x} = \frac{\sum_{i=1}^{7} m_i f_i}{\sum_{i=1}^{7} f_i} = \frac{21 \times 4 + 24 \times 12 + \cdots + 39 \times 3}{4 + 12 + \cdots + 3} = 29.86$
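A short sketch of the grouped-mean calculation, using the class marks and frequencies from the table; the list names are illustrative assumptions.

```python
marks = [21, 24, 27, 30, 33, 36, 39]   # class marks m_i
freqs = [4, 12, 19, 21, 16, 13, 3]     # class frequencies f_i

weighted_sum = sum(m * f for m, f in zip(marks, freqs))   # sum of m_i * f_i
n = sum(freqs)                                            # sum of f_i
grouped_mean = weighted_sum / n

print(weighted_sum, n, round(grouped_mean, 2))
```

The grouped mean (29.86) differs slightly from the ungrouped mean (29.92) because every value in a class is treated as if it sat at the class mark.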
  • 35. Properties of the arithmetic mean
      - The mean can be used as a summary measure for both discrete and continuous data; in general, however, it is not appropriate for either nominal or ordinal data.
      - For a given set of data there is one and only one arithmetic mean.
      - The algebraic sum of the deviations of the given values from their arithmetic mean is always zero.
      - The arithmetic mean is greatly affected by extreme values.
      - In grouped data, if any class interval is open-ended, the arithmetic mean cannot be calculated.
  • 36. Median
      - With the observations arranged in increasing or decreasing order, the median is defined as the middle observation.
      Ungrouped data
      - If the number of observations is odd, the median is the [(n+1)/2]th observation.
      - If the number of observations is even, the median is the average of the two middle values, i.e. the (n/2)th and [(n/2)+1]th observations.
      - Example where n is even: 19, 20, 20, 21, 22, 24, 27, 27, 27, 34. The median = (22 + 24)/2 = 23.
      - The ungrouped median for the blood lead level data is the average of the 44th and 45th observations, which is (30 + 30)/2 = 30.
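The odd/even rule can be written out directly, as in the sketch below. The function name `median_ungrouped` is an assumption for this example; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median_ungrouped(values):
    """Middle observation (odd n) or mean of the two middle observations (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the [(n+1)/2]th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)th and [(n/2)+1]th

ten_numbers = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]

# The 88 blood lead values, written as value -> number of occurrences.
counts = {20: 1, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 5, 27: 6, 28: 8, 29: 5,
          30: 9, 31: 7, 32: 5, 33: 7, 34: 4, 35: 4, 36: 5, 37: 4, 38: 2, 39: 1}
blood_lead = [v for v, c in counts.items() for _ in range(c)]

print(median_ungrouped(ten_numbers), median_ungrouped(blood_lead))
```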
  • 37. Median cont…
      Grouped data
      - In calculating the median from grouped data, we assume that the values within a class interval are evenly distributed through the interval.
         - The first step is to locate the class interval that contains the median.
         - Find n/2 and identify the first class interval whose cumulative frequency is at least n/2; this is the median class. (Note: every later class also has a cumulative frequency ≥ n/2, so take the class with the smallest such cumulative frequency.)
  • 38. Median for grouped data … cont
      To find a unique median value, use the following interpolation formula:
      $\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$
      where
      - $L_m$ = lower true class boundary of the interval containing the median
      - $F_c$ = cumulative frequency of the interval just below the median class interval
      - $f_m$ = frequency of the interval containing the median
      - W = class interval width
      - n = total number of observations
  • 39. Median for grouped data cont…
      Example: using the data on the blood lead level of 88 individuals, the grouped median is
      $\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W = 28.5 + \left(\frac{44 - 35}{21}\right) \times 3 = 29.79$
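The two steps (locate the median class, then interpolate) can be sketched as below. The variable names are assumptions chosen to mirror the symbols in the formula.

```python
lower_boundaries = [19.5, 22.5, 25.5, 28.5, 31.5, 34.5, 37.5]
freqs = [4, 12, 19, 21, 16, 13, 3]
W = 3                                   # class width

n = sum(freqs)
half = n / 2
F_c = 0                                 # cumulative frequency below the current class
grouped_median = None
for L_m, f_m in zip(lower_boundaries, freqs):
    if F_c + f_m >= half:               # first class whose cumulative frequency reaches n/2
        grouped_median = L_m + (half - F_c) / f_m * W
        break
    F_c += f_m

print(L_m, F_c, f_m, round(grouped_median, 2))
```

For these data the loop stops at the class 28.5-31.5 with $F_c = 35$ and $f_m = 21$, the same values used in the worked example.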
  • 40. Properties of median
      - The median can be used as a summary measure for ordinal, discrete and continuous data; in general, however, it is not appropriate for nominal data.
      - There is only one median for a given set of data.
      - The median is a positional average and hence it is not drastically affected by extreme values (it is robust or resistant to extreme values).
      - The median can be calculated even in the case of open-ended intervals.
      - It is not a good representative of the data if the number of items is small.
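To illustrate the robustness property, the sketch below compares the mean and the median of the ten-number example before and after its largest value is replaced by an extreme one. The replacement value 340 is a made-up outlier used only for illustration.

```python
import statistics

original = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]
with_outlier = [19, 20, 20, 21, 22, 24, 27, 27, 27, 340]   # hypothetical extreme value

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)       # the mean moves a lot
print(median_before, median_after)   # the median does not move
```

The mean more than doubles while the median stays at 23, which is why the median is preferred for skewed data or data with serious outliers.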
  • 41. Mode
      - Any observation of a variable at which the distribution reaches a peak is called a mode.
      - Most distributions encountered in practice have one peak and are described as uni-modal.
      - E.g. consider the example of ten numbers: 19, 21, 20, 20, 34, 22, 24, 27, 27, 27. In this data set the mode is 27.
      - The mode of grouped data usually refers to the modal class (the class interval with the highest frequency).
      - If a single value for the mode of grouped data must be specified, it is taken as the mid-point of the modal class interval.
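Both cases can be sketched briefly: `statistics.multimode` returns every most-frequent value (so it also handles data with more than one mode), and the modal class is simply the class with the largest frequency. The variable names are assumptions for this example.

```python
import statistics

ten_numbers = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
modes = statistics.multimode(ten_numbers)     # list of all most-frequent values

# Grouped blood lead data: class boundaries and frequencies.
boundaries = [(19.5, 22.5), (22.5, 25.5), (25.5, 28.5), (28.5, 31.5),
              (31.5, 34.5), (34.5, 37.5), (37.5, 40.5)]
freqs = [4, 12, 19, 21, 16, 13, 3]

modal_index = freqs.index(max(freqs))         # class with the highest frequency
modal_class = boundaries[modal_index]
grouped_mode = sum(modal_class) / 2           # mid-point of the modal class

print(modes, modal_class, grouped_mode)
```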
  • 42. Properties of mode
      - The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.
      - It is not affected by extreme values.
      - It can be calculated for distributions with open-ended classes.
      - Sometimes its value is not unique.
      - The main drawback of the mode is that it may not exist.
  • 43. Measures of variability (Dispersion)
      - In order to fully understand the nature of the distribution of a data set, both measures of location and measures of dispersion are important.
      - Some measures of variability are: range, inter-quartile range, variance, standard deviation and the coefficient of variation.
      Range
      - The range is the difference between the largest and the smallest observations in the data set.
      - Being determined by only the two extreme observations, the range is of limited use because it tells us nothing about how the data between the extremes are spread.
      - Example 1: for the data set of 10 numbers 19, 21, 20, 20, 34, 22, 24, 27, 27, 27, the range = 34 – 19 = 15.
  • 44. Quartiles and Inter-quartile Range, Percentiles
      - The inter-quartile range (IQR) is the difference between the third and the first quartiles: IQR = Q3 – Q1.
      - Example: consider the age data of 15 patients to find the IQR.
        35 35 36 37 37 38 42 43 43 44 45 48 48 51 55
        Q1 = 37, Q2 = 43, Q3 = 48
      - IQR = 48 – 37 = 11
  • 45. Quartiles and Inter-quartile Range, Percentiles
      - Percentiles divide the ordered data into 100 parts with an equal number of observations in each part.
      - It follows that the 25th percentile is the first quartile, the 50th percentile is the median and the 75th percentile is the third quartile.
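As a sketch of the link between quartiles, percentiles and the IQR, Python's `statistics.quantiles` can be used on the age data. Its default `method="exclusive"` places cut points using (n+1)-based positions, which matches the position rule used in these slides; other software may use different conventions and give slightly different values.

```python
import statistics

ages = [35, 35, 36, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55]

q1, q2, q3 = statistics.quantiles(ages, n=4)      # three quartiles
iqr = q3 - q1

percentiles = statistics.quantiles(ages, n=100)   # 99 cut points: P1 ... P99
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

print(q1, q2, q3, iqr)
print(p25, p50, p75)
```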
  • 46. Variance
      - A good measure of dispersion should make use of all the data.
      - Intuitively, a good measure could be derived by combining, in some way, the deviations of each observation from the mean.
      - The variance achieves this by averaging the squares of the deviations from the mean.
  • 47. Variance cont…
      - The population variance of a population data set of N entries is
        $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
      - The sample variance of the set $x_1, x_2, \ldots, x_n$ of n observations with mean $\bar{x}$ is
        $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
      Note: the sum of the deviations from the mean is zero, so the deviations are squared before they are added and averaged to give the variance.
  • 48. Standard Deviation
      - Because it is built from squared deviations, the variance is limited as a descriptive statistic: it is not in the same units as the observations.
      - By taking the square root of the variance, we obtain a measure of dispersion in the original units.
      - It is usually denoted by s.d. or simply S, and the formula is given by:
        $S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
  • 49. Examples
      Example 1: Let us use the age data of 15 individuals.
        35 35 36 37 37 38 42 43 43 44 45 48 48 51 55
        $S^2 = \frac{\sum_{i=1}^{15} (x_i - \bar{x})^2}{15 - 1} = 38.12$, where $\bar{x} = 42.47$
      Example 2: Consider the blood lead level of the 88 individuals given before. Find its variance.
        Solution, using the grouped data (class marks $m_i$, frequencies $f_i$):
        $\bar{x} = \frac{\sum_{i=1}^{7} m_i f_i}{n} = 29.86$
        $S^2 = \frac{\sum_{i=1}^{7} f_i (m_i - \bar{x})^2}{88 - 1} = 20.46$
        From the 88 raw values, $\bar{x} = 29.92$ and $S^2 \approx 19.94$.
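Both examples can be checked with `statistics.variance` and `statistics.stdev`, which use the n − 1 divisor of the sample formulas above. In this sketch the blood lead variance is computed twice, once from the 88 raw values and once from the class marks repeated according to their frequencies; the value 20.46 quoted in the example is the one obtained from the class marks. Variable names are assumptions for the illustration.

```python
import statistics

ages = [35, 35, 36, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55]
age_mean = statistics.mean(ages)
age_var = statistics.variance(ages)     # sample variance, divisor n - 1
age_sd = statistics.stdev(ages)         # square root of the sample variance

# Blood lead: raw values written as value -> number of occurrences.
counts = {20: 1, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 5, 27: 6, 28: 8, 29: 5,
          30: 9, 31: 7, 32: 5, 33: 7, 34: 4, 35: 4, 36: 5, 37: 4, 38: 2, 39: 1}
raw = [v for v, c in counts.items() for _ in range(c)]

# Blood lead: grouped version, each class mark repeated f_i times.
marks = [21, 24, 27, 30, 33, 36, 39]
freqs = [4, 12, 19, 21, 16, 13, 3]
grouped = [m for m, f in zip(marks, freqs) for _ in range(f)]

raw_var = statistics.variance(raw)
grouped_var = statistics.variance(grouped)

print(round(age_mean, 2), round(age_var, 2), round(age_sd, 2))
print(round(raw_var, 2), round(grouped_var, 2))
```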
  • 50. Coefficient of variation
      - When we want to compare the variability in two sets of data, the standard deviation, which measures absolute variation, may mislead us, especially if the two data sets have different units of measurement or have widely different means.
      - The coefficient of variation (CV) gives the relative variation and is the best measure for comparing the variability in two sets of data.
      - The CV is the ratio of the standard deviation to the mean, often presented multiplied by 100%:
        $CV = \frac{S}{\bar{x}} \times 100\%$
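A minimal sketch of the comparison, assuming the CV is the sample standard deviation divided by the mean and expressed as a percentage. The function name `coefficient_of_variation` is hypothetical; the two data sets are the age and blood lead examples used earlier, which have different units.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

ages = [35, 35, 36, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55]   # years

# Blood lead (μg/dl): raw values written as value -> number of occurrences.
counts = {20: 1, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 5, 27: 6, 28: 8, 29: 5,
          30: 9, 31: 7, 32: 5, 33: 7, 34: 4, 35: 4, 36: 5, 37: 4, 38: 2, 39: 1}
blood_lead = [v for v, c in counts.items() for _ in range(c)]

cv_age = coefficient_of_variation(ages)
cv_lead = coefficient_of_variation(blood_lead)
print(f"CV age: {cv_age:.1f}%   CV blood lead: {cv_lead:.1f}%")
```

Although the standard deviations differ (about 6.2 years versus about 4.5 μg/dl), the two CVs are close to each other, so the relative variability of the two variables is similar.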
  • 51. Mean, standard deviation and the normal distribution
      For unimodal, moderately symmetrical sets of data (i.e. approximately normally distributed data), approximately:
      - 68% of observations lie within 1 standard deviation of the mean.
      - 95% of observations lie within 2 standard deviations of the mean.
  • 52. The Empirical Rule [Figure: bell-shaped curve centred at $\bar{x}$]
  • 53. The Empirical Rule [Figure: curve marked at $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$]
      - 68% of data are within 1 standard deviation of the mean (34% on each side)
  • 54. The Empirical Rule [Figure: curve marked from $\bar{x} - 2s$ to $\bar{x} + 2s$]
      - 68% within 1 standard deviation (34% on each side)
      - 95% within 2 standard deviations (a further 13.5% on each side)
  • 55. The Empirical Rule [Figure: curve marked from $\bar{x} - 3s$ to $\bar{x} + 3s$]
      - 68% within 1 standard deviation (34% on each side)
      - 95% within 2 standard deviations (a further 13.5% on each side)
      - 99.7% of data are within 3 standard deviations of the mean (a further 2.4% on each side, leaving 0.1% in each tail)
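The rule can be compared against real data by counting how many observations fall within 1, 2 and 3 standard deviations of the mean. The sketch below does this for the 88 blood lead values; the helper name `count_within` is an assumption. Since the data are only roughly bell-shaped, the observed shares are expected to be close to, not equal to, 68%, 95% and 99.7%.

```python
import statistics

# The 88 blood lead values, written as value -> number of occurrences.
counts = {20: 1, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 5, 27: 6, 28: 8, 29: 5,
          30: 9, 31: 7, 32: 5, 33: 7, 34: 4, 35: 4, 36: 5, 37: 4, 38: 2, 39: 1}
data = [v for v, c in counts.items() for _ in range(c)]

mean = statistics.mean(data)
sd = statistics.stdev(data)

def count_within(k):
    """Number of observations in the interval mean - k*sd to mean + k*sd."""
    return sum(mean - k * sd <= x <= mean + k * sd for x in data)

for k in (1, 2, 3):
    inside = count_within(k)
    print(f"within {k} sd: {inside} of {len(data)} ({100 * inside / len(data):.1f}%)")
```

For these data roughly 64% of values fall within one standard deviation and about 98% within two, in reasonable agreement with the rule.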
  • 56. Choosing appropriate measures
      - If the data are symmetric, with no serious outliers, use the mean and standard deviation.
      - If the data are skewed and/or have serious outliers, use the median and IQR.
      - If comparing variation across two variables, use the coefficient of variation if the variables are in different units and/or scales or the means are widely different.
      - If the scales/units and means are roughly the same, direct comparison of the standard deviations is fine.
  • 57. [Figures: positions of the mean, median and mode]
      - Fig. 2(a). Symmetric distribution: Mean = Median = Mode
      - Fig. 2(b). Distribution skewed to the right: Mean > Median > Mode
      - Fig. 2(c). Distribution skewed to the left: Mean < Median < Mode

Editor's Notes

  • #53: page 79 of text
  • #54: Some students have difficulty understanding the idea of ‘within one standard deviation of the mean’. Emphasize that this means the interval from one standard deviation below the mean to one standard deviation above the mean.