What is an estimate, with details regarding its use in biostatistics

Estimation
• The process of using sample information to draw
conclusions about the value of a population parameter is
known as estimation.
• A point estimate is a specific numerical value used as an
estimate of a parameter.
• The best point estimate of the population mean µ is the
sample mean x̄.

• But how good is a point estimate for the
population mean µ?
• A point estimate by itself gives no indication of how
close it is to the population mean.
• Therefore, statisticians prefer another type of
estimate, called an interval estimate.

[Diagram: sample (observation) → make guesses about the whole population → truth (not observable)]

Population parameters
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample statistics
$$\bar{X}_n = \hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad s^2 = \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X}_n)^2}{n - 1}$$

*Hat notation ^ is often used to indicate "estimate".
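The difference between the two sets of formulas (dividing by N for the population, by n - 1 for the sample variance) can be sketched in Python. The population below is made-up illustrative data, not taken from the slides, and the variable names are only for this sketch.

```python
import random
import statistics

random.seed(0)

# Hypothetical population of N = 1000 values (illustrative data only).
population = [random.gauss(50, 10) for _ in range(1000)]

# Population parameters: the variance divides by N.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population, mu)

# Sample statistics from a random sample of n = 30:
# the sample variance divides by n - 1.
sample = random.sample(population, 30)
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample, x_bar)

print(f"mu = {mu:.2f}, sigma^2 = {sigma2:.2f}")
print(f"x_bar = {x_bar:.2f}, s^2 = {s2:.2f}")
```

In practice only the sample values are available; the population lines are there to show what x̄ and s² are estimating.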

• Suppose that the true mean height of male
undergraduate students equals 5’11”.
• But we do not know this mean.
• We want to estimate it by selecting at random a
sample of 100 male undergraduate students and
measuring their height.

Procedure:
• Select at random a sample of 100 male undergraduate
students.
• Measure their heights.
• Calculate the mean height for the 100 students.

Problem: Sampling Error
• What if our sample of 100 students happens to include
some very tall males, or some very short ones?
• Then our estimate of the mean height of all male
undergraduate students would be a bit taller or shorter
than the true mean.
• This type of error in estimating a parameter (here, the
mean height) from a sample is called sampling error.
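Sampling error can be illustrated with a small simulation. This is only a sketch: it assumes heights are normally distributed and assumes a standard deviation of 3 inches, neither of which is given in the slides; only the true mean of 5'11" (71 inches) comes from the example.

```python
import random
import statistics

random.seed(1)

TRUE_MEAN = 71.0   # 5'11" in inches, from the example
ASSUMED_SD = 3.0   # assumption: the slides give no standard deviation

# Draw one random sample of 100 heights from the assumed population.
sample = [random.gauss(TRUE_MEAN, ASSUMED_SD) for _ in range(100)]

estimate = statistics.fmean(sample)
sampling_error = estimate - TRUE_MEAN

print(f"estimate = {estimate:.2f} in, sampling error = {sampling_error:+.2f} in")
```

Changing the seed gives a different sample and therefore a different sampling error, which is the point of the slide.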

Solution 1: Multiple Samples
• One solution to this potential problem of sampling
error is to collect, for example, 100 samples of 100
male undergraduate students each.
• Then, we would calculate the 100 means of these 100
samples to get a better idea of the true mean height for
all male undergraduate students.

Solution: Multiple Samples
• Imagine these 100 means for the 100 samples, where
each one is plotted around the true mean.
• Some of the means from the 100 samples would be a
bit too tall, some a bit too short. Most would be very
close to the true mean; some might be far away from it.

Solution: Multiple Samples
• That is, we would have an approximately normal distribution
of estimated means scattered around the true mean.
• So, imagine a “baby” bell-shaped curve of estimates
(i.e., the 100 means from the 100 samples) located
inside the large bell-shaped curve of observations
(i.e., the heights of all male undergraduate students).

• The large curve shows the distribution of all observations
about their true mean.
• The intervals marked on each side of the mean represent the
standard deviation of the observations.
• The small curve shows the distribution of the 100 estimates
[100 sample means] of the true mean.
• The spread of this small curve is the standard error of the mean.
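The "small curve inside the large curve" can be checked numerically. The sketch below draws 100 samples of 100 heights and compares the spread of the 100 sample means with the spread of individual observations. The normal population and the standard deviation of 3 inches are assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(2)

TRUE_MEAN, ASSUMED_SD = 71.0, 3.0   # inches; the SD is an assumption
N_SAMPLES, N = 100, 100             # 100 samples of 100 students each

means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(TRUE_MEAN, ASSUMED_SD) for _ in range(N)]
    means.append(statistics.fmean(sample))

spread_of_means = statistics.stdev(means)       # width of the small curve
theoretical_se = ASSUMED_SD / math.sqrt(N)      # 3 / 10 = 0.3

print(f"SD of observations (large curve): {ASSUMED_SD}")
print(f"SD of the 100 means (small curve): {spread_of_means:.3f}")
print(f"SD / sqrt(n): {theoretical_se:.3f}")
```

The spread of the means should come out close to 0.3, about one tenth of the spread of the observations.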

The Central Limit Theorem:
If all possible random samples, each of size n, are
taken from any population with a mean µ and a
standard deviation σ, the sampling distribution
of the sample means (averages) will:
1. have mean: $\mu_{\bar{x}} = \mu$
2. have standard deviation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
3. be approximately normally distributed regardless of the shape of the
parent population (normality improves with larger n).
The mean of the sample means is the same as the population mean.
The standard deviation of the sample means is smaller than the population
standard deviation by a factor of √n; when σ is unknown, it is estimated by s/√n.
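The three statements of the theorem can be checked by simulation with a clearly non-normal parent population. The sketch below uses an exponential population (mean 1, standard deviation 1); the choice of population, the sample size of 50, and the 5000 repetitions are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(3)

RATE = 1.0               # exponential population: mean 1/RATE, SD 1/RATE
MU = SIGMA = 1.0 / RATE
N = 50                   # size of each sample
REPS = 5000              # number of repeated samples

sample_means = [
    statistics.fmean([random.expovariate(RATE) for _ in range(N)])
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = SIGMA / math.sqrt(N)

# Share of sample means within 1.96 standard errors of mu
# (close to 95% if the means are roughly normal).
within = sum(abs(m - MU) <= 1.96 * standard_error for m in sample_means) / REPS

print(f"mean of means = {mean_of_means:.3f} (mu = {MU})")
print(f"SD of means = {sd_of_means:.3f} (sigma/sqrt(n) = {standard_error:.3f})")
print(f"share within 1.96 SE = {within:.3f}")
```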

Standard Error
• A simple way of thinking of the difference between
standard deviation and standard error is:
• The standard deviation describes the spread of individual
observations in the population around the population mean.
• The standard error describes the spread of an estimate
(e.g., the sample mean) over repeated samples.

Standard Error
• Standard error is the extent to which an estimate, such as
a sample mean, varies from sample to sample.
• Combined with a level of confidence (typically 95% in
research), it sets the boundaries within which the true mean
is likely to be located.
Example
Data of a finite population: 1, 2, 3, 4, 5
Mean: [1 + 2 + 3 + 4 + 5] / 5 = 15 / 5 = 3
Standard deviation: $\sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$
If we plot these values in a bar chart, it looks like a rectangular distribution.
[Bar chart: frequency f against the values 1, 2, 3, 4, 5]

Let's take all possible samples of size n = 2:

Values of samples    Means
1,2                  1.5
1,3                  2
1,4                  2.5
1,5                  3
2,3                  2.5
2,4                  3
2,5                  3.5
3,4                  3.5
3,5                  4
4,5                  4.5

Now plot these means and we see that they tend towards a normal
distribution.
[Plot: frequency of the sample means over 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
Calculate the mean of these means and also calculate the
standard deviation:
(1.5 + 2 + 2.5 + 3 + 2.5 + 3 + 3.5 + 3.5 + 4 + 4.5) / 10 = 30 / 10 = 3
Where have we seen this figure? It is the population mean, 3.
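The enumeration of all samples of size 2 can be reproduced with a few lines of Python. This is a minimal sketch using `itertools.combinations`; the variable names are illustrative. The spread of the means is computed here by dividing by the number of means (10), which is one reasonable choice since all possible samples are listed.

```python
from itertools import combinations
import statistics

population = [1, 2, 3, 4, 5]

# All possible samples of size 2 (order ignored, no repeats) and their means.
samples = list(combinations(population, 2))
means = [statistics.fmean(s) for s in samples]

for s, m in zip(samples, means):
    print(s, m)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)        # divides by the 10 means
population_sd = statistics.pstdev(population) # divides by N = 5

print(f"mean of means = {mean_of_means}")         # equals the population mean
print(f"SD of means = {sd_of_means:.3f}")
print(f"SD of population = {population_sd:.3f}")
```

The mean of the ten sample means equals the population mean, and the means are less spread out than the original five values.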

Estimation: Standard Error of Mean
• Central Limit Theorem (CLT)
• Consider samples of size n > 30 drawn from a population
with mean µ and standard deviation S (S.D.).
• If repeated sampling is done, the relative
frequency histogram of the sample means will be
approximately normal and bell shaped.
• Using the empirical rule, we know that the interval
µ ± 1.96 × S.D./√n includes
95% of the sample means x̄ on repeated sampling.
• Equivalently, on repeated sampling and
constructing x̄ ± 1.96 × S.D./√n each time, we
would expect µ to fall within this interval 95% of the
time.
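The "95% of the time" statement can be checked by simulation. The sketch below repeatedly samples from a hypothetical normal population (mean 100, standard deviation 15, both assumed for illustration), builds x̄ ± 1.96 × s/√n from each sample, and counts how often the interval contains µ.

```python
import math
import random
import statistics

random.seed(4)

MU, SIGMA = 100.0, 15.0   # hypothetical population values
N, REPS = 36, 2000        # sample size and number of repeated samples

def ci95(sample):
    """Return (low, high) for x_bar +/- 1.96 * s / sqrt(n)."""
    x_bar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return x_bar - 1.96 * se, x_bar + 1.96 * se

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    low, high = ci95(sample)
    hits += low <= MU <= high   # True counts as 1

coverage = hits / REPS
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share should land near 0.95; it tends to sit slightly below because s is itself estimated from each sample.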


Recap
Suppose we have a population and want to estimate its
mean. What should we do?
Suppose we collect many samples, calculate the mean of each, and draw a
frequency polygon of these means. According to the CLT,
these means will follow an approximately normal distribution curve whose
mean equals the population mean and whose standard deviation, the
standard error, is estimated by s/√n.
We can then say that the interval µ ± 1.96 standard errors
contains 95% of the sample means. Equivalently, for those
95% of samples, the 95% CI constructed around the sample
mean will contain the population mean.

Example 1
• We wish to estimate the mean volume of oxygen uptake
for joggers.
• A random sample of 100 joggers is taken.
• The sample mean is 47.5 ml/kg and s = 4.8 ml/kg.
• Construct a 95% confidence interval for µ.

Solution
• Formula: x̄ ± 1.96 × S.D./√n
Putting in the values:
47.5 ± 1.96 × 4.8/√100
= 47.5 ± 0.94
= 46.56 to 48.44 ml/kg
Conclusion: On repeated sampling, if we
construct 100 such intervals, about 95 of them will
contain the population mean.
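The arithmetic of this solution can be written as a small helper. The function name `mean_confidence_interval` is hypothetical and the sketch simply applies mean ± z × S.D./√n with z = 1.96 by default.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """Return (low, high) for mean +/- z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Example 1: 100 joggers, mean 47.5 ml/kg, s = 4.8 ml/kg.
low, high = mean_confidence_interval(47.5, 4.8, 100)
print(f"{low:.2f} to {high:.2f}")  # prints 46.56 to 48.44
```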

• We wish to estimate the average height of students of
KMC.
• A random sample of 49 students is taken.
• The mean height is 170 cm and s = 7 cm.

• Formula: x̄ ± 1.96 × S.D./√n
Putting in the values:
170 ± 1.96 × 7/√49
= 170 ± 1.96
= 168.04 to 171.96 cm for the average height of students of KMC.

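• For a 99% confidence interval, the multiplier is 2.58 instead of 1.96.
• Formula: x̄ ± 2.58 × S.D./√n
Putting in the values:
170 ± 2.58 × 7/√49
= 170 ± 2.58
= 167.42 to 172.58 cm for the average height of students of KMC.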

• Formula: x̄ ± 2.58 × S.D./√n
Putting in the values for a sample of n = 196 (√196 = 14):
170 ± 2.58 × 7/√196
= 170 ± 2.58 × 0.5
= 170 ± 1.29
= 168.71 to 171.29
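The three height intervals differ only in the confidence level and the sample size. The sketch below derives the multiplier from the standard normal distribution with `statistics.NormalDist` rather than hard-coding 1.96 or 2.58; the helper names `z_multiplier` and `mean_ci` are hypothetical.

```python
import math
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def mean_ci(mean, sd, n, confidence):
    """Return (low, high) for mean +/- z * sd / sqrt(n)."""
    margin = z_multiplier(confidence) * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Mean height 170 cm, s = 7 cm, as in the worked examples.
for n, confidence in [(49, 0.95), (49, 0.99), (196, 0.99)]:
    low, high = mean_ci(170, 7, n, confidence)
    print(f"n = {n}, {confidence:.0%}: {low:.2f} to {high:.2f}")
```

Raising the confidence level widens the interval, while a larger sample narrows it.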
