L estimation

ESTIMATION
Dr Htin Zaw Soe
MBBS, DFT, MMedSc (P & TM), PhD, DipMedEd
Associate Professor
Department of Biostatistics
University of Public Health

• Statistical inference: the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population
• Two general areas of statistical inference:
(1) Estimation
(2) Hypothesis testing

• Process of estimation: statistic (sample) → parameter (population)
• Two estimates are computed:
1. Point estimate (a single numerical value)
2. Interval estimate (two numerical values defining an interval with a specified degree of confidence)
• The rule for computing an estimate is called an estimator, e.g. $\bar{x} = \sum x_i / n$
• Sampled population
• Target population
They may or may not be the same
Random sample (representative of the population)
Nonrandom sample

• I. Confidence interval for a population mean
• The sample mean ($\bar{x}$) is a point estimate of $\mu$, but it is generally not equal to $\mu$, so an interval is needed.
• Sampling distribution and CLT
- (a) the distribution of sample means ($\bar{x}$) is (approximately) normal
- (b) $\mu_{\bar{x}} = \mu$
- (c) $\sigma^2_{\bar{x}} = \sigma^2 / n$
- About 95% of sample means ($\bar{x}$) lie within 2 SD of $\mu$ (see fig)
- $\mu \pm 2\sigma_{\bar{x}}$ will contain about 95% of all possible values of the sample mean ($\bar{x}$)

• Construct the interval using the sample mean ($\bar{x}$), which is known, i.e. $\bar{x} \pm 2\sigma_{\bar{x}}$ (instead of $\mu \pm 2\sigma_{\bar{x}}$)
Repeated sampling gives many intervals $\bar{x} \pm 2\sigma_{\bar{x}}$, all of the same width, about the unknown $\mu$.
About 95% of these intervals have centres falling within $\pm 2\sigma_{\bar{x}}$ of $\mu$.
Each interval whose centre falls within $2\sigma_{\bar{x}}$ of $\mu$ will contain $\mu$.
(See fig)

Example 1: In a study investigating an enzyme level in man, n = 10, sample mean $\bar{x} = 22$, $\sigma^2 = 45$. We wish to estimate $\mu$.
Answer 1: The 95% confidence interval for $\mu$ is:
$\bar{x} \pm 2\sigma_{\bar{x}} = 22 \pm 2\sqrt{45/10}$
$= 22 \pm 2(2.1213)$
$= (17.76, 26.24)$
[ $\bar{x}$ is the point estimate of $\mu$ ]
[ 2 is a value from the standard normal distribution within which about 95% of the $\bar{x}$ values lie; this value of z is the reliability coefficient ]
[ $\sigma_{\bar{x}}$ is the SD of the sampling distribution, i.e. the standard error ]
An interval estimate is expressed in general as follows:
Estimator ± (reliability coefficient) (standard error)
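The general form above can be checked against Example 1 with a minimal Python sketch. The function name `interval_estimate` is an illustrative choice, not something from the slides; the inputs are the values given in Example 1.

```python
import math

def interval_estimate(estimator, reliability, standard_error):
    """Return (lower, upper) for estimator ± reliability * standard_error."""
    margin = reliability * standard_error
    return estimator - margin, estimator + margin

# Example 1: n = 10, sample mean 22, known population variance 45
n, xbar, variance = 10, 22.0, 45.0
se = math.sqrt(variance / n)                 # standard error of the mean
low, high = interval_estimate(xbar, 2, se)   # reliability coefficient 2, as on the slide
print(round(se, 4), round(low, 2), round(high, 2))  # 2.1213 17.76 26.24
```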

• $\bar{x} \pm 2\sigma_{\bar{x}}$
• When sampling is from a normal distribution with known $\sigma^2$, the interval estimate for $\mu$ is expressed as follows:
$\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$
[ $z_{(1-\alpha/2)}$ = the value of z such that: ]
an area of $(1-\alpha/2)$ lies to the left of z under the curve, and
an area of $(\alpha/2)$ lies to the right of z under the curve.

• Interpreting a confidence interval
Probabilistic interpretation:
In repeated sampling from a normally distributed population with a known standard deviation, $100(1-\alpha)$ percent of all intervals of the form $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$ will in the long run include the population mean, $\mu$.
Practical interpretation:
When sampling is from a normally distributed population with a known standard deviation, we are $100(1-\alpha)$ percent confident that the single computed interval, $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, contains the population mean, $\mu$.

Confidence coefficients:  .90    .95   .99
Reliability factors:      1.645  1.96  2.58
[ In Example 1 a reliability coefficient of 2 is used, but the more exact value for a confidence coefficient of .95 is 1.96 ]
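The tabulated reliability factors can be reproduced from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `reliability_factor` is an assumption made for illustration.

```python
from statistics import NormalDist

def reliability_factor(confidence):
    """Return z_(1 - alpha/2) for a two-sided interval with the given confidence coefficient."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(reliability_factor(confidence), 3))
```

To three decimals the 99% factor is 2.576, which printed tables commonly round to 2.58.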

• Precision:
The quantity obtained by multiplying the reliability factor by the SE of the mean is called the precision of the estimate. This quantity is also called the margin of error.
Example 2: Variance of muscular strength score = 144
Mean of muscular strength score in a sample (n = 15) is 84.3
Find the 99% confidence interval for the pop. mean.
Answer 2: $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$
$84.3 \pm 2.58\sqrt{144/15}$
$84.3 \pm 8.0$
$(76.3, 92.3)$
We are 99% confident that the pop. mean, $\mu$, lies between 76.3 and 92.3.
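As a rough check of Example 2, the interval can be computed with the reliability factor taken from the standard normal distribution rather than from the table. The function name `z_confidence_interval` is hypothetical; the inputs are those of Example 2.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, variance, n, confidence):
    """CI for a population mean when the population variance is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # reliability factor
    margin = z * math.sqrt(variance / n)                # precision / margin of error
    return mean - margin, mean + margin

# Example 2: variance 144, n = 15, sample mean 84.3, 99% confidence
low, high = z_confidence_interval(84.3, 144, 15, 0.99)
print(round(low, 1), round(high, 1))  # 76.3 92.3
```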

• Alternative estimates of central tendency:
The median is preferred to the mean when there are outliers in a data set. The median is used as a point estimate, and in an interval estimate with a different formula.
Trimmed mean:
One of the robust estimators of central tendency for a data set with outliers
Steps to compute a trimmed mean:
- Order the measurements
- Discard the smallest 100α% and the largest 100α% of the measurements ($\alpha$ = 0.1 to 0.2)
- Compute the arithmetic mean of the remaining measurements
Note: The median may be regarded as a 50% trimmed mean
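The trimmed-mean steps above can be sketched in Python as follows. The data values are hypothetical, chosen only to show the effect of one outlier, and rounding the number of discarded values down is an assumption the slides do not specify.

```python
def trimmed_mean(values, alpha):
    """Mean after discarding the smallest and largest 100*alpha % of the values."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    ordered = sorted(values)                 # step 1: order the measurements
    k = int(alpha * len(ordered))            # values dropped per tail (rounded down: an assumption)
    kept = ordered[k:len(ordered) - k]       # step 2: discard both tails
    return sum(kept) / len(kept)             # step 3: arithmetic mean of the rest

scores = [2, 4, 5, 5, 6, 6, 7, 7, 8, 50]    # hypothetical data with one outlier (50)
print(sum(scores) / len(scores))             # ordinary mean: 10.0
print(trimmed_mean(scores, 0.1))             # 10% trimmed mean: 6.0
```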


II. t distribution
$z = (\bar{x} - \mu) / (\sigma / \sqrt{n})$
$\sigma$ is usually unknown, so $s$ is used instead
$t = (\bar{x} - \mu) / (s / \sqrt{n})$ follows the t distribution
The t distribution is used when the sample size is small
It is called Student's t distribution, or simply the t distribution

• Properties of the t distribution:
1. It has a mean of 0
2. It is symmetrical about the mean
3. It has variance > 1; the variance approaches 1 as n becomes large.
Variance is df/(df-2) for df > 2
Alternatively, variance is (n-1)/(n-3) for n > 3
4. The variable t ranges from $-\infty$ to $+\infty$
5. It is a family of distributions, since there is a different distribution for each value of n-1 (the divisor used in computing $s^2$) (see fig)
6. Compared with the normal distribution, the t distribution is less peaked and has higher tails
7. The t distribution approaches the normal distribution as n-1 approaches $\infty$

• Table E (the table of the t distribution) is entered with the confidence coefficient and the df
• Confidence intervals using t:
• When sampling is from a normal distribution whose $\sigma$ is unknown, the $100(1-\alpha)$% CI for the pop. mean, $\mu$, is given by $\bar{x} \pm t_{(1-\alpha/2)}\, s/\sqrt{n}$
[The reliability coefficient is obtained from the table of the t distribution (Table E), with n-1 df]

• Example 3: In a study to estimate the pop. mean of muscular strength:
Sample mean ($\bar{x}$) = 250.8
Sample SD (s) = 130.9
Sample size (n) = 19 subjects
Find the 95% CI for the pop. mean
$\bar{x} \pm t_{(1-\alpha/2)}\, s/\sqrt{n}$
$250.8 \pm 2.1009\,(130.9/\sqrt{19})$   [t for 18 df]
$250.8 \pm 63.1$
$(187.7, 313.9)$
Interpretation: We are 95% confident that the true pop. mean, $\mu$, lies between 187.7 and 313.9, because in repeated sampling 95% of intervals constructed in like manner will include $\mu$.
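Example 3 can be reproduced with a short Python sketch. The t value is passed in as read from a t table (2.1009 for 18 df at 95%), since the standard library has no t quantile function; the function name `t_confidence_interval` is an illustrative assumption.

```python
import math

def t_confidence_interval(mean, sd, n, t_value):
    """CI for a population mean using the sample SD and a tabulated t value (n - 1 df)."""
    standard_error = sd / math.sqrt(n)
    margin = t_value * standard_error
    return mean - margin, mean + margin

# Example 3: sample mean 250.8, s = 130.9, n = 19, t(0.975, 18 df) = 2.1009 from the table
low, high = t_confidence_interval(250.8, 130.9, 19, t_value=2.1009)
print(round(low, 1), round(high, 1))  # 187.7 313.9
```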

• Deciding between z and t:
(See flowchart)
Consider:
- Distribution of the pop. (normal or not)
- Sample size (large or not)
- Pop. variance (known or not)
Then choose z or t
III. Confidence interval for the difference between two pop. means
When the pop. variances are known:
$(\bar{x}_1 - \bar{x}_2) \pm z_{(1-\alpha/2)} \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
If the interval includes zero, the two pop. means may be equal; if it does not include zero, we conclude that they are not equal

• Example 4: In a study to determine the difference between the serum uric acid levels of two groups of patients (with Down's syndrome and without Down's syndrome) (two pop. of values, normally distributed)
• Group 1: Mean uric acid level ($\bar{x}_1$) = 4.5 mg/100 ml
Variance ($\sigma_1^2$) = 1
Sample size ($n_1$) = 12 subjects
• Group 2: Mean uric acid level ($\bar{x}_2$) = 3.4 mg/100 ml
Variance ($\sigma_2^2$) = 1.5
Sample size ($n_2$) = 15 subjects
Find the 95% CI for the difference between the two pop. means ($\mu_1 - \mu_2$)
$(\bar{x}_1 - \bar{x}_2) \pm z_{(1-\alpha/2)} \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
$(4.5 - 3.4) \pm 1.96\sqrt{1/12 + 1.5/15}$
$= 1.1 \pm 1.96\,(0.4282)$
$= 1.1 \pm 0.84$

• Interpretation: We are 95% confident that the difference between the two mean serum uric acid levels lies between 0.26 and 1.94 mg/100 ml. Since the interval does not include zero, we conclude that the two pop. means are not equal.
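The arithmetic of Example 4 can be verified with a small Python sketch. The function name is hypothetical; the inputs are the group means, known variances, and sample sizes from the example, with z = 1.96.

```python
import math

def diff_means_z_interval(mean1, var1, n1, mean2, var2, n2, z):
    """CI for mu1 - mu2 when both population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    margin = z * standard_error
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Example 4: Down's syndrome group vs. comparison group
low, high = diff_means_z_interval(4.5, 1.0, 12, 3.4, 1.5, 15, z=1.96)
print(round(low, 2), round(high, 2))  # 0.26 1.94
print(low <= 0 <= high)               # False: the interval excludes zero
```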
Sampling from nonnormal populations:
- Apply the CLT if the sample sizes are large
- Use $s^2$ if $\sigma^2$ is unknown

• The t distribution and the difference between means
(A) When pop. variances are equal
(B) When pop. variances are not equal
(A) When pop. variances are equal
1. Find the pooled estimate of the common variance by
$s_p^2 = [(n_1-1)s_1^2 + (n_2-1)s_2^2] / (n_1 + n_2 - 2)$
2. Find the SE of the estimate by
$s_{\bar{x}_1-\bar{x}_2} = \sqrt{s_p^2/n_1 + s_p^2/n_2}$
3. Find the $100(1-\alpha)$% confidence interval by
$(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2)} \sqrt{s_p^2/n_1 + s_p^2/n_2}$
[Note: the number of df is $(n_1 + n_2 - 2)$]
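The three steps for the equal-variance case can be sketched in Python as below. The slides give no worked numbers here, so all inputs are hypothetical and serve only to exercise the formulas; the t value 2.086 is the tabulated 0.975 quantile for 20 df.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Step 1: pooled estimate of the common variance."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_interval(mean1, s1_sq, n1, mean2, s2_sq, n2, t_value):
    """Steps 2 and 3: SE from the pooled variance, then the CI (df = n1 + n2 - 2)."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    standard_error = math.sqrt(sp_sq / n1 + sp_sq / n2)
    margin = t_value * standard_error
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Hypothetical inputs (not from the slides): n1 = 10, n2 = 12, so df = 20
print(pooled_variance(4.0, 10, 6.0, 12))  # 5.1
low, high = pooled_t_interval(20.0, 4.0, 10, 18.0, 6.0, 12, t_value=2.086)
print(low, high)
```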

• (B) When pop. variances are not equal
1. Find the reliability factor, $t'_{(1-\alpha/2)}$, by
$t'_{(1-\alpha/2)} = (w_1 t_1 + w_2 t_2) / (w_1 + w_2)$
[ $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$; $t_1$ and $t_2$ are the $t_{(1-\alpha/2)}$ values for $n_1-1$ and $n_2-1$ df ]
2. Find the $100(1-\alpha)$% confidence interval by
$(\bar{x}_1 - \bar{x}_2) \pm t'_{(1-\alpha/2)} \sqrt{s_1^2/n_1 + s_2^2/n_2}$
SEE THE EXAMPLE IN TEXT
See the flowchart to choose z, t, or t'
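The reliability factor t' is a weighted average of two tabulated t values, which the following Python sketch makes explicit. It assumes t1 and t2 are the table values for n1 - 1 and n2 - 1 df; all numeric inputs are hypothetical, since the worked example is only in the text.

```python
import math

def t_prime(s1_sq, n1, s2_sq, n2, t1, t2):
    """Weighted reliability factor t' with weights w_i = s_i^2 / n_i."""
    w1, w2 = s1_sq / n1, s2_sq / n2
    return (w1 * t1 + w2 * t2) / (w1 + w2)

def unequal_variance_interval(mean1, s1_sq, n1, mean2, s2_sq, n2, t1, t2):
    """CI for mu1 - mu2 using t' and the unpooled standard error."""
    margin = t_prime(s1_sq, n1, s2_sq, n2, t1, t2) * math.sqrt(s1_sq / n1 + s2_sq / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Hypothetical inputs: t(0.975, 9 df) = 2.2622 and t(0.975, 11 df) = 2.2010 from a t table
tp = t_prime(4.0, 10, 9.0, 12, t1=2.2622, t2=2.2010)
print(tp)  # lies between the two tabulated t values
print(unequal_variance_interval(20.0, 4.0, 10, 18.0, 9.0, 12, 2.2622, 2.2010))
```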

• IV. Confidence interval for a population proportion
- What proportion of patients who receive a particular type of treatment recover?
- What proportion of some pop. has a certain disease?
- What proportion of a pop. is immune to a certain disease?
• Proceed in the same manner as for the $100(1-\alpha)$% CI for a pop. mean
Find the $100(1-\alpha)$% CI for the pop. proportion, p, from the sample proportion, $\hat{p}$, by the following:
$\hat{p} \pm z_{(1-\alpha/2)} \sqrt{\hat{p}(1-\hat{p})/n}$
[Note: When np and n(1-p) are both greater than 5, the sampling distribution of $\hat{p}$ is considered quite close to a normal distribution, and the reliability coefficient is a value of z from the standard normal distribution]

• Example 5: In a study to find the population proportion of internet users who search for health information:
Sample proportion ($\hat{p}$) = 0.18
Sample size (n) = 1220 users
Find the 95% CI for the pop. proportion
$\hat{p} \pm z_{(1-\alpha/2)} \sqrt{\hat{p}(1-\hat{p})/n}$
$0.18 \pm 1.96\sqrt{0.18(1-0.18)/1220}$
$0.18 \pm 1.96\,(0.0110)$
$0.18 \pm 0.022$
$(0.158, 0.202)$
Interpretation: We are 95% confident that the population proportion of internet users who search for health information lies between 0.158 and 0.202.
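Example 5 can be checked with the following Python sketch, which also tests the "greater than 5" rule before using the normal approximation. The function name is an illustrative assumption, and the rule is applied to the sample proportion as a stand-in for the unknown p.

```python
import math

def proportion_z_interval(p_hat, n, z):
    """CI for a population proportion using the normal approximation."""
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("normal approximation not justified: n*p and n*(1-p) should exceed 5")
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Example 5: sample proportion 0.18, n = 1220, z = 1.96
low, high = proportion_z_interval(0.18, 1220, z=1.96)
print(round(low, 3), round(high, 3))  # 0.158 0.202
```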

• V. Confidence interval for the difference between two population proportions
$(\hat{p}_1 - \hat{p}_2) \pm z_{(1-\alpha/2)} \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$
[ $\hat{p}_1$, $\hat{p}_2$ = sample proportions ]
SEE THE EXAMPLE IN TEXT
If the interval includes zero, the two pop. proportions may be equal; if it does not include zero, we conclude that they are not equal
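A minimal Python sketch of the interval for a difference between two proportions follows. The slides defer the worked example to the text, so the sample proportions and sizes used here are hypothetical.

```python
import math

def diff_proportions_z_interval(p1, n1, p2, n2, z):
    """CI for the difference between two population proportions (normal approximation)."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * standard_error
    diff = p1 - p2
    return diff - margin, diff + margin

# Hypothetical inputs: sample proportions 0.5 and 0.4, 100 subjects per group
low, high = diff_proportions_z_interval(0.5, 100, 0.4, 100, z=1.96)
print(round(low, 4), round(high, 4))  # -0.0372 0.2372
print(low <= 0 <= high)               # True: zero is inside, so equality is not ruled out
```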

• VI. Determination of sample size for estimating means
A sample larger than needed → waste of resources
A sample that is too small → results of no practical use
It is essential to get a sufficient/optimum sample size
Objectives: The objectives in interval estimation are to get narrow intervals with high reliability
Recall: Estimator ± (reliability coefficient) (standard error)
Half-width of the interval (margin of error), d = (reliability coefficient) (standard error)
$d = z\,(SE)$
$d = z\,(\sigma/\sqrt{n})$
$d^2 = z^2 (\sigma/\sqrt{n})^2 = z^2\sigma^2/n$
$n = z^2\sigma^2 / d^2$

• The sample size formula when sampling is without replacement from a small finite population of size N is as follows:
$n = N z^2\sigma^2 / [d^2(N-1) + z^2\sigma^2]$
This formula is derived by using the finite population correction $\sqrt{(N-n)/(N-1)}$ (see text)
Estimating $\sigma^2$ for use in $n = z^2\sigma^2/d^2$:
1. Using a pilot or preliminary sample → $\sigma^2$
2. Using previous or similar studies → $\sigma^2$
3. Using $\sigma \approx R/6$ (R = range) if the pop. is approx. normally distributed and the largest and smallest values are known → $\sigma$

• Example 6: In a study to determine the average daily intake of protein in teenage girls, what is the required sample size?
- Protein intake is measured in grams (g) → the estimate is based on the mean/average
- The investigator wants an interval 10 g wide (i.e. within about 5 g of the pop. mean in either direction, so the margin of error d is 5 g)
- Pop. SD = 20 g
- Confidence coefficient = 0.95 (so reliability factor = 1.96)
- Ignoring the finite pop. correction as the pop. is large, the required n is:
$n = z^2\sigma^2/d^2 = (1.96)^2(20)^2/(5)^2 = 61.47$
Rounding up, the required sample size is 62 teenage girls
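The calculation in Example 6 can be sketched in Python, rounding up to the next whole subject so that the margin of error is not exceeded. The function names and the finite population size N = 500 are hypothetical; only z, the SD, and d come from the example.

```python
import math

def sample_size_mean(z, sigma, d):
    """n = z^2 sigma^2 / d^2, rounded up to a whole subject."""
    return math.ceil(z**2 * sigma**2 / d**2)

def sample_size_mean_finite(z, sigma, d, N):
    """n = N z^2 sigma^2 / (d^2 (N - 1) + z^2 sigma^2), for sampling without replacement."""
    return math.ceil(N * z**2 * sigma**2 / (d**2 * (N - 1) + z**2 * sigma**2))

# Example 6: z = 1.96, pop. SD = 20 g, margin of error d = 5 g
print(round(1.96**2 * 20**2 / 5**2, 2))             # 61.47 before rounding up
print(sample_size_mean(1.96, 20, 5))                # 62
# Hypothetical finite population of N = 500 (not from the slides)
print(sample_size_mean_finite(1.96, 20, 5, N=500))  # 55
```

The finite-population result is smaller because the correction reduces the standard error when the sample is a sizeable fraction of the population.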

• VII. Determination of sample size for estimating proportions
• Assume the distribution of the sample proportion is approx. normal. The finite pop. correction is not needed
- when sampling is with replacement,
- when sampling is from an infinite pop., or
- when the sampled pop. is large enough
So we use $n = z^2 pq / d^2$, where q = 1 - p
If the finite pop. correction is used, use $n = N z^2 pq / [d^2(N-1) + z^2 pq]$
Estimating p:
1. Use a pilot sample
2. Use an upper bound for p (e.g. true p not greater than 0.3)
3. Use 0.5 for p (this gives the largest n)

• Example 7: In a study to determine the proportion of medically indigent families in an area, what is the required sample size?
It is believed that p cannot be greater than 0.35. A 95% CI is desired with d = 0.05
$n = z^2 pq / d^2$
$= (1.96)^2 (0.35)(0.65) / (0.05)^2$
$= 349.6$
So, rounding up, the required sample size is 350 families
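Example 7 can be reproduced with a short Python sketch. The function name is an assumption; the second call uses p = 0.5 to illustrate the most conservative choice when nothing is known about p.

```python
import math

def sample_size_proportion(z, p, d):
    """n = z^2 p q / d^2 with q = 1 - p, rounded up to a whole unit."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Example 7: p assumed no greater than 0.35, 95% confidence, d = 0.05
print(sample_size_proportion(1.96, 0.35, 0.05))  # 350
# With no information about p, p = 0.5 maximises p*q and hence n
print(sample_size_proportion(1.96, 0.5, 0.05))   # 385
```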

• VIII. Confidence interval for the variance of a normally distributed population (see text)
• IX. Confidence interval for the ratio of the variances of two normally distributed populations (see text)
