Review on Probability Distributions, Estimation and Hypothesis Testing: Lecture Note
By Meselu Tegenie (MSc in Natural Resource Economics and Policy; MSc in Mathematics),
School of Natural Resource and Environmental Studies, WGCF-NR, Hawassa University
____________________________________________________________________________
Contents
1.1 Discrete and Continuous Probability Distributions
1.1.1 Binomial Distribution
1.1.2 Poisson Distribution
1.1.3 Hypergeometric Distribution
1.1.4 Normal Distribution
1.1.5 Uniform Distribution
1.1.6 Exponential Distribution
1.2 Other Distributions Related to the Normal Distribution
1.2.1 Chi-square Distribution
1.2.2 F-distribution
1.2.3 Student (t) Distribution
1.3 Estimation
1.4 Hypothesis Testing
1.1 Discrete and Continuous Probability Distributions
Reading Assignment
Reference: Allan G. Bluman, Elementary Statistics: A Step by Step Approach, Chapters 5, 6, 7 and 8.
Answer the following sample test questions:
1. Define a random variable and discuss its characteristics and classifications.
2. Discuss the characteristics and uses of each of the following distributions:

a. Binomial Distribution
b. Poisson Distribution
c. Hypergeometric Distribution
d. Normal Distribution
e. Uniform Distribution
f. Exponential Distribution
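As a study aid for question 2, the sketch below writes out the probability function (or cumulative distribution function) of each listed distribution using only the Python standard library. The function names and parameter conventions (for example, `lam` as the Poisson mean or the exponential rate, and `N`, `K`, `n` for the hypergeometric population size, number of successes and number of draws) are illustrative assumptions, not notation taken from the reference text.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): e^(-lam) lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hypergeometric_pmf(k, N, K, n):
    """P(X = k) successes in n draws without replacement from N items, K of them successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return NormalDist(mu, sigma).cdf(x)


def uniform_cdf(x, a, b):
    """P(X <= x) for X ~ Uniform(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def exponential_cdf(x, lam):
    """P(X <= x) for an exponential variable with rate lam (mean 1/lam)."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


if __name__ == "__main__":
    print(binomial_pmf(2, 5, 0.5))          # 10/32 = 0.3125
    print(poisson_pmf(0, 2.0))              # e^-2
    print(hypergeometric_pmf(1, 10, 4, 3))  # 4 * 15 / 120 = 0.5
    print(normal_cdf(1.96, 0, 1))           # about 0.975
    print(uniform_cdf(3, 0, 10))            # 0.3
    print(exponential_cdf(1, 0.5))          # 1 - e^-0.5
```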
1.2 Other continuous probability distributions: The Chi-square and F Distributions
The following questions are answered in this subsection:
• What is the chi-square distribution? How is it related to the normal?
• How is the chi-square distribution related to the sampling distribution of the variance?
• How is the F distribution related to the normal? To the chi-square?
• What are the characteristics of the F distribution?
There are many theoretical distributions, both continuous and discrete, as indicated in Section 1.1. They are often referred to as test statistics. Four test statistics are commonly used: z (unit normal), Student's t, chi-square (χ²) and F. The unit normal and t distributions are addressed in Section 1.1. z and t are closely related to the sampling distribution of means; chi-square and F are closely related to the sampling distribution of variances.
In this subsection, the characteristics of the chi-square and F distributions are discussed.
1.2.1 The Chi-square Distribution
Given the standard normal variable $z = \frac{X-\mu}{\sigma}$, derived from a normal random variable X with mean μ and standard deviation σ, its square
$$\chi^2_{(1)} = z^2 = \left(\frac{X-\mu}{\sigma}\right)^2$$
has a new distribution, called the chi-square distribution with one degree of freedom.
Given two independent standard normal variables $z_1 = \frac{X_1-\mu}{\sigma}$ and $z_2 = \frac{X_2-\mu}{\sigma}$, the sum
$$\chi^2_{(2)} = z_1^2 + z_2^2 = \left(\frac{X_1-\mu}{\sigma}\right)^2 + \left(\frac{X_2-\mu}{\sigma}\right)^2$$
has a chi-square distribution with two degrees of freedom.
Characteristics of the chi-square distribution
• Chi-square is the distribution of a sum of squares.
• Each squared deviate is taken from the unit normal, N(0,1).
• Its values range from zero to infinity, $[0, \infty)$.
• With one degree of freedom, most values (about 68%) lie between zero and one, because $P(z^2 < 1) = P(-1 < z < 1) \approx 0.683$.
• The shape of the chi-square distribution depends on the number of squared deviates that are added together.
• The distribution is skewed to the right.
• The distribution depends on one parameter, its degrees of freedom (df or v). As df gets large, the curve becomes less skewed and more nearly normal.
• The expected value (mean) of chi-square is df.
• The variance of the distribution is 2df.
• Tables of chi-square give the desired critical values.
• Chi-square is additive: the sum of two independent chi-square variables is a chi-square variable whose degrees of freedom equal the sum of the individual degrees of freedom.
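The characteristics above can be checked by simulation. This minimal sketch, assuming only Python's standard `random` and `statistics` modules, builds chi-square values as sums of df squared standard normal deviates and compares their sample mean and variance with df and 2df; the function name `chi_square_draws`, the seed and the sample sizes are illustrative choices.

```python
import random
import statistics


def chi_square_draws(df, size, seed=1):
    """Simulate chi-square values as sums of df squared N(0, 1) deviates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(size)]


if __name__ == "__main__":
    df = 3
    draws = chi_square_draws(df, 100_000)
    print(statistics.fmean(draws))      # close to df = 3
    print(statistics.pvariance(draws))  # close to 2 * df = 6
    print(min(draws) >= 0)              # True: a sum of squares is never negative

    # With df = 1, the share of values below 1 should be close to P(-1 < z < 1)
    one_df = chi_square_draws(1, 100_000, seed=7)
    print(sum(v < 1 for v in one_df) / len(one_df))  # close to 0.683
```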
Provided that a sample of size n is taken from a normal distribution with variance σ², and the sample variance is
$$s^2 = \frac{\sum (x - \bar{x})^2}{n-1},$$
the statistic
$$\chi^2_{(n-1)} = \frac{(n-1)s^2}{\sigma^2}$$
has a chi-square distribution with n − 1 degrees of freedom. As a result, the chi-square distribution is useful for hypothesis testing about, and estimation of, the variance of a normal population.
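A quick simulation, again a standard-library sketch, illustrates this result: repeated samples from a normal population give values of $(n-1)s^2/\sigma^2$ whose mean and variance are close to n − 1 and 2(n − 1), as expected for a chi-square variable with n − 1 degrees of freedom. The population values (μ = 50, σ = 4), the sample size n = 8 and the function name are arbitrary illustrative assumptions.

```python
import random
import statistics


def scaled_sample_variances(n, mu, sigma, reps, seed=2):
    """Return (n - 1) s^2 / sigma^2 for `reps` samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divisor n - 1
        out.append((n - 1) * s2 / sigma**2)
    return out


if __name__ == "__main__":
    values = scaled_sample_variances(n=8, mu=50.0, sigma=4.0, reps=20_000)
    print(statistics.fmean(values))      # close to n - 1 = 7
    print(statistics.pvariance(values))  # close to 2(n - 1) = 14
```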
1.2.2 The F-distribution
• The F distribution is the ratio of two variance estimates:
$$F = \frac{s_1^2}{s_2^2} = \frac{\text{est. } \sigma_1^2}{\text{est. } \sigma_2^2}$$
• It is also the ratio of two independent chi-squares, each divided by its degrees of freedom:
$$F_{(v_1, v_2)} = \frac{\chi^2_{(v_1)}/v_1}{\chi^2_{(v_2)}/v_2}$$
• In our applications, v2 will usually be larger than v1 and larger than 2. Whenever v2 > 2, the mean (expected value) of the F distribution is v2/(v2 − 2), whatever the value of v1.
• F depends on two parameters, v1 and v2 (df1 and df2), and its shape changes with them. Its range is 0 to infinity, and it is right-skewed, somewhat like the chi-square.
• F tables show critical values for the df in the numerator and the df in the denominator.
• F tables are one-tailed: they give upper-tail critical values.
• The F distribution is used in many statistical tests:
• Tests for equality of variances.
• Tests for differences in means in ANOVA.
• Tests for regression models (slopes relating one continuous variable to another).
Relations among Distributions – the Children of the Normal
• Chi-square is drawn from the standard normal: N(0,1) deviates are squared and summed.
• F is the ratio of two chi-squares, each divided by its df. A chi-square divided by its df is a variance estimate, that is, a sum of squares divided by degrees of freedom.
• F = t²: if you square a t variable with v degrees of freedom, you get an F with 1 df in the numerator and v df in the denominator:
$$t^2_{(v)} = F_{(1, v)}$$
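The mean formula and the identity $t^2_{(v)} = F_{(1, v)}$ can be checked numerically. This sketch assumes SciPy is installed and uses `scipy.stats.f.mean`, `scipy.stats.t.ppf` and `scipy.stats.f.ppf` (inverse cumulative distribution functions); the degrees of freedom are arbitrary. Squaring the two-tailed 5% t critical value gives the upper 5% critical value of F(1, v), because $P(t^2 \le c) = P(-\sqrt{c} \le t \le \sqrt{c})$.

```python
from scipy import stats


def f_mean(v1, v2):
    """Mean of F(v1, v2); it exists only for v2 > 2 and does not depend on v1."""
    if v2 <= 2:
        raise ValueError("the mean of F exists only for v2 > 2")
    return v2 / (v2 - 2)


if __name__ == "__main__":
    v1, v2 = 4, 12
    print(f_mean(v1, v2), stats.f.mean(v1, v2))  # both equal 1.2

    t_crit = stats.t.ppf(0.975, v2)    # two-tailed 5% critical value of t with v2 df
    f_crit = stats.f.ppf(0.95, 1, v2)  # upper 5% critical value of F(1, v2)
    print(t_crit**2, f_crit)           # equal up to rounding
```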

1.3 Estimation
Given a population, a summary statistic calculated from a representative sample is a point estimate of the population parameter. An interval computed from the point estimate, a population or sample characteristic (such as the standard deviation) and a chosen level of confidence is called a confidence interval estimate.
Population parameter          Point estimate
Mean μ                        x̄
Variance σ²                   s²
Standard deviation σ          s
Proportion π                  p
A point estimate is a single number: the value of a statistic calculated from a representative sample. An interval estimate gives a range of values: it takes into account how the sample statistic varies from sample to sample, using the observations of a single sample. It also indicates how close the estimate is likely to be to the unknown population parameter, stated as a level of confidence below 100%.
1.3.1 Confidence Interval for Means when Population Standard Deviation is known
Given a population with known standard deviation σ, if the mean calculated from a sample of size n is $\bar{x}$, a (1 − α)100% confidence interval for the true population mean is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}},$$
where n is large or the population is normal, and $z_{\alpha/2}$ is the value in the z-distribution that leaves α/2 of the area under the curve of the z-distribution to its right.
Finding critical values of z using Excel 2010:
$z_{\alpha/2}$ = NORM.S.INV(1 − α/2)
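The interval above can be computed with Python's standard library, where `statistics.NormalDist().inv_cdf(1 - alpha/2)` plays the role of Excel's NORM.S.INV(1 − α/2). The sketch below is illustrative; the function name `z_interval` and the sample figures (mean 50, σ = 10, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """(1 - alpha)100% confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # same value as Excel NORM.S.INV(1 - alpha/2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Hypothetical sample: mean 50 from n = 25 observations, known sigma = 10
    print(z_interval(50, 10, 25))  # roughly (46.08, 53.92)
```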
1.3.2 Confidence Interval for Means when Population Standard Deviation is unknown
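Given a population with unknown standard deviation, if the mean and standard deviation calculated from a sample of size n are $\bar{x}$ and s respectively, a (1 − α)100% confidence interval for the true population mean is
$$\bar{x} \pm t_{(\alpha/2,\, n-1)}\frac{s}{\sqrt{n}},$$
where n is large or the population is normal, and $t_{(\alpha/2,\, n-1)}$ is the value in the t-distribution that leaves α/2 of the area under the curve of the t-distribution with n − 1 degrees of freedom to its right.
Finding critical values of t using Excel 2010 (T.INV.2T takes the two-tailed probability α):
$t_{(\alpha/2,\, n-1)}$ = T.INV.2T(α, n − 1)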
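A minimal sketch of the t interval, assuming SciPy is available: `scipy.stats.t.ppf(1 - alpha/2, n - 1)` returns the same critical value as Excel's T.INV.2T(α, n − 1). The function name `t_interval` and the measurements are hypothetical.

```python
import math
import statistics

from scipy import stats


def t_interval(sample, confidence=0.95):
    """(1 - alpha)100% confidence interval for the mean when sigma is unknown."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)                # sample standard deviation, divisor n - 1
    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha / 2, n - 1)  # same value as Excel T.INV.2T(alpha, n - 1)
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    data = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5]  # hypothetical measurements
    print(t_interval(data))
```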
1.3.3 Confidence Interval for Population Proportion
The distribution of the sample proportion p can be approximated by a normal distribution if the sample size is large (np ≥ 5 and n(1 − p) ≥ 5), with standard deviation
$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}.$$
Since π is unknown, this standard deviation is estimated by
$$\hat{\sigma}_p = \sqrt{\frac{p(1-p)}{n}}.$$
Hence a (1 − α)100% confidence interval for π is
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}.$$
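The proportion interval needs only the normal quantile, so a standard-library sketch suffices. The function name `proportion_interval` and the survey counts (120 of 400) are hypothetical; the guard clause applies the np ≥ 5 and n(1 − p) ≥ 5 condition stated above.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation (1 - alpha)100% confidence interval for a population proportion."""
    p = successes / n
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not appropriate: need np >= 5 and n(1 - p) >= 5")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


if __name__ == "__main__":
    # Hypothetical survey: 120 of 400 sampled households have the characteristic
    print(proportion_interval(120, 400))  # roughly (0.255, 0.345)
```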
1.3.4 Confidence Interval for Population Variance
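If the variance of a sample of size n taken from a normal population is s², a (1 − α)100% confidence interval for the population variance is
$$\frac{(n-1)s^2}{\chi^2_{(\alpha/2,\, n-1)}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{(1-\alpha/2,\, n-1)}},$$
where $\chi^2_{(\alpha/2,\, n-1)}$ and $\chi^2_{(1-\alpha/2,\, n-1)}$ are the values of the chi-square distribution with n − 1 degrees of freedom that leave α/2 and 1 − α/2, respectively, of the area under the curve to their right.
Note: Because the chi-square distribution is not symmetric, the interval is not of the form estimate ± margin; the lower and upper boundaries must be computed separately, each with its own critical value.
Finding chi-square critical values using Excel:
1. $\chi^2_{(1-\alpha/2,\, n-1)}$ = CHISQ.INV(α/2, n − 1)
2. $\chi^2_{(\alpha/2,\, n-1)}$ = CHISQ.INV.RT(α/2, n − 1)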
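A sketch of the variance interval, assuming SciPy is available: `scipy.stats.chi2.isf(alpha/2, n - 1)` (inverse survival function) corresponds to CHISQ.INV.RT(α/2, n − 1), and `scipy.stats.chi2.ppf(alpha/2, n - 1)` corresponds to CHISQ.INV(α/2, n − 1). The function name and data are hypothetical. Note that the resulting interval is not centred on s².

```python
import statistics

from scipy import stats


def variance_interval(sample, confidence=0.95):
    """(1 - alpha)100% confidence interval for the variance of a normal population."""
    n = len(sample)
    s2 = statistics.variance(sample)               # sample variance, divisor n - 1
    alpha = 1 - confidence
    chi2_upper = stats.chi2.isf(alpha / 2, n - 1)  # leaves alpha/2 to its right (CHISQ.INV.RT)
    chi2_lower = stats.chi2.ppf(alpha / 2, n - 1)  # leaves 1 - alpha/2 to its right (CHISQ.INV)
    # The larger critical value gives the lower bound, and vice versa
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


if __name__ == "__main__":
    data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 4.7, 5.0]  # hypothetical measurements
    print(statistics.variance(data), variance_interval(data))
```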
1.4 Hypothesis Testing
A hypothesis is a claim (assumption) about a population parameter such as a population mean, proportion or variance. Hypothesis testing is used when a researcher challenges such a claim. For example, the 2011/2012 Agricultural Survey Report of the CSA reports that the mean productivity of teff in Oromiya region is 12.96 quintals per hectare. If a researcher challenges this figure, believing either that the report is simply not right, that the productivity is underestimated, or that it is overestimated, he/she should take sample data and test the claim of the report at some significance level α. If the claim of the report is rejected, the researcher has sufficient evidence to support his/her alternative claim; otherwise, there is not sufficient evidence to reject the claim of the report.
As can be inferred from the narrative, there are two claims: the status quo (the claim of the report), called the null hypothesis, and the claim of the researcher, which is the negation of the null hypothesis, called the alternative hypothesis. In addition, there are three forms of the test:
• Two-tailed test: the productivity per hectare is different from what is reported, because the report may have either overestimated or underestimated it.
• Right-tailed (upper-tail) test: the report underestimated the productivity, so the true productivity is significantly greater than what is reported.
• Left-tailed (lower-tail) test: the report overestimated the productivity, so the true productivity per hectare is significantly below what is reported.

Steps of Hypothesis Testing
a. State the null hypothesis, H0, and the alternative hypothesis, H1 (Ha).
b. Choose the level of significance, α, and the sample size, n.
c. Determine the appropriate test statistic and sampling distribution.
d. Determine the critical values that divide the rejection and non-rejection regions.
e. Collect data and compute the value of the test statistic.
f. Make the statistical decision and state the managerial conclusion. If the test statistic falls into the non-rejection region, do not reject the null hypothesis H0. If the test statistic falls into the rejection region, reject H0. Express the managerial conclusion in the context of the problem.
Exercise: Read further about Type I error, Type II error and the nature of the null and alternative hypotheses.

1.4.1 Hypothesis testing about a population mean when the population standard deviation is known
Given a population with known standard deviation σ and a sample of size n with mean $\bar{x}$, a test at significance level α is carried out as shown below. In each case the calculated test statistic is
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}},$$
where μ0 is the hypothesized mean, and the critical value is read from a table or obtained with software.
• Two-tailed test. Hypotheses: H0: μ = μ0 versus H1: μ ≠ μ0, at significance level α. Critical values: $\pm z_{\alpha/2}$. Decision: if $|z| > z_{\alpha/2}$, reject H0, where $z_{\alpha/2}$ is the value of z that leaves α/2 of the area under the curve of the z-distribution to its right.
• Left-tailed test. Hypotheses: H0: μ ≥ μ0 versus H1: μ < μ0. Critical value: $-z_{\alpha}$. Decision: if $z < -z_{\alpha}$, reject H0, where $z_{\alpha}$ is the value of z that leaves α of the area under the curve of the z-distribution to its right.
• Right-tailed test. Hypotheses: H0: μ ≤ μ0 versus H1: μ > μ0. Critical value: $z_{\alpha}$. Decision: if $z > z_{\alpha}$, reject H0; otherwise do not reject H0.
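The three decision rules can be collected in one function. This is a standard-library sketch: the function name `z_test_mean`, the `tail` argument and the numbers in the usage example (μ0 = 100, σ = 15, n = 36, x̄ = 105) are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z test for a mean with known sigma.

    tail: "two" (H1: mu != mu0), "left" (H1: mu < mu0) or "right" (H1: mu > mu0).
    Returns (z statistic, critical value, reject H0?).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = abs(z) > crit
    elif tail == "left":
        crit = -std_normal.inv_cdf(1 - alpha)     # -z_{alpha}
        reject = z < crit
    elif tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)      # z_{alpha}
        reject = z > crit
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, crit, reject


if __name__ == "__main__":
    # Hypothetical numbers: claimed mean 100, known sigma 15, sample of 36 with mean 105
    print(z_test_mean(105, 100, 15, 36))                # z = 2.0 > 1.96: reject H0
    print(z_test_mean(105, 100, 15, 36, tail="right"))  # z = 2.0 > 1.645: reject H0
```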
1.4.2 Hypothesis testing about a population mean when the population standard deviation is unknown
Given a population with unknown standard deviation and a sample of size n with mean $\bar{x}$ and standard deviation s, a test at significance level α is carried out as shown below. In each case the calculated test statistic is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$$
which follows a t-distribution with n − 1 degrees of freedom when H0 is true, and the critical value is read from a table or obtained with software.
• Two-tailed test. Hypotheses: H0: μ = μ0 versus H1: μ ≠ μ0, at significance level α. Critical values: $\pm t_{(\alpha/2,\, n-1)}$. Decision: if $|t| > t_{(\alpha/2,\, n-1)}$, reject H0, where $t_{(\alpha/2,\, n-1)}$ is the value of t that leaves α/2 of the area under the curve of the t-distribution with n − 1 degrees of freedom to its right.
• Left-tailed test. Hypotheses: H0: μ ≥ μ0 versus H1: μ < μ0. Critical value: $-t_{(\alpha,\, n-1)}$. Decision: if $t < -t_{(\alpha,\, n-1)}$, reject H0, where $t_{(\alpha,\, n-1)}$ is the value of t that leaves α of the area under the curve of the t-distribution with n − 1 degrees of freedom to its right.
• Right-tailed test. Hypotheses: H0: μ ≤ μ0 versus H1: μ > μ0. Critical value: $t_{(\alpha,\, n-1)}$. Decision: if $t > t_{(\alpha,\, n-1)}$, reject H0; otherwise do not reject H0.
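A sketch of the t test, assuming SciPy is available for the t quantiles. The function name `t_test_mean` and the sample are hypothetical; as a cross-check, the two-tailed statistic should agree with `scipy.stats.ttest_1samp`.

```python
import math
import statistics

from scipy import stats


def t_test_mean(sample, mu0, alpha=0.05, tail="two"):
    """One-sample t test for a mean with unknown sigma (n - 1 degrees of freedom).

    Returns (t statistic, critical value, reject H0?).
    """
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    t_stat = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if tail == "two":
        crit = stats.t.ppf(1 - alpha / 2, df)  # t_{(alpha/2, n-1)}
        reject = abs(t_stat) > crit
    elif tail == "left":
        crit = -stats.t.ppf(1 - alpha, df)     # -t_{(alpha, n-1)}
        reject = t_stat < crit
    elif tail == "right":
        crit = stats.t.ppf(1 - alpha, df)      # t_{(alpha, n-1)}
        reject = t_stat > crit
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return t_stat, crit, reject


if __name__ == "__main__":
    data = [10.2, 9.8, 10.5, 10.9, 10.4, 10.7, 10.1, 10.6]  # hypothetical measurements
    print(t_test_mean(data, 10.0))                # H1: mu != 10
    print(stats.ttest_1samp(data, 10.0))          # cross-check of the statistic
```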
1.4.3 Hypothesis testing about a population proportion when np ≥ 5 and n(1 − p) ≥ 5
Given a population in which the proportion with a certain characteristic is π, and a sample of size n in which the proportion with that characteristic is p, a test about the population proportion at significance level α is carried out as shown below. In each case the calculated test statistic is
$$z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}},$$
where π0 is the hypothesized proportion, and the critical value is read from a table or obtained with software.
• Two-tailed test. Hypotheses: H0: π = π0 versus H1: π ≠ π0, at significance level α. Critical values: $\pm z_{\alpha/2}$. Decision: if $|z| > z_{\alpha/2}$, reject H0, where $z_{\alpha/2}$ is the value of z that leaves α/2 of the area under the curve of the z-distribution to its right.
• Left-tailed test. Hypotheses: H0: π ≥ π0 versus H1: π < π0. Critical value: $-z_{\alpha}$. Decision: if $z < -z_{\alpha}$, reject H0, where $z_{\alpha}$ is the value of z that leaves α of the area under the curve of the z-distribution to its right.
• Right-tailed test. Hypotheses: H0: π ≤ π0 versus H1: π > π0. Critical value: $z_{\alpha}$. Decision: if $z > z_{\alpha}$, reject H0; otherwise do not reject H0.
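A standard-library sketch of the proportion test. The function name and the counts (64 of 100, H0: π = 0.5) are hypothetical. As an assumption, the guard checks nπ0 ≥ 5 and n(1 − π0) ≥ 5 using the hypothesized proportion, a common variant of the sample-based condition in the heading.

```python
import math
from statistics import NormalDist


def z_test_proportion(successes, n, pi0, alpha=0.05, tail="two"):
    """Large-sample z test for a population proportion.

    Returns (z statistic, critical value, reject H0?).
    """
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("normal approximation not appropriate for these n and pi0")
    p = successes / n
    z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    std_normal = NormalDist()
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    elif tail == "left":
        crit = -std_normal.inv_cdf(1 - alpha)
        reject = z < crit
    elif tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z > crit
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, crit, reject


if __name__ == "__main__":
    # Hypothetical: 64 of 100 sampled units have the characteristic; H0: pi = 0.5
    print(z_test_proportion(64, 100, 0.5))  # z is about 2.8 > 1.96: reject H0
```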
1.4.4 Hypothesis testing about a population variance
Given a normal population with variance σ² and a sample of size n from it with variance s², a test about the population variance σ² based on the sample variance s² at significance level α is carried out as shown below. In each case the calculated test statistic is
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2},$$
where σ0² is the hypothesized variance; when H0 is true it has a chi-square distribution with n − 1 degrees of freedom. The critical values are read from a table or obtained with software.
• Two-tailed test. Hypotheses: H0: σ² = σ0² versus H1: σ² ≠ σ0², at significance level α. Critical values: $\chi^2_{(1-\alpha/2,\, n-1)}$ and $\chi^2_{(\alpha/2,\, n-1)}$. Decision: if $\chi^2 < \chi^2_{(1-\alpha/2,\, n-1)}$ or $\chi^2 > \chi^2_{(\alpha/2,\, n-1)}$, reject H0, where $\chi^2_{(\alpha/2,\, n-1)}$ is the value of χ² that leaves α/2 of the area under the curve of the χ²-distribution with n − 1 degrees of freedom to its right, and $\chi^2_{(1-\alpha/2,\, n-1)}$ is the value of χ² that leaves 1 − α/2 of the area under the curve to its right.
• Left-tailed test. Hypotheses: H0: σ² ≥ σ0² versus H1: σ² < σ0². Critical value: $\chi^2_{(1-\alpha,\, n-1)}$. Decision: if $\chi^2 < \chi^2_{(1-\alpha,\, n-1)}$, reject H0, where $\chi^2_{(1-\alpha,\, n-1)}$ is the value of χ² that leaves 1 − α of the area under the curve of the χ²-distribution with n − 1 degrees of freedom to its right.
• Right-tailed test. Hypotheses: H0: σ² ≤ σ0² versus H1: σ² > σ0². Critical value: $\chi^2_{(\alpha,\, n-1)}$. Decision: if $\chi^2 > \chi^2_{(\alpha,\, n-1)}$, reject H0, where $\chi^2_{(\alpha,\, n-1)}$ is the value of χ² that leaves α of the area under the curve of the χ²-distribution with n − 1 degrees of freedom to its right; otherwise do not reject H0.
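A sketch of the variance test, assuming SciPy is available: `scipy.stats.chi2.ppf` gives values with a stated area to their left (like Excel's CHISQ.INV) and `scipy.stats.chi2.isf` gives values with a stated area to their right (like CHISQ.INV.RT). The function name and the data are hypothetical. In the usage example the same data reject H0 in the right-tailed test but not in the two-tailed test, because the two-tailed test splits α between the tails.

```python
import statistics

from scipy import stats


def chi_square_test_variance(sample, sigma0_sq, alpha=0.05, tail="two"):
    """Chi-square test for the variance of a normal population.

    Returns (chi-square statistic, critical value or (lower, upper) pair, reject H0?).
    """
    n = len(sample)
    df = n - 1
    chi_sq = df * statistics.variance(sample) / sigma0_sq
    if tail == "two":
        lower = stats.chi2.ppf(alpha / 2, df)  # leaves 1 - alpha/2 to its right
        upper = stats.chi2.isf(alpha / 2, df)  # leaves alpha/2 to its right
        return chi_sq, (lower, upper), chi_sq < lower or chi_sq > upper
    if tail == "left":
        crit = stats.chi2.ppf(alpha, df)       # leaves 1 - alpha to its right
        return chi_sq, crit, chi_sq < crit
    if tail == "right":
        crit = stats.chi2.isf(alpha, df)       # leaves alpha to its right
        return chi_sq, crit, chi_sq > crit
    raise ValueError("tail must be 'two', 'left' or 'right'")


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical measurements; sigma0^2 = 1
    print(chi_square_test_variance(data, 1.0, tail="right"))  # 10.0 > about 9.49: reject H0
    print(chi_square_test_variance(data, 1.0))                # 10.0 inside (0.48, 11.14): do not reject
```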
