2. CONFIDENCE INTERVALS
• An interval estimate of a parameter is an interval or a range of
values used to estimate the parameter. This estimate may or may
not contain the value of the parameter being estimated.
3. INTRODUCTION
Stress and the College Student
A recent poll conducted by the mtvU/Associated Press found that
85% of college students reported that they experience stress daily.
The study said, “It is clear that being stressed is a fact of life on college
campuses today.”
The study also reports that 74% of students’ stress comes from
school work, 71% from grades, and 62% from financial woes. The
report stated that 2240 undergraduate students were selected and
that the poll has a margin of error of 3.0%.
4. CONFIDENCE INTERVALS
• The confidence level of an interval estimate of a
parameter is the probability that the interval estimate will
contain the parameter, assuming that a large number of
samples are selected and that the estimation process on the
same parameter is repeated.
• A confidence interval is a specific interval estimate of a
parameter determined by using data obtained from a
sample and by using the specific confidence level of the
estimate.
7. MARGIN OF ERROR
• The margin of error, also called the maximum error of the
estimate, is the maximum likely difference between the point
estimate of a parameter and the actual value of the parameter.
8. ASSUMPTIONS FOR FINDING A CONFIDENCE INTERVAL
FOR A MEAN WHEN σ IS KNOWN
1. The sample is a random sample.
2. Either n ≥ 30 or the population is normally distributed when n <
30.
9. ROUNDING RULE FOR A CONFIDENCE INTERVAL FOR
A MEAN
• When you are computing a confidence interval for a population mean by using
raw data, round off to one more decimal place than the number of decimal
places in the original data.
• When you are computing a confidence interval for a population mean by using
a sample mean and a standard deviation, round off to the same number of
decimal places as given for the mean.
10. SAMPLE PROBLEM: DAYS IT TAKES TO
SELL A CAMARO
• A researcher wishes to estimate the number of days it takes an automobile
dealer to sell a Chevrolet Camaro. A random sample of 50 cars had a mean
time on the dealer’s lot of 54 days. Assume the population standard deviation
to be 6.0 days. Find the best point estimate of the population mean and the
95% confidence interval of the population mean.
• Source: Based on information obtained from Power Information Network.
Ans: 52.3 < μ < 55.7
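The Camaro interval can be checked with a short sketch of the z interval x̄ ± zα/2 · σ/√n. This is only an illustration: the helper name `z_interval` is our own and does not come from the slides, and the critical value is taken from Python's standard-library `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """Confidence interval for a mean when sigma is known (hypothetical helper)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / sqrt(n)              # margin of error
    return mean - margin, mean + margin

# Camaro problem: n = 50, sample mean 54 days, sigma = 6.0 days, 95% confidence
low, high = z_interval(54, 6.0, 50, 0.95)
print(f"{low:.1f} < mu < {high:.1f}")   # 52.3 < mu < 55.7
```

The best point estimate is the sample mean itself, 54 days; the interval is that estimate plus or minus the margin of error.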
11. SAMPLE PROBLEM: NUMBER OF
CUSTOMERS
• A large department store found that it averages 362 customers per
hour. Assume that the standard deviation is 29.6 and a random
sample of 40 hours was used to determine the average. Find the
99% confidence interval of the population mean.
Ans: 350 < μ < 374
15. SAMPLE PROBLEM
• The following data represent a random sample of the assets (in
millions of dollars) of 30 credit unions in southwestern Pennsylvania.
Assume the population standard deviation is 14.405. Find the 90%
confidence interval of the mean.
12.23 16.56 4.39 2.89 1.24 2.17
13.19 9.16 1.42 73.25 1.91 14.64
11.59 6.69 1.06 8.74 3.17 18.13
7.92 4.78 16.85 40.22 2.42 21.58
5.01 1.47 12.24 2.27 12.77 2.76
Ans: 6.752 < μ < 15.430
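When only raw data are given, the sample mean has to be computed first. The sketch below does that for the credit union data and then applies the same z interval. It assumes the exact critical value from `statistics.NormalDist` (about 1.645); the answer shown on the slide appears to use the rounded table value 1.65, so the endpoints here differ slightly (roughly 6.765 to 15.417 instead of 6.752 to 15.430).

```python
from math import sqrt
from statistics import NormalDist, mean

assets = [
    12.23, 16.56, 4.39, 2.89, 1.24, 2.17,
    13.19, 9.16, 1.42, 73.25, 1.91, 14.64,
    11.59, 6.69, 1.06, 8.74, 3.17, 18.13,
    7.92, 4.78, 16.85, 40.22, 2.42, 21.58,
    5.01, 1.47, 12.24, 2.27, 12.77, 2.76,
]
sigma = 14.405                 # population standard deviation given in the problem
n = len(assets)
x_bar = mean(assets)           # point estimate of the population mean

z = NormalDist().inv_cdf(0.95)     # z_{alpha/2} for 90% confidence
margin = z * sigma / sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(f"x_bar = {x_bar:.3f}, {low:.3f} < mu < {high:.3f}")
```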
17. CHARACTERISTICS OF THE t - DISTRIBUTION
• The t distribution shares some characteristics of the
standard normal distribution and differs from it in others.
The t distribution is similar to the standard normal
distribution in these ways:
1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0 and are
located at the center of the distribution.
4. The curve approaches but never touches the x axis.
18. CHARACTERISTICS OF THE t - DISTRIBUTION
• The t distribution differs from the standard normal
distribution in the following ways:
1. The variance is greater than 1.
2. The t distribution is actually a family of curves based on
the concept of degrees of freedom, which is related to
sample size.
3. As the sample size increases, the t distribution approaches
the standard normal distribution.
20. DEGREES OF FREEDOM
• The degrees of freedom are the number of values that
are free to vary after a sample statistic has been computed,
and they tell the researcher which specific curve to use
when a distribution consists of a family of curves.
22. ASSUMPTIONS FOR FINDING A CONFIDENCE
INTERVAL FOR A MEAN WHEN σ IS UNKNOWN
1. The sample is a random sample.
2. Either n ≥ 30 or the population is normally distributed when n <
30.
23. SAMPLE PROBLEM: INFANT GROWTH
• A random sample of 10 children found that their average growth
for the first year was 9.8 inches. Assume the variable is normally
distributed and the sample standard deviation is 0.96 inch. Find the
95% confidence interval of the population mean for growth during
the first year.
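A minimal sketch of the t interval x̄ ± tα/2 · s/√n for this problem follows. The standard library has no t distribution, so the critical value 2.262 (d.f. = 9, 95% confidence) is typed in from a t table; `t_interval` is a hypothetical helper name, not something defined in the slides.

```python
from math import sqrt

def t_interval(mean, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown (hypothetical helper).

    t_crit must be looked up for d.f. = n - 1 and the chosen confidence level.
    """
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin

# Infant growth: n = 10, mean 9.8 in., s = 0.96 in., t table value for d.f. = 9
low, high = t_interval(9.8, 0.96, 10, 2.262)
print(f"{low:.2f} < mu < {high:.2f}")   # 9.11 < mu < 10.49
```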
24. SAMPLE PROBLEM: HOME FIRES
STARTED BY CANDLES
• The data represent a random sample of the number of home fires
started by candles for the past several years. (Data are from the
National Fire Protection Association.) Find the 99% confidence
interval for the mean number of home fires started by candles each
year.
5460 5900 6090 6310 7160 8440 9930
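With raw data and σ unknown, both x̄ and s come from the sample. The sketch below assumes the t table value 3.707 for d.f. = 6 at 99% confidence (entered by hand) and uses `statistics.stdev`, which divides by n − 1.

```python
from math import sqrt
from statistics import mean, stdev

fires = [5460, 5900, 6090, 6310, 7160, 8440, 9930]
n = len(fires)
x_bar = mean(fires)     # sample mean
s = stdev(fires)        # sample standard deviation (n - 1 in the denominator)

t_crit = 3.707          # t table value, d.f. = n - 1 = 6, 99% confidence
margin = t_crit * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(f"x_bar = {x_bar:.1f}, s = {s:.1f}")
print(f"{low:.1f} < mu < {high:.1f}")
```

The interval is wide (roughly 4785 to 9298 fires per year) because the sample is small and the confidence level is high.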
28. SAMPLE PROBLEM: COVERING COLLEGE COSTS
• A survey conducted by Sallie Mae and Gallup of 1404 respondents
found that 323 students paid for their education by student loans.
Find the 90% confidence interval of the true proportion of students
who paid for their education by student loans.
29. SAMPLE PROBLEM: LAWN WEEDS
• A survey of 1898 adults with lawns conducted by
Harris Interactive Poll found that 45% of the adults
said that dandelions were the toughest weeds to
control in their yards. Find the 95% confidence
interval of the true proportion who said that
dandelions were the toughest weeds to control in
their yards.
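Both proportion problems use the interval p̂ ± zα/2 · √(p̂q̂/n). The sketch below is one way to compute it; `proportion_interval` is a hypothetical helper name, and the critical values again come from `statistics.NormalDist` rather than a table, so the third decimal may differ from a hand calculation that rounds z or p̂ first.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence):
    """Confidence interval for a population proportion (hypothetical helper)."""
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Covering college costs: X = 323 of n = 1404, 90% confidence
loans = proportion_interval(323 / 1404, 1404, 0.90)
# Lawn weeds: p_hat = 0.45, n = 1898, 95% confidence
weeds = proportion_interval(0.45, 1898, 0.95)

print(f"loans: {loans[0]:.3f} < p < {loans[1]:.3f}")
print(f"weeds: {weeds[0]:.3f} < p < {weeds[1]:.3f}")
```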
35. SAMPLE PROBLEM: NICOTINE CONTENT
• Find the 95% confidence interval for the variance and
standard deviation of the nicotine content of cigarettes
manufactured if a random sample of 20 cigarettes has a
standard deviation of 1.6 milligrams. Assume the variable is
normally distributed.
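The interval for a variance is (n − 1)s²/χ²right < σ² < (n − 1)s²/χ²left, and taking square roots gives the interval for σ. In the sketch below the chi-square critical values for d.f. = 19 at 95% confidence (32.852 and 8.907) are entered by hand from a chi-square table, since the standard library does not provide them; `variance_interval` is a hypothetical helper name.

```python
from math import sqrt

def variance_interval(s, n, chi2_right, chi2_left):
    """Confidence interval for a variance, given the sample standard deviation s.

    chi2_right and chi2_left are table values for d.f. = n - 1.
    """
    ss = (n - 1) * s ** 2          # s is squared here because the formula uses s^2
    return ss / chi2_right, ss / chi2_left

# Nicotine content: n = 20, s = 1.6 mg, 95% confidence, d.f. = 19
var_low, var_high = variance_interval(1.6, 20, 32.852, 8.907)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(f"{var_low:.1f} < sigma^2 < {var_high:.1f}")   # 1.5 < sigma^2 < 5.5
print(f"{sd_low:.1f} < sigma < {sd_high:.1f}")       # 1.2 < sigma < 2.3
```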
36. SAMPLE PROBLEM: NAMED STORMS
• Find the 90% confidence interval for the variance and standard
deviation for the number of named storms per year in the Atlantic
basin. A random sample of 10 years has been used. Assume the
distribution is approximately normal.
10 5 12 11 13
15 19 18 14 16
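For raw data the sample variance is computed first and then used directly, without squaring. A sketch for the named storms data, assuming the chi-square table values 16.919 and 3.325 for d.f. = 9 at 90% confidence (typed in by hand):

```python
from math import sqrt
from statistics import variance

storms = [10, 5, 12, 11, 13, 15, 19, 18, 14, 16]
n = len(storms)
s2 = variance(storms)                    # sample variance s^2

chi2_right, chi2_left = 16.919, 3.325    # chi-square table, d.f. = 9, 90% confidence
var_low = (n - 1) * s2 / chi2_right
var_high = (n - 1) * s2 / chi2_left
print(f"s^2 = {s2:.1f}")
print(f"{var_low:.1f} < sigma^2 < {var_high:.1f}")
print(f"{sqrt(var_low):.1f} < sigma < {sqrt(var_high):.1f}")
```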
38. PREDICTION INTERVALS
• Used to predict the possible value of a future observation
• Example: In quality control, an engineer may need to use the
observed data to predict a new observation.
39. Prediction Interval for Future Observation
PREDICTION INTERVALS
The prediction interval for X_{n+1} will always be longer than the confidence interval for μ.
40. EGR 252 Ch. 9 Lecture1 MDH 2015 9th edition Slide 41
PREDICTION INTERVAL
• For a normal distribution of unknown mean μ and standard deviation σ, a
100(1-α)% prediction interval of a future observation x0 is
$\bar{X} - z_{\alpha/2}\,\sigma\sqrt{1 + 1/n} < x_0 < \bar{X} + z_{\alpha/2}\,\sigma\sqrt{1 + 1/n}$
if σ is known, and
$\bar{X} - t_{\alpha/2,\,n-1}\,s\sqrt{1 + 1/n} < x_0 < \bar{X} + t_{\alpha/2,\,n-1}\,s\sqrt{1 + 1/n}$
if σ is unknown.
42. PREDICTION INTERVALS
Consider the tensile adhesion tests on 22 specimens of U-700 alloy.
The load at failure was observed for each specimen; the sample
mean is 13.71 and the sample standard deviation is 3.55. We plan to test
a twenty-third specimen. Find a 95% prediction interval for the load
at failure of this specimen.
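Since σ is unknown here, the prediction interval uses the t form, x̄ ± tα/2,n−1 · s · √(1 + 1/n). A minimal sketch, assuming the t table value 2.080 for d.f. = 21 at 95% (entered by hand); `prediction_interval` is a hypothetical helper name.

```python
from math import sqrt

def prediction_interval(mean, s, n, t_crit):
    """Prediction interval for one future observation, sigma unknown.

    t_crit is the t table value for d.f. = n - 1 (hypothetical helper).
    """
    margin = t_crit * s * sqrt(1 + 1 / n)   # the extra 1 accounts for the new observation
    return mean - margin, mean + margin

# U-700 alloy: n = 22, mean 13.71, s = 3.55, t table value for d.f. = 21
low, high = prediction_interval(13.71, 3.55, 22, 2.080)
print(f"({low:.2f}, {high:.2f})")   # (6.16, 21.26)
```

For comparison, the 95% confidence interval for the mean from the same sample has half-width 2.080 · 3.55/√22 ≈ 1.57, far narrower than the prediction interval's 7.55.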
43. EXAMPLE 12
• Consider the following sample of fat content (in percentage) of n = 10 randomly selected hot
dogs (“Sensory and Mechanical Assessment of the Quality of Frankfurters,” J. of Texture Studies,
1990: 395–409):
• Find the fat content of the 17th sample at 90% prediction level.
45. TOLERANCE LIMITS (INTERVALS)
• What if you want to be 95% sure that the interval contains 95% of the values? Or 90% sure
that the interval contains 99% of the values?
• These questions are answered by a tolerance interval. To compute, or understand, a
tolerance interval you have to specify two different percentages. One expresses how sure
you want to be, and the other expresses what fraction of the values the interval will
contain.
47. 9.7: TOLERANCE LIMITS
• For a normal distribution of unknown mean μ and unknown standard
deviation σ, tolerance limits are given by
x̄ ± ks
where k is determined so that one can assert with 100(1-γ)%
confidence that the given limits contain at least the proportion
1-α of the measurements.
• Table A.7 (page 745) gives values of k for (1-α) = 0.90,
0.95, or 0.99 and γ = 0.05 or 0.01 for selected
values of n.
48. TOLERANCE LIMITS
• How to determine 100(1-γ)% and 1-α.
For a sample size of 8, find the tolerance interval that gives two-sided 95%
bounds on 90% of the distribution or population. x̄ is 15.6 and s is 1.4.
From the table on page 745, find the corresponding value:
n = 8, γ = 0.05, α = 0.10 gives k = 3.136
x̄ ± ks = 15.6 ± (3.136)(1.4)
Tolerance interval: 11.21 to 19.99
We are 95% confident that 90% of the population falls within the limits
of 11.21 and 19.99.
1-γ (boundary or the limits); 1-α (proportion of the distribution)
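The arithmetic in this example is simply x̄ ± ks once k has been read from the table. A small sketch, where `tolerance_limits` is a hypothetical helper name and k = 3.136 is the table value quoted in the example:

```python
def tolerance_limits(mean, s, k):
    """Two-sided tolerance limits mean ± k*s; k comes from a tolerance factor table."""
    return mean - k * s, mean + k * s

# n = 8, gamma = 0.05 (95% confidence), 1 - alpha = 0.90 coverage, k = 3.136
low, high = tolerance_limits(15.6, 1.4, 3.136)
print(f"{low:.2f} to {high:.2f}")   # 11.21 to 19.99
```

The factor k depends on n, γ, and α together, which is why it has to be looked up instead of being computed from a single z or t value.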
49. CASE STUDY 9.1C (PAGE 281)
• Find the 99% tolerance limits that will contain 95% of
the metal pieces produced by the machine, given a
sample mean diameter of 1.0056 cm and a sample
standard deviation of 0.0246.
• Table A.7 (page 745)
– (1 - α) = 0.95
– (1 - γ) = 0.99
– n = 9
– k = 4.550
– x̄ ± ks = 1.0056 ± (4.550)(0.0246)
• We can assert with 99% confidence that the
tolerance interval from 0.894 to 1.117 cm will contain
95% of the metal pieces produced by the machine.
51. TOLERANCE INTERVALS
• Consider a population of automobiles of a certain type, and suppose that under specified
conditions, fuel efficiency (mpg) has a normal distribution with μ = 30 and σ = 2.
Then since the interval from –1.645 to 1.645 captures 90% of the area under the z curve, 90%
of all these automobiles will have fuel efficiency values between μ – 1.645σ = 26.71 and μ +
1.645σ = 33.29.
But what if the values of μ and σ are not known? We can take a sample of size n, determine
the fuel efficiencies, x̄, and s, and form the interval whose lower limit is x̄ – 1.645s and whose
upper limit is x̄ + 1.645s.
52. TOLERANCE INTERVALS
• However, because of sampling variability in the estimates of μ and σ, there is a good chance
that the resulting interval will include less than 90% of the population values.
Intuitively, to have an a priori 95% chance of the resulting interval including at least 90% of the
population values, when x̄ and s are used in place of μ and σ we should also replace 1.645 by
some larger number.
For example, when n = 20, the value 2.310 is such that we can be 95% confident that the
interval x̄ ± 2.310s will include at least 90% of the fuel efficiency values in the population.
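A small simulation can illustrate this point. The sketch below is an assumption-laden illustration, not part of the original slides: it draws repeated samples of size 20 from the N(30, 2) fuel efficiency population, forms x̄ ± k·s, and counts how often the interval covers at least 90% of the population. The function name, trial count, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, mean, stdev

def capture_rate(k, n=20, mu=30.0, sigma=2.0, target=0.90, trials=2000, seed=1):
    """Fraction of simulated samples whose interval x_bar ± k*s
    covers at least `target` of the N(mu, sigma) population."""
    rng = random.Random(seed)
    population = NormalDist(mu, sigma)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar, s = mean(sample), stdev(sample)
        # Proportion of the true population lying inside this sample's interval
        covered = population.cdf(x_bar + k * s) - population.cdf(x_bar - k * s)
        if covered >= target:
            hits += 1
    return hits / trials

naive = capture_rate(1.645)      # plug-in z value
adjusted = capture_rate(2.310)   # tolerance critical value for n = 20
print(f"k = 1.645: {naive:.2f}   k = 2.310: {adjusted:.2f}")
```

With the plug-in value 1.645 the interval reaches 90% coverage in well under 95% of samples, while 2.310 should bring the rate close to the stated 95%.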
53. TOLERANCE INTERVALS
• Let k be a number between 0 and 100. A tolerance interval for capturing at least k% of the
values in a normal population distribution with a confidence level of 95% has the form
• x̄ ± (tolerance critical value) · s
Tolerance critical values for k = 90, 95, and 99 in combination with various sample sizes are
given in Appendix Table A.6. This table also includes critical values for a confidence level of 99%
(these values are larger than the corresponding 95% values).
54. TOLERANCE INTERVALS
• Replacing ± by + gives an upper tolerance bound, and using – in place of ± results in a lower
tolerance bound. Critical values for obtaining these one-sided bounds also appear in Appendix
Table A.6.
55. EXAMPLE 14
• As part of a larger project to study the behavior of stressed-skin panels, a structural
component being used extensively in North America, the article “Time-Dependent Bending
Properties of Lumber” (J. of Testing and Eval., 1996: 187–193) reported on various mechanical
properties of Scotch pine lumber specimens.
• Consider the following observations on modulus of elasticity (MPa) obtained 1 minute after
loading in a certain configuration:
56. EXAMPLE 14
• There is a pronounced linear pattern in a normal probability plot of the data. Relevant summary
quantities are n = 16,
x̄ = 14,532.5, s = 2055.67. For a confidence level of 95%, a two-sided tolerance interval for capturing
at least 95% of the modulus of elasticity values for specimens of lumber in the population sampled
uses the tolerance critical value of 2.903.
• The resulting interval is
• 14,532.5 ± (2.903)(2055.67) = 14,532.5 ± 5967.6
• = (8,564.9, 20,500.1)
57. EXAMPLE 14
• We can be highly confident that at least 95% of all lumber specimens have modulus of
elasticity values between 8,564.9 and 20,500.1.
• The 95% CI for μ is (13,437.3, 15,627.7), and the 95% prediction interval for the modulus of
elasticity of a single lumber specimen is (10,017.0, 19,048.0).
• Both the prediction interval and the tolerance interval are substantially wider than the
confidence interval.
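The three intervals in Example 14 differ only in the half-width added to x̄, which a short sketch makes easy to compare. The t table value 2.131 (d.f. = 15, 95%) is entered by hand and is an assumption of this sketch; the tolerance critical value 2.903 is the one quoted in the example.

```python
from math import sqrt

n, x_bar, s = 16, 14532.5, 2055.67
t_crit = 2.131    # t table value, d.f. = 15, 95%
k_tol = 2.903     # tolerance critical value from Example 14

ci_half = t_crit * s / sqrt(n)            # confidence interval for the mean
pi_half = t_crit * s * sqrt(1 + 1 / n)    # prediction interval for one specimen
ti_half = k_tol * s                       # tolerance interval for 95% of specimens

for name, half in [("CI", ci_half), ("PI", pi_half), ("TI", ti_half)]:
    print(f"{name}: ({x_bar - half:.1f}, {x_bar + half:.1f})")
# CI: (13437.3, 15627.7)
# PI: (10017.0, 19048.0)
# TI: (8564.9, 20500.1)
```

Only the confidence interval shrinks toward zero width as n grows; the other two are bounded below by the spread of individual values.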
Editor's Notes
#1:Thus the interval estimate indicates, by its length, the accuracy of the point estimate.
#2:In an interval estimate, the parameter is specified as being between two values. For
example, an interval estimate for the average age of all students might be 21.9 < μ <
22.7, or 22.3 ± 0.4 years.
Either the interval contains the parameter or it does not. A degree of confidence (usually
a percent) must be assigned before an interval estimate is made. For instance, you
may wish to be 95% confident that the interval contains the true population mean.
Another question then arises. Why 95%? Why not 99 or 99.5%?
If you desire to be more confident, such as 99 or 99.5% confident, then you must
make the interval larger. For example, a 99% confidence interval for the mean age of
college students might be 21.7 < μ < 22.9, or 22.3 ± 0.6. Hence, a tradeoff occurs. To
be more confident that the interval contains the true population mean, you must make the
interval wider.
#4:Intervals constructed in this way are called confidence intervals. Three common confidence
intervals are used: the 90%, the 95%, and the 99% confidence intervals. The wider the
confidence interval is, the more confident we can be that the given interval contains the
unknown parameter. Of course, it is better to be 95% confident that the average life of a
certain television transistor is between 6 and 7 years than to be 99% confident that it is
between 3 and 10 years. Ideally, we prefer a short interval with a high degree of confidence.
#6:The term zα/2(σ/√n) is called the margin of error (also called the maximum error
of the estimate). For a specific value, say, α = 0.05, 95% of the sample means will fall
within this error value on either side of the population mean, as previously explained.
See Figure 7–1.
When n ≥ 30, s can be substituted for σ, but a different distribution is used.
#10:Hence, one can say with 95% confidence that the interval between 52.3 and 55.7 days
does contain the population mean, based on a sample of 50 automobiles.
#11:Hence, one can be 99% confident (rounded values) that the mean number of customers that
the store averages is between 350 and 374 customers per hour.
#12:Another way of looking at a confidence interval is shown in Figure 7–2. According to the
central limit theorem, approximately 95% of the sample means fall within 1.96 standard
deviations of the population mean if the sample size is 30 or more, or if σ is known when n
is less than 30 and the population is normally distributed. If it were possible to build a confidence
interval about each sample mean, as was done in Examples 7–1 and 7–2 for μ, then
95% of these intervals would contain the population mean, as shown in Figure 7–3. Hence,
you can be 95% confident that an interval built around a specific sample mean would contain
the population mean. If you desire to be 99% confident, you must enlarge the confidence intervals
so that 99 out of every 100 intervals contain the population mean.
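The "95 out of every 100 intervals" reading can be illustrated by simulation. The sketch below is a hedged illustration only: the population (mean 100, σ = 15), sample size, number of trials, and seed are all arbitrary assumptions, and `coverage` is a hypothetical function name.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(confidence, mu=100.0, sigma=15.0, n=30, trials=2000, seed=7):
    """Fraction of simulated z intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = mean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / trials

rate_95 = coverage(0.95)
rate_99 = coverage(0.99)
print(f"95% intervals containing mu: {rate_95:.3f}")
print(f"99% intervals containing mu: {rate_99:.3f}")
```

Each individual interval either contains μ or it does not; the confidence level describes the long-run fraction that do.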
#14:Since other confidence intervals (besides 90, 95, and 99%) are sometimes used in
statistics, an explanation of how to find the values for zα/2 is necessary. As stated previously,
the Greek letter α represents the total of the areas in both tails of the normal distribution.
The value for α is found by subtracting the decimal equivalent for the desired confidence
level from 1. For example, if you wanted to find the 98% confidence interval, you would
change 98% to 0.98 and find α = 1 − 0.98, or 0.02. Then α/2 is obtained by dividing α by
2. So α/2 is 0.02/2, or 0.01. Finally, z0.01 is the z value that will give an area of 0.01 in the
right tail of the standard normal distribution curve. See Figure 7–4.
Once α/2 is determined, the corresponding zα/2 value can be found by using the procedure
shown in Chapter 6, which is reviewed here. To get the zα/2 value for a 98% confidence
interval, subtract 0.01 from 1.0000 to get 0.9900. Next, locate the area that is
closest to 0.9900 (in this case, 0.9901) in Table E, and then find the corresponding z value.
In this example, it is 2.33. See Figure 7–5.
For confidence intervals, only the positive z value is used in the formula.
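The table lookup described above can be mirrored in code. This sketch uses the inverse CDF of the standard normal distribution from Python's `statistics.NormalDist`; `z_critical` is a hypothetical helper name. Because it does not round to the nearest table entry, it returns about 2.326 for 98% where Table E gives 2.33.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Positive z value leaving alpha/2 in the right tail (hypothetical helper)."""
    alpha = 1 - confidence                        # total area in both tails
    return NormalDist().inv_cdf(1 - alpha / 2)    # area to the left is 1 - alpha/2

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 98%: z = 2.326
# 99%: z = 2.576
```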
When the original variable is normally distributed and σ is known, the standard normal
distribution can be used to find confidence intervals regardless of the size of the sample.
When n ≥ 30, the distribution of means will be approximately normal even if the
original distribution of the variable departs from normality.
When σ is unknown, s can be used as an estimate of σ, but a different distribution is
used for the critical values. This method is explained in Section 7–2.
#16:When σ is known and the sample size is 30 or more, or the population is normally distributed
if the sample size is less than 30, the confidence interval for the mean can be found by
using the z distribution, as shown in Section 7–1. However, most of the time, the value of
σ is not known, so it must be estimated by using s, namely, the standard deviation of the
sample. When s is used, especially when the sample size is small, critical values greater
than the values for zα/2 are used in confidence intervals in order to keep the interval at a
given level, such as 95%. These values are taken from the Student t distribution, most
often called the t distribution.
To use this method, the samples must be simple random samples, and the population
from which the samples were taken must be normally or approximately normally distributed,
or the sample size must be 30 or more.
#20:For example, if the mean of 5 values is 10, then 4 of the 5 values are free to vary. But
once 4 values are selected, the fifth value must be a specific number to get a sum of 50,
since 50 ÷ 5 = 10. Hence, the degrees of freedom are 5 − 1 = 4, and this value tells the
researcher which t curve to use.
The symbol d.f. will be used for degrees of freedom. The degrees of freedom for a
confidence interval for the mean are found by subtracting 1 from the sample size. That is,
d.f. = n − 1. Note: For some statistical tests used later in this book, the degrees of
freedom are not equal to n − 1.
#21:When d.f. is greater than 30, it may fall between two table values. For example, if
d.f. = 68, it falls between 65 and 70. Many texts say to use the closest value (for
example, 68 is closer to 70 than to 65); however, in this text a conservative approach is
used: always round down to the nearest table value. In this case, 68 rounds
down to 65.
Note: At the bottom of Table F where d.f. is large or ∞, the zα/2 values can be found
for specific confidence intervals. The reason is that as the degrees of freedom increase,
the t distribution approaches the standard normal distribution.
#26:Students sometimes have difficulty deciding whether to use zα/2 or tα/2 values when
finding confidence intervals for the mean. As stated previously, when σ is known,
zα/2 values can be used no matter what the sample size is, as long as the variable is normally
distributed or n ≥ 30. When σ is unknown and n ≥ 30, then s can be used in the formula
and tα/2 values can be used. Finally, when σ is unknown and n < 30, s is used in the
formula and tα/2 values are used, as long as the variable is approximately normally
distributed. These rules are summarized in Figure 7–8.
#27:One of the most common types of confidence intervals is one that uses proportions. Many
statistical studies involve finding a proportion of the population that has a certain characteristic.
In this section, you will learn how to find the confidence interval for a population proportion.
A USA TODAY Snapshots feature stated that 12% of the pleasure boats in the United
States were named Serenity. The parameter 12% is called a proportion. It means that of all
the pleasure boats in the United States, 12 out of every 100 are named Serenity. A proportion
represents a part of a whole. It can be expressed as a fraction, decimal, or percentage.
In this case, 12% = 0.12 or 12/100. Proportions can also represent probabilities. In this
case, if a pleasure boat is selected at random, the probability that it is called Serenity is 0.12.
Proportions can be obtained from samples or populations. The following symbols
will be used.
For example, in a study, 200 people were asked if they were satisfied with their jobs
or professions; 162 said that they were. In this case, n = 200, X = 162, and p̂ = X/n =
162/200 = 0.81. It can be said that for this sample, 0.81, or 81%, of those surveyed were
satisfied with their jobs or professions. The sample proportion is 0.81.
The proportion of people who did not respond favorably when asked if they were
satisfied with their jobs or professions constituted q̂, where q̂ = (n − X)/n. For this survey,
q̂ = (200 − 162)/200 = 38/200, or 0.19, or 19%.
When p̂ and q̂ are given in decimals or fractions, p̂ + q̂ = 1. When p̂ and q̂ are given
in percentages, p̂ + q̂ = 100%. It follows, then, that q̂ = 1 − p̂, or p̂ = 1 − q̂, when p̂ and
q̂ are in decimal or fraction form. For the sample survey on job satisfaction, q̂ can also be
found by using q̂ = 1 − p̂, or 1 − 0.81 = 0.19.
#28:Rounding Rule for a Confidence Interval for a Proportion: Round off to three
decimal places.
#31:In Sections 7–1 through 7–3 confidence intervals were calculated for means and proportions.
This section will explain how to find confidence intervals for variances and standard
deviations. In statistics, the variance and standard deviation of a variable are as
important as the mean. For example, when products that fit together (such as pipes) are
manufactured, it is important to keep the variations of the diameters of the products as
small as possible; otherwise, they will not fit together properly and will have to be
scrapped. In the manufacture of medicines, the variance and standard deviation of the
medication in the pills play an important role in making sure patients receive the proper
dosage. For these reasons, confidence intervals for variances and standard deviations are
necessary.
#35:Recall that s² is the symbol for the sample variance and s is the symbol for the sample
standard deviation. If the problem gives the sample standard deviation s, be sure to square
it when you are using the formula. But if the problem gives the sample variance s², do not
square it when you are using the formula, since the variance is already in square units.
Rounding Rule for a Confidence Interval for a Variance or Standard Deviation
When you are computing a confidence interval for a population variance or standard
deviation by using raw data, round off to one more decimal place than the number of decimal
places in the original data.
When you are computing a confidence interval for a population variance or standard
deviation by using a sample variance or standard deviation, round off to the same number
of decimal places as given for the sample variance or standard deviation.
#38:Many practical problems are phrased in terms of individual measurements
rather than parameters of distributions. We take two such examples. The
first one will require, what is to be called a prediction interval, instead of a
confidence interval.
A consumer is considering buying a car. Then this person should
be far more interested in knowing whether a full tank on a particular
automobile will suffice to carry her/him the 500 kms to her/his
destination than in learning that there is a 95% confidence interval
for the mean mileage of the model, which is possible to use
to project the average or total gasoline consumption for the manufactured
fleet of such cars over their first 5000 kilometers of
use.
A different situation appears in the following. This will require, what is to
be called a tolerance interval, instead of a confidence interval.
A design engineer is charged with the problem of determining how
large a tank the car model really needs to guarantee that 99% of
the cars produced will have a cruising range of 500 kilometers.
What the engineer really needs is a tolerance interval for a fraction
of 99% of the mileages of such automobiles.
In many applications, the objective is to predict a single value of a variable to be observed at some future time, rather than to estimate the mean value of that variable.
The interval containing the next single response.
What if you want to make a claim about the resistance of a future cable, or the
average resistance of a group of cables that you are going to manufacture in the
future? The confidence intervals for the mean and standard deviation (Figure 2) that
you calculated refer to the population of cables manufactured during the month
in which the 40 cables were sampled, not to an individual observation or group
of observations in the future. A prediction interval for a single future observation
resembles a confidence interval for the mean, but it is wider because it takes into
account the prediction noise by adding a 1 to the expression inside the square root:
#45:Tolerance Interval
The interval which contains at least a given proportion of the population.
A tolerance interval is an enclosure interval for a specified proportion of the sampled
population, not its mean or standard deviation. For a specified confidence level,
you may want to determine lower and upper bounds such that 99 percent of
the population is contained within them. Tolerance bounds allow you to set up
specification limits by finding the lower and upper values, which correspond to stated
yield or process capability goals.
#46:a (1-α) proportion of the measurements can be estimated with 100(1-γ)% confidence